anonymous-atom opened this issue 2 weeks ago
Can you check whether the training is actually being run? If not, why is it skipping the training loop?
It's working, but I think it's only using a few examples from the prompt file.
I think random.choice is uniform over all prompts; I'm not sure what the bug is here. If you find it, let me know.
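For reference, uniform sampling with random.choice would look roughly like this; a minimal sketch, not the exact code in this repo (the helper names here are made up):

```python
import random

# Illustrative sketch of uniform prompt sampling with random.choice; the file
# name comes from the discussion above, but the helper names are made up.
def load_prompts(path="hps_v2_all.txt"):
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

def sample_prompts(prompts, batch_size):
    # random.choice draws uniformly and independently for each slot, so every
    # prompt is equally likely, but a small batch only touches a few of them.
    return [random.choice(prompts) for _ in range(batch_size)]

prompts = load_prompts()
print(sample_prompts(prompts, batch_size=2))
```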
Yeah Sure!
Also, while I was trying to train with a custom loss function, the models seem to collapse very early unless I adjust the learning rate. Is this the expected behaviour?
@mihirp1998 Sorry to tag you again, but can you let me know how much time it took per epoch on your 4 A100 GPUs?
Clear Skies!
On Aesthetics, 2-3 minutes per epoch.
I wanted to confirm: are you using all 750 prompts in the hps_v2_all.txt file for a single epoch?
Hi @mihirp1998, this is where I am confused:
So you used the step() function to do 1 training step / 1 epoch?
```python
def train(self, epochs: Optional[int] = None):
    """
    Train the model for a given number of epochs
    """
    global_step = 0
    if epochs is None:
        epochs = self.config.num_epochs
    for epoch in range(self.first_epoch, epochs):
        global_step = self.step(epoch, global_step)
```
And here in the step() function, it only seems to finetune on `num_gpus * batch_size * train_gradient_accumulation_steps` images. Am I missing something? What if someone uses just 1 GPU to train?
```python
def step(self, epoch: int, global_step: int):
    info = defaultdict(list)
    print(f"Epoch: {epoch}, Global Step: {global_step}")
    self.sd_pipeline.unet.train()
    for _ in range(self.config.train_gradient_accumulation_steps):
        with self.accelerator.accumulate(self.sd_pipeline.unet), self.autocast(), torch.enable_grad():
            prompt_image_pairs = self._generate_samples(
                batch_size=self.config.train_batch_size,
            )
```
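If I'm reading the loop right, the images seen per call to step() would be roughly `num_gpus * train_batch_size * train_gradient_accumulation_steps`. A quick back-of-envelope sketch, with placeholder numbers rather than the actual config:

```python
# Back-of-envelope count of images used per step()/epoch (placeholder values).
num_gpus = 4                            # processes launched via accelerate
train_batch_size = 2                    # per-GPU batch size (example only)
train_gradient_accumulation_steps = 4   # example only

images_per_epoch = num_gpus * train_batch_size * train_gradient_accumulation_steps
print(images_per_epoch)  # 32 with these example values
```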
Following this thread. This is critical setup; otherwise other folks cannot reproduce this.
Yes, in this codebase step and epoch are equivalent; it's difficult to define an epoch since there is no dataset I'm training on.
If someone uses one GPU for training and wants to maintain the batch size I'm using, they should increase the accumulation steps, as I mention here:
https://github.com/mihirp1998/AlignProp/blob/5e950b3f16ded622df15f4bea2eec93f88962f2b/hps.sh#L1
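Roughly, the goal is to keep the effective batch size (`num_gpus * batch_size * accumulation_steps`) unchanged. A minimal sketch of the arithmetic, with placeholder reference values rather than the exact ones in hps.sh:

```python
# Placeholder reference config (check hps.sh for the actual values).
ref_gpus = 4
ref_batch_size = 2
ref_accum_steps = 4
effective_batch = ref_gpus * ref_batch_size * ref_accum_steps  # 32

# On a single GPU, scale accumulation steps so the effective batch size matches.
single_gpu_batch_size = 2
single_gpu_accum_steps = effective_batch // single_gpu_batch_size  # 16
```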
@mihirp1998 I was trying to finetune Stable Diffusion 1.5 using your HPS reward function and the hps.sh training script. I used a batch size of 1, but the training still seems to complete very quickly: 50 epochs took just 2-4 minutes.
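For what it's worth, here is the back-of-envelope I am using to sanity-check that speed; the accumulation steps value below is a guess, not necessarily the one in hps.sh:

```python
# Rough estimate of total images touched across the whole run (assumed values).
epochs = 50
batch_size = 1
accum_steps = 4   # guess; not necessarily the hps.sh setting
total_images = epochs * batch_size * accum_steps
print(total_images)  # 200 images with these assumptions, which would explain the short runtime
```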
And here you are trying to use just `batch_size` number of prompts? I am using a batch_size of 2 on 1 A100 GPU to test the script.
Your help will mean a lot!