mihirp1998 / AlignProp

AlignProp uses direct reward backpropagation for the alignment of large-scale text-to-image diffusion models. Our method is 25x more sample and compute efficient than reinforcement learning methods (PPO) for finetuning Stable Diffusion.
https://align-prop.github.io/
MIT License
242 stars, 8 forks

noise after certain number of epochs #6

Closed (sachinnitw1317 closed 3 months ago)

sachinnitw1317 commented 1 year ago

Hi,

I did some experiments to reproduce your results, but the model seems to lose all context after a certain number of epochs.

I am attaching the report here: https://wandb.ai/sachin931350/align-prop/runs/ngkluhfs/overview

Please let me know what I am doing wrong.

mihirp1998 commented 1 year ago

I saw your config. I normally use:

total_samples_per_epoch=256 total_batch_size=128

I think you are using much lower values for these. Can you try again with the values above?

mihirp1998 commented 1 year ago

If you have made any other changes to the config, let me know.

sachinnitw1317 commented 1 year ago

The other configs are the same. I reduced these two values to run on a T4 machine.

Let me try with total_samples_per_epoch=256 total_batch_size=128.

mihirp1998 commented 1 year ago

Reducing the batch size might work, but I think you should also try reducing the learning rate along with it.

I think this issue might be caused by a high learning rate.
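
For illustration, one common heuristic for lowering the learning rate together with the batch size is linear scaling: shrink the learning rate by the same factor as the batch size. This is a general rule of thumb, not something the AlignProp config is confirmed to do, and the function name and the reference learning rate below are hypothetical placeholders rather than values from the repository.

```python
def scale_learning_rate(base_lr: float, base_batch_size: int, new_batch_size: int) -> float:
    """Linear scaling heuristic: keep lr / batch_size constant.

    Hypothetical helper for illustration; not part of AlignProp.
    """
    if base_batch_size <= 0 or new_batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * new_batch_size / base_batch_size


# Assumed reference setup: a learning rate tuned for total_batch_size=128.
# The value 1e-3 is a placeholder, not the repository default.
reference_lr = 1e-3
reduced_lr = scale_learning_rate(reference_lr, base_batch_size=128, new_batch_size=16)
print(f"learning rate for batch size 16: {reduced_lr:.2e}")
```

The scaled value is only a starting point; a run that still collapses into noise may need a smaller learning rate than the heuristic suggests.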

sachinnitw1317 commented 1 year ago

I have started another run with a batch size of 128, as you suggested. All the other settings are the same except the capacity per GPU.

I will know the result in a couple of hours.
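
As a rough sketch of how a total batch size of 128 can fit on a single small GPU: the usual approach is gradient accumulation, where several small per-GPU batches are processed before each optimizer step. The helper below only does the arithmetic. Its name and the `per_gpu_batch_size` and `num_gpus` parameters are assumptions for illustration and may not match the actual AlignProp config keys.

```python
def accumulation_steps(total_batch_size: int, per_gpu_batch_size: int, num_gpus: int = 1) -> int:
    """Number of micro-batches to accumulate before one optimizer step.

    Hypothetical helper for illustration; not part of AlignProp.
    """
    samples_per_forward = per_gpu_batch_size * num_gpus
    if samples_per_forward <= 0:
        raise ValueError("per_gpu_batch_size and num_gpus must be positive")
    if total_batch_size % samples_per_forward != 0:
        raise ValueError("total_batch_size must be divisible by per_gpu_batch_size * num_gpus")
    return total_batch_size // samples_per_forward


# Example: one GPU that can hold 4 samples at a time (an assumed capacity),
# targeting the total_batch_size=128 suggested in this thread.
steps = accumulation_steps(total_batch_size=128, per_gpu_batch_size=4, num_gpus=1)
print(f"accumulate gradients over {steps} micro-batches per optimizer step")
```

With this setup the effective batch size stays at 128 while only the per-GPU capacity changes, which keeps the run comparable to the recommended configuration.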