mihirp1998 / AlignProp

AlignProp uses direct reward backpropagation for the alignment of large-scale text-to-image diffusion models. Our method is 25x more sample- and compute-efficient than reinforcement learning methods (PPO) for finetuning Stable Diffusion.
https://align-prop.github.io/
MIT License

DRaFT paper #12

Closed yinanyz closed 4 months ago

yinanyz commented 1 year ago

It looks like DRaFT (Directly Fine-Tuning Diffusion Models on Differentiable Rewards) has a similar idea, and I'm wondering what the main differences between your approaches are. Thanks!

mihirp1998 commented 1 year ago

Yes, this is indeed concurrent work.

In terms of code, the differences are minor:

i) We do randomized truncated backprop, whereas they do plain truncated backprop. Our code also supports plain truncated backprop: simply set the randomized flag to False.

ii) They sample many noise samples, whereas in our code we sample a single noise sample.
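For anyone curious what difference (i) looks like in practice, here is a minimal PyTorch sketch of (randomized) truncated backprop through a denoising loop. All names (`denoise_step`, `theta`, `trunc_k`, `randomized`) are illustrative stand-ins, not AlignProp's actual API: gradients flow only through the last `k` denoising steps, and `randomized=True` samples `k` freshly each call instead of fixing it.

```python
import torch

# Toy learnable "model" parameter; in the real setting this would be
# the UNet's weights (illustrative only).
theta = torch.nn.Parameter(torch.tensor(0.9))

def denoise_step(x):
    # Stand-in for one UNet denoising update.
    return theta * x

def sample_with_truncated_backprop(x, num_steps=10, trunc_k=3, randomized=True):
    # randomized=True: sample the truncation length k uniformly in
    # [1, trunc_k] each call (randomized truncated backprop).
    # randomized=False: always backprop through exactly trunc_k steps
    # (plain truncated backprop).
    k = int(torch.randint(1, trunc_k + 1, (1,))) if randomized else trunc_k
    for i in range(num_steps):
        if i < num_steps - k:
            # Early steps run without building a graph, so the reward
            # gradient never reaches them.
            with torch.no_grad():
                x = denoise_step(x)
        else:
            x = denoise_step(x)
    return x

x0 = torch.randn(4)
out = sample_with_truncated_backprop(x0, randomized=False, trunc_k=3)
reward = out.sum()          # stand-in for a differentiable reward
reward.backward()           # theta.grad reflects only the last 3 steps
```

Truncating (and randomizing) the backprop length keeps memory bounded and, per the AlignProp paper, also reduces over-optimization to the reward model compared with backpropagating through the full chain.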