mihirp1998 / AlignProp

AlignProp uses direct reward backpropagation to align large-scale text-to-image diffusion models. Our method is 25x more sample- and compute-efficient than reinforcement learning methods (PPO) for finetuning Stable Diffusion.
https://align-prop.github.io/
MIT License

selecting the time step for gradient truncation #15

Closed daewon88 closed 3 months ago

daewon88 commented 6 months ago

Hi! Thank you for sharing your valuable work.

In the code, the timestep for gradient truncation is selected at every denoising step (lines 483-485). However, if the intention is to sample the truncation timestep from U(0,50), this approach has a problem: because a fresh timestep is drawn at each denoising step, the effective truncation point is no longer uniformly distributed. I therefore suggest selecting the truncation timestep once before each sampling pass, rather than at each denoising step. Do you have a particular reason for this choice?

Thank you :)

if config.truncated_backprop:
    if config.truncated_backprop_rand:
        # NOTE: this runs inside the denoising loop, so a new truncation
        # timestep is drawn at every denoising step i
        timestep = random.randint(config.truncated_backprop_minmax[0],
                                  config.truncated_backprop_minmax[1])
        if i < timestep:
            noise_pred_uncond = noise_pred_uncond.detach()
            noise_pred_cond = noise_pred_cond.detach()
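To illustrate the concern, here is a small Monte Carlo sketch (not from the repo) that mirrors the quoted loop, assuming 50 denoising steps and the `(0, 50)` range. Since the gradient cannot flow back past the last step that detaches its predictions, the effective truncation point is the last detaching step, and resampling the timestep every step pushes that point far above the U(0,50) mean of ~25:

```python
import random

NUM_STEPS = 50    # assumed number of denoising steps, i = 0..49
MINMAX = (0, 50)  # mirrors config.truncated_backprop_minmax

def effective_truncation_point():
    """One sampling pass of the quoted loop: a fresh truncation timestep
    is drawn at every denoising step, and the gradient cannot flow back
    past the LAST step that detaches its predictions."""
    last_detach = -1  # -1 means no step detached (full backprop)
    for i in range(NUM_STEPS):
        timestep = random.randint(*MINMAX)  # new draw every step
        if i < timestep:
            last_detach = i
    return last_detach

random.seed(0)
samples = [effective_truncation_point() for _ in range(100_000)]
mean = sum(samples) / len(samples)
# If the timestep were drawn once per pass, the mean cut point would be
# ~25; with per-step redraws it concentrates near the end of the trajectory.
print(f"mean effective truncation step: {mean:.1f}")
```

Running this gives a mean around 41-42 rather than 25, which is the distribution shift being discussed in this thread.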
ajaysub110 commented 3 months ago

I had the same question while going through the code. @mihirp1998 do you have a particular reason for choosing this approach over what @daewon88 suggests? Thanks!

mihirp1998 commented 3 months ago

Thanks for catching this bug!

You're right: the current code is not sampling from U(0,50). Because a new timestep is drawn at every denoising step, the effective truncation point instead follows a roughly Gaussian-shaped distribution centered at 42. I haven't ablated this against U(0,50), but once I do I'll add it as an option in the code.
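For reference, a minimal sketch of the per-pass alternative daewon88 proposes, with the denoising computation stubbed out; `NUM_STEPS`, the range, and all names here are assumptions based on the snippet above, not the repo's actual code:

```python
import random

NUM_STEPS = 50    # assumed number of denoising steps
MINMAX = (0, 50)  # mirrors config.truncated_backprop_minmax

def per_pass_truncation_mask():
    """Draw the truncation timestep ONCE, before the denoising loop, so
    the cut point is genuinely uniform over the configured range.
    Returns a boolean list: True marks steps whose noise predictions
    would be detached in the real training loop."""
    timestep = random.randint(*MINMAX)  # single draw per sampling pass
    return [i < timestep for i in range(NUM_STEPS)]

random.seed(0)
mask = per_pass_truncation_mask()
# All detached steps precede all non-detached steps, and the number of
# detached steps equals the sampled timestep (capped at NUM_STEPS).
print(f"{sum(mask)} of {NUM_STEPS} steps detached")
```

The only structural change is hoisting the `random.randint` call out of the denoising loop; the detach logic at each step stays the same.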

[Attached image: Figure_1]