Closed: daewon88 closed this 3 months ago
I had the same question while going through the code. @mihirp1998 do you have a particular reason for choosing this approach over what @daewon88 suggests? Thanks!
Thanks for catching this bug!
The current code does not use U(0,50); it instead samples from a Gaussian distribution centered at 42. I haven't ablated this against U(0,50), but once I do, I'll add it as an option in the code.
Hi! Thank you for sharing your valuable work.
In the code, the timestep for gradient truncation is selected at every denoising step (lines 483-485). However, if the intention is to sample the truncation timestep from U(0,50), resampling it at every step means that no single truncation point is fixed for the whole trajectory. I therefore suggest selecting the truncation timestep once before each sampling run, rather than at each denoising step. Is there a particular reason for the current choice?
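To make the difference concrete, here is a minimal sketch in plain Python. It is not the repository's code: the function names, the 50-step count, and the `i >= t` rule for keeping gradients are all assumptions made for illustration. It only tracks which denoising steps would keep their gradient under each scheme.

```python
import random

NUM_STEPS = 50  # assumed number of denoising steps (hypothetical)


def kept_steps_per_step(rng, num_steps=NUM_STEPS):
    """Resample the truncation threshold at every denoising step."""
    kept = []
    for i in range(num_steps):
        t = rng.randint(0, num_steps)  # fresh draw at each step
        if i >= t:  # assumed rule: keep the gradient when i >= t
            kept.append(i)
    return kept


def kept_steps_per_sample(rng, num_steps=NUM_STEPS):
    """Draw the truncation threshold once, before the denoising loop."""
    t = rng.randint(0, num_steps)  # single draw for the whole trajectory
    return [i for i in range(num_steps) if i >= t]


def is_contiguous_suffix(kept, num_steps=NUM_STEPS):
    """True if the kept steps are exactly the last len(kept) steps."""
    return kept == list(range(num_steps - len(kept), num_steps))


if __name__ == "__main__":
    rng = random.Random(0)
    print("per-step:  ", kept_steps_per_step(rng))
    print("per-sample:", kept_steps_per_sample(rng))
```

Under these assumptions, drawing once always keeps a contiguous block of final steps, which is what truncating at a single timestep means. Drawing at every step can keep a scattered set of steps instead, so the result is generally not a truncation at one timestep drawn from U(0,50).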
Thank you :)