Have you tried training to predict 'epsilon' instead of 'xstart'?

zsyOAOA / ResShift

ResShift: Efficient Diffusion Model for Image Super-resolution by Residual Shifting (NeurIPS@2023 Spotlight, TPAMI@2024)


Have you tried training to predict 'epsilon' instead of 'xstart'? #24

Closed WikiChao closed 1 year ago

WikiChao commented 1 year ago

Awesome work, it inspired me a lot!

I have a question about the training objective. Have you tried training the model to predict 'epsilon' instead? To me, it's not very intuitive why the model needs to output the same 'x_0' at different time steps.

I would appreciate it if you have further insights!

zsyOAOA commented 1 year ago

Predicting "x_0" gives better performance than predicting "epsilon" in our model. If you are interested, you can try it yourself.
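For anyone who wants to try this, here is a minimal sketch of how the two targets relate. It assumes the forward marginal described in the ResShift paper, x_t = x_0 + eta_t * (y - x_0) + kappa * sqrt(eta_t) * epsilon, where y is the upsampled low-resolution image. The function names (`q_sample`, `xstart_from_eps`, `eps_from_xstart`) and the values of `eta_t` and `kappa` are made up for illustration and are not taken from the repository.

```python
import numpy as np


def q_sample(x0, y, eta_t, kappa, eps):
    """Assumed forward marginal: shift x0 toward y and add scaled Gaussian noise."""
    return x0 + eta_t * (y - x0) + kappa * np.sqrt(eta_t) * eps


def xstart_from_eps(x_t, y, eta_t, kappa, eps):
    """Solve the forward marginal for x0 given eps (requires eta_t < 1)."""
    return (x_t - eta_t * y - kappa * np.sqrt(eta_t) * eps) / (1.0 - eta_t)


def eps_from_xstart(x_t, y, eta_t, kappa, x0):
    """Solve the forward marginal for eps given x0 (requires eta_t > 0)."""
    return (x_t - x0 - eta_t * (y - x0)) / (kappa * np.sqrt(eta_t))


# Toy data standing in for images; the values are arbitrary.
rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 4))             # stand-in for the high-resolution image
y = x0 + 0.1 * rng.standard_normal((4, 4))   # stand-in for the upsampled low-resolution image
eps = rng.standard_normal((4, 4))
eta_t, kappa = 0.3, 2.0                      # illustrative values only

x_t = q_sample(x0, y, eta_t, kappa, eps)
x0_rec = xstart_from_eps(x_t, y, eta_t, kappa, eps)
eps_rec = eps_from_xstart(x_t, y, eta_t, kappa, x0)
```

Under this assumption the two parameterizations are algebraically interchangeable, so an epsilon-prediction network could be trained against `eps` and converted to an 'x_0' estimate at sampling time. Note that the conversion divides by (1 - eta_t), so errors in a predicted epsilon are amplified as eta_t approaches 1. Whether that accounts for the performance gap mentioned above is not stated in this thread.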

WikiChao commented 1 year ago

Thank you so much!