rinongal / textual_inversion

MIT License
2.87k stars 278 forks

Question about the parameterization "eps" and "x0" #135

Closed lizc126 closed 1 year ago

lizc126 commented 1 year ago

Hi,

Thank you for the brilliant work and code repo. I have a question regarding the parameterization. I see the default is set to "eps", where the UNet predicts the noise and the loss is calculated with respect to the randomly generated noise that was added. If it is changed to "x0", the UNet predicts the image (in latent space) and the loss is calculated with respect to the ground-truth image (in latent space). Am I understanding this correctly? When I set it to "x0", though, the generated samples seem to be very off. I hope you can give me some ideas.

Thank you!

rinongal commented 1 year ago

Hi,

Sorry for the extremely late reply. The configuration option you're referring to is part of the original LDM code, and if you want to change it, you need to re-train a model that uses the new parameterization.

Basically, what you're doing now is taking a network that was trained to predict noise and treating its output as if it were the x0 prediction itself. The sampler then treats a noise estimate as a clean latent, which is why the samples look off.
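
To make the mismatch concrete, here is a minimal sketch in plain Python. It is not the LDM code: the function names (`q_sample`, `x0_from_eps`, `mse`), the toy 8-element "latent", and the value of `alpha_bar` are all assumptions made for illustration. It relies only on the standard forward-diffusion relation x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps, and compares converting a noise prediction into an x0 estimate against reading the noise prediction as x0 directly.

```python
import math
import random


def q_sample(x0, eps, alpha_bar):
    """Forward diffusion: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * e for x, e in zip(x0, eps)]


def x0_from_eps(x_t, eps_pred, alpha_bar):
    """x0 estimate implied by a noise prediction (inverts the relation above)."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(xt - b * e) / a for xt, e in zip(x_t, eps_pred)]


def mse(u, v):
    """Mean squared error between two equal-length lists."""
    return sum((p - q) ** 2 for p, q in zip(u, v)) / len(u)


rng = random.Random(0)
x0 = [rng.uniform(-1.0, 1.0) for _ in range(8)]  # stand-in for a clean latent
eps = [rng.gauss(0.0, 1.0) for _ in range(8)]    # the noise that gets added
alpha_bar = 0.5                                  # hypothetical schedule value
x_t = q_sample(x0, eps, alpha_bar)

# Pretend the network is a perfect eps-predictor.
eps_pred = eps

# Correct use of an eps-model: convert its output into an x0 estimate.
err_converted = mse(x0_from_eps(x_t, eps_pred, alpha_bar), x0)

# Flipping the setting on an eps-trained model amounts to this instead:
# the raw noise prediction is read as if it were x0.
err_misread = mse(eps_pred, x0)

print(f"converted eps -> x0 error: {err_converted:.3e}")
print(f"eps misread as x0 error:   {err_misread:.3e}")
```

Even with a perfect noise predictor, the misread output is just Gaussian noise and carries no information about the clean latent. The two parameterizations are interchangeable only through the conversion, or by training the network on the x0 target from the start.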

lizc126 commented 1 year ago

Thank you for your reply!