Open pkuliyi2015 opened 1 year ago
Hi, maybe you can try $\mathbf{x}_{0|t}=\mathbf{x}_{0|t} + \lambda\mathbf{A}^{\dagger}(\mathbf{y} - \mathbf{A}\mathbf{x}_{0|t})$, where $\lambda$ relaxes the RND (range-null decomposition) correction (with $\lambda=1$ it reduces to the original RND update).
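The relaxed update above can be sketched in PyTorch. As an illustrative assumption (the thread does not fix the operator), block-average pooling stands in for $\mathbf{A}$; its exact pseudo-inverse is then nearest-neighbour upsampling, so $\mathbf{A}\mathbf{A}^{\dagger}=\mathbf{I}$:

```python
import torch
import torch.nn.functional as F

def A(x, s=4):
    # Block-average downsampling: a simple linear degradation operator
    # (an assumed choice for illustration, not the thread's exact setup).
    return F.avg_pool2d(x, s)

def A_pinv(y, s=4):
    # The pseudo-inverse of block averaging is nearest-neighbour
    # upsampling, so A(A_pinv(y)) reproduces y exactly.
    return F.interpolate(y, scale_factor=s, mode="nearest")

def relaxed_rnd_step(x0t, y, lam=0.1, s=4):
    # x0 <- x0 + lam * A^+ (y - A x0); lam = 1 recovers the original RND,
    # smaller lam only partially pulls x0 toward data consistency.
    return x0t + lam * A_pinv(y - A(x0t, s), s)
```

With `lam=1.0` the corrected estimate satisfies $\mathbf{A}\mathbf{x}_{0|t}=\mathbf{y}$ exactly; smaller values trade consistency for less disruption of the denoiser's output.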
The core problem with applying DDNM to latent diffusion is that the Encoder and Decoder are lossy. For example, the left image is $\mathbf{A}^{\dagger}\mathbf{y}$ and the right is $D(E(\mathbf{A}^{\dagger}\mathbf{y}))$; this difference is not acceptable for the RND:
The encoder and decoder are sensitive to block artifacts, so it is better to use a smoother downsampler and upsampler, e.g., bicubic. But it may still yield blurry results.
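A minimal sketch of that substitution, assuming the same super-resolution setting: both operators become bicubic resampling. Note this is only an approximation, since bicubic upsampling is not the exact pseudo-inverse of bicubic downsampling:

```python
import torch
import torch.nn.functional as F

def A_bicubic(x, s=4):
    # Smoother degradation operator: bicubic downsampling avoids the
    # block artifacts that the VAE encoder/decoder is sensitive to.
    return F.interpolate(x, scale_factor=1 / s, mode="bicubic",
                         align_corners=False)

def A_pinv_bicubic(y, s=4):
    # Approximate pseudo-inverse only: A_bicubic(A_pinv_bicubic(y)) is
    # close to y but not exactly equal, unlike the block-average case.
    return F.interpolate(y, scale_factor=s, mode="bicubic",
                         align_corners=False)
```

The trade-off matches the comment above: fewer block artifacts through the VAE, at the cost of a softer (blurrier) data-consistency correction.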
Great thanks for your explanation. I found that by setting lambda = 0.1 and using steps = 100, the results from the initial steps (<60) look better and more appealing, but the sampling finally converges to the aforementioned image without any change.
I hope this can give you some insights. I'm looking for a practical solution for super-resolution with latent diffusion models, and I feel DDNM is rather close to the final answer. I will post my pictures here for your reference.
Update: it still doesn't work if I stop applying the DDNM objective after 60 steps. It seems that the DDNM objective is not suitable for latent diffusion models, regardless of engineering tricks:
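The trick tried here can be expressed as a hypothetical $\lambda$ schedule inside the sampler loop (the function name and the 60/100 cutoff are taken from the experiment described above, everything else is assumed):

```python
def lam_schedule(step, guided_steps=60, lam=0.1):
    # Apply the relaxed RND correction only during the first
    # `guided_steps` denoising steps; return 0 afterwards, which
    # disables the DDNM objective for the remaining steps.
    return lam if step < guided_steps else 0.0
```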
Hello, may I ask which model you used when reproducing this method on latent diffusion? Was it a conditional model (trained with text embeddings as conditions, like SD1.5) or an unconditional one?
Hi authors, thanks for your great work.
I'm trying to apply the equations to latent diffusion models, but it doesn't work (as you explained in another issue). Would you mind providing some insights on how to make it work? Can this be achieved without gradients?
Here is my test result: