Open pkuliyi2015 opened 1 year ago
Hi, maybe you can try $\mathbf{x}_{0|t}=\mathbf{x}_{0|t} + \lambda\mathbf{A}^{\dagger}(\mathbf{y} - \mathbf{A}\mathbf{x}_{0|t})$, where $\lambda$ relaxes the RND (range-null decomposition) correction (with $\lambda=1$ it reduces to the original RND update).
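The relaxed update above can be sketched in PyTorch. As an illustrative assumption (the thread does not fix the operator), block-average pooling stands in for $\mathbf{A}$; its exact pseudo-inverse is then nearest-neighbour upsampling, so $\mathbf{A}\mathbf{A}^{\dagger}=\mathbf{I}$:

```python
import torch
import torch.nn.functional as F

def A(x, s=4):
    # Block-average downsampling: a simple linear degradation operator
    # (an assumed choice for illustration, not the thread's exact setup).
    return F.avg_pool2d(x, s)

def A_pinv(y, s=4):
    # The pseudo-inverse of block averaging is nearest-neighbour
    # upsampling, so A(A_pinv(y)) reproduces y exactly.
    return F.interpolate(y, scale_factor=s, mode="nearest")

def relaxed_rnd_step(x0t, y, lam=0.1, s=4):
    # x0 <- x0 + lam * A^+ (y - A x0); lam = 1 recovers the original RND,
    # smaller lam only partially pulls x0 toward data consistency.
    return x0t + lam * A_pinv(y - A(x0t, s), s)
```

With `lam=1.0` the corrected estimate satisfies $\mathbf{A}\mathbf{x}_{0|t}=\mathbf{y}$ exactly; smaller values trade consistency for less disruption of the denoiser's output.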
The core problem with applying DDNM to latent diffusion is that the Encoder and Decoder are lossy. For example, the left image is $\mathbf{A}^{\dagger}\mathbf{y}$ and the right is $D(E(\mathbf{A}^{\dagger}\mathbf{y}))$; this difference is not acceptable for the RND:
The encoder and decoder are sensitive to block artifacts, so it is better to use a smoother downsampler and upsampler, e.g., bicubic. But it may still yield blurry results.
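A minimal sketch of that substitution, assuming the same super-resolution setting: both operators become bicubic resampling. Note this is only an approximation, since bicubic upsampling is not the exact pseudo-inverse of bicubic downsampling:

```python
import torch
import torch.nn.functional as F

def A_bicubic(x, s=4):
    # Smoother degradation operator: bicubic downsampling avoids the
    # block artifacts that the VAE encoder/decoder is sensitive to.
    return F.interpolate(x, scale_factor=1 / s, mode="bicubic",
                         align_corners=False)

def A_pinv_bicubic(y, s=4):
    # Approximate pseudo-inverse only: A_bicubic(A_pinv_bicubic(y)) is
    # close to y but not exactly equal, unlike the block-average case.
    return F.interpolate(y, scale_factor=s, mode="bicubic",
                         align_corners=False)
```

The trade-off matches the comment above: fewer block artifacts through the VAE, at the cost of a softer (blurrier) data-consistency correction.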
Great thanks for your explanation. I found that by setting lambda = 0.1 and using steps = 100, the results from the initial steps (<60) look better and more appealing, but the sampling finally converges to the aforementioned image without any change.
I hope this can give you some insights. I'm looking for a practical solution for super-resolution with latent diffusion models, and I feel DDNM is rather close to the final answer. I will post my pictures here for your reference.
Update: it still doesn't work if I stop applying the DDNM objective after 60 steps. It seems that the DDNM objective is not suitable for latent diffusion models, regardless of engineering tricks:
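The trick tried here can be expressed as a hypothetical $\lambda$ schedule inside the sampler loop (the function name and the 60/100 cutoff are taken from the experiment described above, everything else is assumed):

```python
def lam_schedule(step, guided_steps=60, lam=0.1):
    # Apply the relaxed RND correction only during the first
    # `guided_steps` denoising steps; return 0 afterwards, which
    # disables the DDNM objective for the remaining steps.
    return lam if step < guided_steps else 0.0
```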
Hello, may I ask which model you used when reproducing this method on latent diffusion? Was it a conditional model (trained with text embeddings as conditions, like SD1.5) or an unconditional one?
Hi authors, thanks for your great work.
I'm trying to apply the equations to latent diffusion models, but it doesn't work (as you explained in another issue). Would you mind providing some insights on how to make it work? Can this be achieved without gradients?
Here is my test result: