Closed bonlime closed 4 months ago
Hi @inbarhub, first of all, thanks for a very good and interesting paper; I really enjoyed reading it.
I wonder if it's possible to apply the derived noise maps to schedulers other than DDPM/DDIM? For example, have you tried substituting the noise maps into the Euler Ancestral sampler? DDIM/DDPM in general seem to produce lower-quality results or require a larger number of steps.
Hi @bonlime, have you tried that? It seems that null-text inversion uses DDIM with prediction = 'epsilon'. I am not sure whether null-text inversion supports other options. I am curious as well.
@daiqing-qi No, I ended up not trying this paper. You're right that null-text uses DDIM inversion with prediction "epsilon", but I haven't yet seen a correct version of DDIM inversion. There is one in 🤗 diffusers, but it's incorrect and no one cares about it. I tried reimplementing it myself; it's more correct, but the math is still slightly off.
Here is proof that DDIM inversion in 🤗 is incorrect. With strength 0.5, going from image (1) to noise and back results in image (2), which loses saturation. My implementation results in image (3), which looks better at first.
(1)
(2)
(3)
But after applying the inversion to the same image multiple times, 🤗 results in image (4) and my version in image (5), which shows that mine is also incorrect, but in a different direction. The DPM++ inverted scheduler in 🤗 suffers from exactly the same problems.
(4) (5)
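To make the round-trip experiment concrete, here is a minimal toy sketch of naive DDIM inversion followed by DDIM sampling. Everything in it is an assumption for illustration: the linear beta schedule, the analytic `toy_eps` noise predictor (exact for Gaussian toy data, standing in for the UNet), and all function names. It is not the 🤗 diffusers code and not the reimplementation discussed above. Naive inversion reuses the noise predicted at the current, less noisy sample, so the round trip is only approximate.

```python
import numpy as np

def alphas_cumprod(num_train_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative alpha-bar values for a linear beta schedule (assumed here)."""
    betas = np.linspace(beta_start, beta_end, num_train_steps)
    return np.cumprod(1.0 - betas)

def toy_eps(x, a_bar, data_std=0.5):
    """Exact noise prediction for toy data x0 ~ N(0, data_std^2); stands in for the UNet."""
    return np.sqrt(1.0 - a_bar) * x / (a_bar * data_std**2 + 1.0 - a_bar)

def ddim_move(x, eps, a_from, a_to):
    """Deterministic DDIM update (eta = 0) from noise level a_from to a_to."""
    x0_pred = (x - np.sqrt(1.0 - a_from) * eps) / np.sqrt(a_from)
    return np.sqrt(a_to) * x0_pred + np.sqrt(1.0 - a_to) * eps

def round_trip_scale(num_steps, t_max=500, seed=0):
    """Invert up to t_max and sample back; return |x_reconstructed| / |x_original|."""
    a = alphas_cumprod()
    ts = np.linspace(0, t_max, num_steps + 1).astype(int)  # ascending timesteps
    x_start = np.random.default_rng(seed).normal(scale=0.5, size=1024)
    x = x_start.copy()
    # Naive inversion: eps is evaluated at the current (less noisy) sample.
    for t_from, t_to in zip(ts[:-1], ts[1:]):
        x = ddim_move(x, toy_eps(x, a[t_from]), a[t_from], a[t_to])
    # Regular DDIM sampling back down the same timesteps.
    rev = ts[::-1]
    for t_from, t_to in zip(rev[:-1], rev[1:]):
        x = ddim_move(x, toy_eps(x, a[t_from]), a[t_from], a[t_to])
    return float(np.linalg.norm(x) / np.linalg.norm(x_start))

if __name__ == "__main__":
    for n in (5, 50):
        print(n, "steps -> round-trip scale", round_trip_scale(n))
```

In this toy setting the round trip shrinks the signal, and fewer steps shrink it more; repeating the round trip compounds the factor. Whether the same mechanism explains the saturation drift in the images above is not established here.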
Hi @bonlime, thanks for your reply! It is very helpful. I think it is reasonable that applying the inversion to the same image multiple times leads to a different image, since the inversion only makes the reconstructed image look similar, while invisible errors can accumulate. May I ask if you could share your code/implementation of the DDIM inversion? Thanks!
Sure, the code is not much of a secret :)
It also requires sampling with flipped timesteps:
self.inverse_scheduler.set_timesteps(num_inference_steps, device=device)
timesteps, num_inference_steps_ = self.get_timesteps(num_inference_steps, strength, device)
for t in self.inverse_scheduler.timesteps.flip(0)[:num_inference_steps_]:
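As a rough illustration of what flipping and truncating the timesteps does, here is a small self-contained sketch. It assumes the scheduler stores evenly spaced timesteps in descending (sampling) order and that `strength` truncates the schedule the way img2img pipelines usually do. The helper names `sampling_timesteps` and `inversion_timesteps` are hypothetical and are not part of the snippet above or of diffusers.

```python
import numpy as np

def sampling_timesteps(num_inference_steps, num_train_timesteps=1000):
    """Evenly spaced timesteps in descending (sampling) order, e.g. [900, ..., 0]."""
    stride = num_train_timesteps // num_inference_steps
    return np.arange(num_inference_steps)[::-1] * stride

def inversion_timesteps(num_inference_steps, strength, num_train_timesteps=1000):
    """Timesteps to visit when inverting an image up to the given strength."""
    timesteps = sampling_timesteps(num_inference_steps, num_train_timesteps)
    # Same truncation idea as img2img: only a `strength` fraction of the schedule is used.
    num_used = min(int(num_inference_steps * strength), num_inference_steps)
    # Flip to ascending order so inversion walks from clean toward noisy.
    return timesteps[::-1][:num_used]

if __name__ == "__main__":
    print(inversion_timesteps(10, 0.5).tolist())  # [0, 100, 200, 300, 400]
```

With 10 steps and strength 0.5, inversion visits the same five timesteps that the denoising pass later walks through in the opposite order.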
> I think applying the inversion on the same image for multiple times leads to another image is reasonable
The problem is not that we get another image, but rather that the saturation changes. This effect is very consistent in both 🤗 and my implementation: saturation always decreases with 🤗 and always increases with my version. Maybe carefully solving the inversion math could work, but I didn't have time to do that. Maybe you would :)
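One way to put a number on the saturation drift instead of judging it by eye is to compare the mean HSV-style saturation before and after a round trip. The sketch below is only an assumption about how one might measure it: it expects RGB arrays with values in [0, 1], and the names `mean_saturation` and `saturation_drift` are hypothetical.

```python
import numpy as np

def mean_saturation(rgb):
    """Mean HSV-style saturation of an RGB image with values in [0, 1], shape (..., 3)."""
    rgb = np.asarray(rgb, dtype=np.float64)
    c_max = rgb.max(axis=-1)
    c_min = rgb.min(axis=-1)
    # Saturation is (max - min) / max, defined as 0 for black pixels.
    safe_max = np.where(c_max > 0, c_max, 1.0)
    return float(((c_max - c_min) / safe_max).mean())

def saturation_drift(original, reconstructed):
    """Positive if the reconstruction is more saturated, negative if less."""
    return mean_saturation(reconstructed) - mean_saturation(original)
```

Logging `saturation_drift` after each repeated inversion would show whether the drift is monotonic and in which direction it goes for a given scheduler.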
Hi,
From the equation of "Euler Ancestral", it seems that you can extract the noise in the same way we did in DDPM. However, we haven't tried this.
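For reference, here is a hedged sketch of what extracting the noise from an Euler Ancestral step could look like. It assumes the k-diffusion-style update `x_next = x + d * (sigma_down - sigma_from) + noise * sigma_up` with `d = (x - denoised) / sigma_from`, and simply solves that equation for `noise`. The function names are hypothetical, `denoised` stands in for the model's prediction of the clean image, and, as noted above, this has not been tried with the paper's method.

```python
import numpy as np

def ancestral_sigmas(sigma_from, sigma_to):
    """Split the step into a deterministic part (sigma_down) and fresh noise (sigma_up)."""
    sigma_up = min(sigma_to, np.sqrt(sigma_to**2 * (sigma_from**2 - sigma_to**2) / sigma_from**2))
    sigma_down = np.sqrt(sigma_to**2 - sigma_up**2)
    return sigma_down, sigma_up

def euler_ancestral_step(x, denoised, sigma_from, sigma_to, noise):
    """One Euler Ancestral update with an explicitly supplied noise map."""
    sigma_down, sigma_up = ancestral_sigmas(sigma_from, sigma_to)
    d = (x - denoised) / sigma_from
    return x + d * (sigma_down - sigma_from) + noise * sigma_up

def extract_noise(x, x_next, denoised, sigma_from, sigma_to):
    """Solve the update above for the noise map that takes x to x_next."""
    sigma_down, sigma_up = ancestral_sigmas(sigma_from, sigma_to)
    d = (x - denoised) / sigma_from
    # Undefined when sigma_up == 0 (e.g. a final step to sigma_to = 0).
    return (x_next - x - d * (sigma_down - sigma_from)) / sigma_up
```

Feeding the extracted map back into `euler_ancestral_step` reproduces `x_next` by construction, which is the property the DDPM noise maps rely on. How the pairs `x`, `x_next` should be constructed for this sampler is left open here.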