Excellent work! I wonder whether it's possible to use a noise-adding technique (e.g., DDPM q-sample) instead of DDIM inversion for reference-based style-aligned generation. In my implementation, I gradually add noise to the reference image according to the DDPM schedule to generate the reference latent sequence, and then perform style-aligned generation with your method. Some of the results are shown below:
I use Stable Diffusion 2.1 for generation, and the prompt is "a toy car". This alternative seems to produce results similar to your original implementation, but they are sometimes unstable. I think it works because the pretrained diffusion model is able to predict the noise in a noised input. Is it reasonable to replace DDIM inversion with DDPM forward noising? Why or why not?
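For clarity, here is a minimal NumPy sketch of what I mean by the q-sample alternative (function and variable names are my own, not from your code): instead of DDIM-inverting the reference latent, I draw each noised reference latent directly from the forward process x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps under a standard linear beta schedule.

```python
import numpy as np

def ddpm_alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    # Standard linear beta schedule; alpha_bar_t = prod_{s<=t} (1 - beta_s).
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def q_sample(z0, t, alpha_bars, rng):
    # DDPM forward process: x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps
    eps = rng.standard_normal(z0.shape)
    return np.sqrt(alpha_bars[t]) * z0 + np.sqrt(1.0 - alpha_bars[t]) * eps

# Build the reference latent sequence over the sampling timesteps,
# e.g. z0 is the VAE-encoded reference image (shapes are illustrative):
rng = np.random.default_rng(0)
alpha_bars = ddpm_alpha_bars()
z0 = rng.standard_normal((4, 64, 64))
timesteps = range(999, -1, -20)
reference_latents = [q_sample(z0, t, alpha_bars, rng) for t in timesteps]
```

In practice I do the equivalent with the pipeline's scheduler in PyTorch; the key difference from DDIM inversion is that each x_t here is sampled stochastically rather than recovered deterministically, which may explain the instability I observe.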