Hi @kousw, thanks for sharing your implementation.
I noticed that the ConsiStory paper [1] mentions that Edit Friendly DDPM Inversion [2] is used to process the reference image. In this repo, however, the reference image seems to be encoded by the VAE to obtain latents here. Is this correct? Looking forward to your reply.
[1] Training-Free Consistent Text-to-Image Generation
[2] An Edit Friendly DDPM Noise Space: Inversion and Manipulations