Reference image processing

Hi @kousw, thanks for sharing your implementation.

I noticed that the ConsiStory paper [1] says Edit Friendly DDPM Inversion [2] is used for reference image processing. In this repo, however, the reference image seems to be encoded by the VAE to obtain latents here. Is this correct? Looking forward to your reply.