lucidrains / imagen-pytorch

Implementation of Imagen, Google's Text-to-Image Neural Network, in Pytorch
MIT License

Working with images and text embeddings of different shape #336

Open AhmedGamal411 opened 1 year ago

AhmedGamal411 commented 1 year ago

I was wondering whether I can use latents from a VAE as input to the Imagen UNet, as latent diffusion models do. The issue is that a VAE changes the shape of the image (e.g. to a 1-dimensional array or a 4-channel image). What do I have to change to make that work?
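To make the shape mismatch concrete, here is a minimal sketch in plain Python that only does shape bookkeeping (no Imagen or VAE code is involved). The 4 latent channels and the downsampling factor of 8 are assumptions borrowed from typical latent diffusion setups, and `latent_shape` and `unflatten_shape` are hypothetical helper names, not part of any library.

```python
import math


def latent_shape(image_shape, latent_channels=4, downsample_factor=8):
    """Shape (C, H, W) a convolutional VAE encoder might produce for an image.

    latent_channels and downsample_factor are assumed values, not taken
    from any particular VAE.
    """
    _, height, width = image_shape
    if height % downsample_factor or width % downsample_factor:
        raise ValueError("image size must be divisible by the downsample factor")
    return (latent_channels, height // downsample_factor, width // downsample_factor)


def unflatten_shape(num_values, channels):
    """Square (C, H, W) layout for a 1-D latent with num_values entries."""
    if num_values % channels:
        raise ValueError("latent length must be divisible by the channel count")
    side = math.isqrt(num_values // channels)
    if side * side * channels != num_values:
        raise ValueError("latent cannot be arranged as a square feature map")
    return (channels, side, side)


rgb = (3, 256, 256)
latent = latent_shape(rgb)          # 4-channel latent at 1/8 resolution
flat = unflatten_shape(4096, 4)     # 1-D latent regrouped into (C, H, W)
```

Under these assumptions the UNet would have to accept 4-channel inputs at 32x32 instead of 3-channel images at 256x256, and a 1-D latent would first need to be regrouped into some (C, H, W) layout like the one above.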

I was also wondering about swapping in different text embeddings. The issue is similar: they might have a different shape from the ones Imagen expects. Is it feasible to use other embeddings with minimal code changes?
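One way to picture the embedding mismatch is as a per-token linear projection from the new encoder's width to the width the model expects. The sketch below is plain Python with toy sizes; the dimensions 64 and 128, the random weights, and the helper names `make_projection` and `project` are all assumptions for illustration. In a real setup the projection would presumably be a learned layer such as `torch.nn.Linear`.

```python
import random


def make_projection(in_dim, out_dim, seed=0):
    """Random (in_dim x out_dim) weight matrix standing in for a learned layer."""
    rng = random.Random(seed)
    scale = in_dim ** -0.5
    return [[rng.gauss(0.0, scale) for _ in range(out_dim)] for _ in range(in_dim)]


def project(embeds, weight):
    """Map token embeddings of shape (seq_len, in_dim) to (seq_len, out_dim)."""
    in_dim, out_dim = len(weight), len(weight[0])
    projected = []
    for token in embeds:
        if len(token) != in_dim:
            raise ValueError("embedding width does not match the projection")
        projected.append(
            [sum(token[i] * weight[i][j] for i in range(in_dim)) for j in range(out_dim)]
        )
    return projected


# Toy example: 7 tokens of width 64 mapped to an assumed expected width of 128.
other_embeds = [[0.1] * 64 for _ in range(7)]
weight = make_projection(64, 128)
projected = project(other_embeds, weight)
shape = (len(projected), len(projected[0]))
```

The sequence length is left untouched; only the last dimension changes, which is the part that would have to match whatever embedding width the UNet is configured for.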

Thank you!