lucidrains / imagen-pytorch

Implementation of Imagen, Google's Text-to-Image Neural Network, in Pytorch
MIT License

text_embeds in trainer.sample() does not work properly #325

Open bekhzod-olimov opened 1 year ago

bekhzod-olimov commented 1 year ago

I trained an Imagen model with a single unet after changing the dataset's get_item function so that it returns a transformed image together with its text embedding (obtained from t5_encode_text using t5-v1_1-base). During training, I draw samples from the model with trainer.sample(text_embeds=text_embeds). I compute text_embeds with the same t5_encode_text and t5-v1_1-base, as follows:

text = "123나0456"
text_embeds = t5_encode_text(text)
images = trainer.sample(batch_size = 1, text_embeds=text_embeds, return_pil_images = True)

Although the generator produces plausible license plate images, it ignores the text (in this case, 123나0456) and puts random digits on the plate. Why does text_embeds not work properly?
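One thing that may be worth ruling out in the snippet above: as far as I can tell, t5_encode_text is declared to take a list of strings (texts: List[str]), while the call passes a bare string. The sketch below is only an assumption about a possible cause, not a confirmed diagnosis. as_prompt_batch is a hypothetical helper, not part of imagen-pytorch, and the code is plain Python so it runs without the library.

```python
from typing import List, Sequence, Union


def as_prompt_batch(prompts: Union[str, Sequence[str]]) -> List[str]:
    """Hypothetical helper (not part of imagen-pytorch).

    Always returns a list of prompt strings, so a single prompt
    is passed on as a batch of one rather than as a bare string.
    """
    if isinstance(prompts, str):
        return [prompts]
    batch = list(prompts)
    if not all(isinstance(p, str) for p in batch):
        raise TypeError("every prompt must be a string")
    return batch


text = "123나0456"

# One prompt, wrapped as a batch of size one.
batch = as_prompt_batch(text)

# For comparison: iterating a bare string yields its characters,
# which is what list-oriented code would see without the wrapper.
chars_seen_without_wrapper = list(text)

print(batch)
print(len(chars_seen_without_wrapper))
```

With imagen-pytorch installed, the call would then read text_embeds = t5_encode_text(as_prompt_batch(text)). Whether this changes the sampled digits is untested here.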

(attached image: sample_as-310)

pjspol commented 1 year ago

I seem to be having a similar issue. Any help or updates would be greatly appreciated!!