I have trained an Imagen with a single U-Net by changing the dataset's `__getitem__` function so that it returns a transformed image together with its text embedding (the embedding is obtained from `t5_encode_text` using t5-v1_1-base).
During training, I sample from the trained model with `trainer.sample(text_embeds=text_embeds)`, where `text_embeds` comes from the same `t5_encode_text` call with t5-v1_1-base, as follows:
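(The original snippet is not shown in the post; below is a minimal sketch of what this step typically looks like with imagen-pytorch's `t5_encode_text` helper. The `trainer` object is the author's `ImagenTrainer` and is not defined here, and the model name is assumed from the question.)

```python
# Hedged sketch, not the author's actual code: how text_embeds are
# typically produced with imagen-pytorch's t5_encode_text helper.
from imagen_pytorch.t5 import t5_encode_text

texts = ['123나0456']  # the license-plate prompt from the question

# 'google/t5-v1_1-base' is the library's default T5 encoder name,
# matching the t5-v1_1-base model mentioned above
text_embeds = t5_encode_text(texts, name='google/t5-v1_1-base')

# `trainer` is the author's ImagenTrainer instance (not defined here)
images = trainer.sample(text_embeds=text_embeds)
```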
Although the generator produces plausible license-plate images, it does not take the conditioning text into account (in this case, 123나0456) and renders random digits on the plate. Why don't the `text_embeds` work properly?