lucidrains / imagen-pytorch

Implementation of Imagen, Google's Text-to-Image Neural Network, in Pytorch
MIT License

Unconditional Imagen #304

Closed. bekhzod-olimov closed this issue 1 year ago

bekhzod-olimov commented 1 year ago

What happens in unconditional Imagen (condition_on_text = False)? Does the model become an image-to-image model once it no longer depends on text? If so, how do we use Imagen at inference time, given that we cannot pass in text (because the model was trained with condition_on_text = False)?

lucidrains commented 1 year ago

@bekhzod-olimov just call the sample method without passing in text
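
As a rough illustration of the reply above: sampling from a model trained with condition_on_text = False only needs the number of images to draw. The sketch below is not library code. `sample_unconditional` and `StubSampler` are hypothetical names introduced here, and the stub only stands in for a trained model so the snippet runs on its own. It assumes the real object exposes `sample(batch_size=...)`, as the imagen-pytorch README shows for unconditional training.

```python
from typing import Any

# Keyword arguments that carry text conditioning; an unconditional model should get none of them.
TEXT_KWARGS = {"texts", "text_embeds", "text_masks"}


def sample_unconditional(sampler: Any, num_images: int = 4, **sample_kwargs: Any) -> Any:
    """Draw samples from a model trained without text conditioning.

    `sampler` is anything with a .sample(batch_size=...) method
    (assumption: an Imagen or ImagenTrainer instance in practice).
    """
    blocked = TEXT_KWARGS & set(sample_kwargs)
    if blocked:
        raise ValueError(f"unconditional sampling takes no text inputs, got: {sorted(blocked)}")
    # No text is passed: the only thing the caller chooses is how many images to draw.
    return sampler.sample(batch_size=num_images, **sample_kwargs)


class StubSampler:
    """Hypothetical stand-in so the sketch runs without a trained model."""

    def sample(self, batch_size: int = 1, **kwargs: Any) -> list:
        return [f"image_{i}" for i in range(batch_size)]


images = sample_unconditional(StubSampler(), num_images=3)
print(images)  # ['image_0', 'image_1', 'image_2']
```

Because nothing conditions the sampler, the content of each image (for example, which digits appear) is drawn from whatever the training set contained and cannot be chosen at inference time.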

bekhzod-olimov commented 1 year ago

@lucidrains Thank you for the reply! I tried that but could not get the output I wanted. I am working on generating plate number images (I have both texts and images). I could not use the texts because the text embedding model (t5-small) did not fit on my GPU (40GB memory), so I trained on images only, with condition_on_text = False. After training finished I called sample, which generated plates with random numbers (the image quality was good enough, though).

Is there any way to get meaningful generated images (with the desired numbers) from a model trained with condition_on_text = False? Or is there perhaps a smaller text embedding model that would fit on my 40GB GPU?

lucidrains commented 1 year ago

@bekhzod-olimov you can pre-encode the text into text embeddings, and then pass those in during training through the keyword text_embeds. that would save you a trip to the T5 transformer
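
A minimal sketch of the pre-encoding idea, under stated assumptions: each distinct caption is encoded once, offline, and training then looks the embeddings up instead of running the text encoder. `preencode_captions` and `toy_encode` are hypothetical names introduced for illustration, and `toy_encode` is a stand-in, not T5. In practice the encoder would presumably be `t5_encode_text` from `imagen_pytorch.t5`, with the cached tensors passed to the model through the `text_embeds` keyword; both names are as shown in the imagen-pytorch README and worth checking against the installed version.

```python
from typing import Callable, Dict, List, Sequence


def preencode_captions(
    captions: Sequence[str],
    encode_fn: Callable[[List[str]], Sequence],
    batch_size: int = 64,
) -> Dict[str, object]:
    """Encode each distinct caption once and return {caption: embedding}."""
    unique = list(dict.fromkeys(captions))  # de-duplicate, keeping first-seen order
    cache: Dict[str, object] = {}
    for start in range(0, len(unique), batch_size):
        batch = unique[start:start + batch_size]
        embeds = encode_fn(batch)  # expected: one embedding per caption, same order
        if len(embeds) != len(batch):
            raise ValueError("encode_fn must return one embedding per caption")
        cache.update(zip(batch, embeds))
    return cache


def toy_encode(batch: List[str]) -> List[List[float]]:
    """Stand-in encoder (NOT T5): maps each caption to a tiny fixed-size vector."""
    return [[float(len(text)), float(sum(map(ord, text)) % 97)] for text in batch]


# Offline step: run the encoder once over the dataset's captions.
captions = ["AB 123", "CD 456", "AB 123"]
cache = preencode_captions(captions, toy_encode, batch_size=2)

# Training step: look embeddings up instead of calling the text encoder.
batch_embeds = [cache[c] for c in captions]
print(len(cache), len(batch_embeds))  # 2 3
```

With a real encoder the cached values would be tensors, typically saved to disk once and loaded in the data loader. One caveat: embeddings produced in different encoder batches may be padded to different sequence lengths, so they may need padding to a common length before being stacked into a training batch.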