I don't think this repository supports text-to-image generation. You would need to add a text encoder that injects information from your prompt into the reverse (denoising) diffusion process, and use a contrastive loss to measure how similar the generated images are to their respective prompts. I can refer you to CLIP and GLIDE as starting points, but I believe Hugging Face already provides training tooling for your desired task.
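To make the contrastive-loss part concrete, here is a minimal, dependency-free sketch of a CLIP-style symmetric contrastive loss over a batch of matched image and text embeddings. It is not code from this repository: the function names, the `temperature` default, and the toy embeddings are all assumptions for illustration, and it assumes non-zero embedding vectors. It covers only the similarity measure, not the text encoder or the diffusion model, and in practice you would write this with tensors in your training framework.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax(row):
    """Numerically stable log-softmax over a list of floats."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss; image_embs[i] is paired with text_embs[i]."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)

    # logits[i][j]: cosine similarity of image i and text j, divided by temperature.
    logits = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    # Image-to-text direction: row i should assign most probability to column i.
    loss_i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n

    # Text-to-image direction: column j should assign most probability to row j.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(log_softmax(columns[j])[j] for j in range(n)) / n

    return 0.5 * (loss_i2t + loss_t2i)


# Toy check: correctly paired embeddings should give a lower loss than swapped ones.
matched = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(matched < swapped)
```

The loss is low when each image embedding is closest to its own prompt's embedding and high when pairs are mismatched, which is the "similarity of the generated images with the respective prompts" described above.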