rinongal / textual_inversion

MIT License
2.9k stars 279 forks

Is it possible to obtain the image domain embedding converted from learned * ? #148

Closed kaneyxx closed 1 year ago

kaneyxx commented 1 year ago

Hello @rinongal , thanks for the great work! Just finished my concept training and the result looks good for me.

(Edited) To my understanding, Stable Diffusion uses CLIP to convert the text prompt into an embedding that injects the information into the model, while LDM uses BERT. If we use "A photo of *" as the prompt and I want to get the embedding, I just need to send the prompt to the BERTEmbedder (in /ldm/modules/encoders/modules.py) to convert it into a text embedding, right? If so, will that text embedding be used as the conditional input and cross-attended by the model? Or will it be further projected into the image domain like CLIP does?

Another quick question: is the BertTokenizer used in your work?

kaneyxx commented 1 year ago

(Reply by myself)

Actually, we can get the converted embedding from the desired pretrained checkpoint. First, load the model as described at the beginning of txt2img.py. Then run model.get_learned_conditioning(["prompt"]). It returns the embedding that is used as the condition during inference.
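For anyone landing here later, a minimal sketch of that recipe, assuming the `ldm` package from this repo is importable; the config/checkpoint paths and the helper names below are placeholders I chose, not part of the repo:

```python
# Sketch: load a pretrained LDM checkpoint the way txt2img.py does, then pull
# the conditioning embedding for a prompt containing the learned pseudo-word "*".
# Paths are placeholders; adjust them to your own setup.

def load_model_from_config(config_path, ckpt_path, device="cpu"):
    """Instantiate the model from its config and load the checkpoint weights."""
    import torch
    from omegaconf import OmegaConf
    from ldm.util import instantiate_from_config

    config = OmegaConf.load(config_path)
    state = torch.load(ckpt_path, map_location="cpu")["state_dict"]
    model = instantiate_from_config(config.model)
    model.load_state_dict(state, strict=False)  # strict=False tolerates extra keys
    return model.to(device).eval()

def get_prompt_embedding(model, prompt):
    """Return the text embedding the sampler would cross-attend to."""
    import torch
    with torch.no_grad():
        return model.get_learned_conditioning([prompt])

if __name__ == "__main__":
    model = load_model_from_config(
        "configs/latent-diffusion/txt2img-1p4B-eval.yaml",  # placeholder path
        "models/ldm/text2img-large/model.ckpt",             # placeholder path
    )
    cond = get_prompt_embedding(model, "A photo of *")
    print(cond.shape)  # (batch, sequence_length, embedding_dim)
```

The heavy imports are kept inside the functions so the sketch reads top to bottom without needing the environment set up first; passing the prompt as a one-element list matches how txt2img.py batches prompts.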