rinongal / textual_inversion

MIT License
2.9k stars 279 forks

Is it possible to obtain the image domain embedding converted from learned * ? #148

Closed kaneyxx closed 1 year ago

kaneyxx commented 1 year ago

Hello @rinongal , thanks for the great work! Just finished my concept training and the result looks good for me.

(Edited) To my understanding, Stable Diffusion uses CLIP to convert the text prompt into an embedding that injects the information into the model, while LDM uses BERT. If we use "A photo of *" as the prompt and I want to get the embedding, I just need to send the prompt to the BERTEmbedder (in /ldm/modules/encoders/modules.py) to convert it into a text embedding, right? If so, will that text embedding be used as the conditional input and cross-attended by the model? Or will it be further projected into the image domain like CLIP does?

Another quick question: is the BertTokenizer used in your work?

kaneyxx commented 1 year ago

(Reply by myself)

Actually, we can get the converted embedding from the desired pretrained checkpoint. First, load the model as described at the beginning of txt2img.py. Then run model.get_learned_conditioning(["prompt"]). It returns the embedding that is used as the condition during inference.
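For anyone landing here later, a minimal sketch of that recipe, assuming the `ldm` package from this repo is importable; the config/checkpoint paths and the helper names below are placeholders I chose, not part of the repo:

```python
# Sketch: load a pretrained LDM checkpoint the way txt2img.py does, then pull
# the conditioning embedding for a prompt containing the learned pseudo-word "*".
# Paths are placeholders; adjust them to your own setup.

def load_model_from_config(config_path, ckpt_path, device="cpu"):
    """Instantiate the model from its config and load the checkpoint weights."""
    import torch
    from omegaconf import OmegaConf
    from ldm.util import instantiate_from_config

    config = OmegaConf.load(config_path)
    state = torch.load(ckpt_path, map_location="cpu")["state_dict"]
    model = instantiate_from_config(config.model)
    model.load_state_dict(state, strict=False)  # strict=False tolerates extra keys
    return model.to(device).eval()

def get_prompt_embedding(model, prompt):
    """Return the text embedding the sampler would cross-attend to."""
    import torch
    with torch.no_grad():
        return model.get_learned_conditioning([prompt])

if __name__ == "__main__":
    model = load_model_from_config(
        "configs/latent-diffusion/txt2img-1p4B-eval.yaml",  # placeholder path
        "models/ldm/text2img-large/model.ckpt",             # placeholder path
    )
    cond = get_prompt_embedding(model, "A photo of *")
    print(cond.shape)  # (batch, sequence_length, embedding_dim)
```

The heavy imports are kept inside the functions so the sketch reads top to bottom without needing the environment set up first; passing the prompt as a one-element list matches how txt2img.py batches prompts.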