lucidrains / DALLE-pytorch

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch
MIT License
5.55k stars 643 forks

Pretrained text encoder #445

Open ethancohen123 opened 1 year ago

ethancohen123 commented 1 year ago

Is it possible to train DALLE with an external (frozen) text encoder, such as those available on Hugging Face?

ethancohen123 commented 1 year ago

Does anyone have an idea about this? @lucidrains

kingnobro commented 1 year ago

Hi. If you want to use a pretrained language model, what you are actually using is that model's text embedding.

  1. First, load a pretrained model such as CLIP or BERT and save the weight of its text embedding layer.
  2. Then, replace text_emb in the DALLE __init__ function. Instead of using nn.Embedding to create a new text embedding, use torch.load to load the pretrained weight saved in step 1.

Example: link
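
The two steps above can be sketched roughly as follows. This is only a minimal illustration in plain PyTorch, not the code behind the linked example: `pretrained_emb` is a randomly initialized stand-in for the embedding layer of a real pretrained model, and the sizes and the file name are made up. In practice the tokenizer has to be the one the pretrained model was trained with, and the vocabulary size and embedding dimension have to match what DALLE expects for text_emb (otherwise some resizing or a projection would be needed).

```python
import os
import tempfile

import torch
import torch.nn as nn

# Stand-in for the token embedding layer of a pretrained model.
# Assumption: with a real model (e.g. CLIP or BERT) you would take
# its embedding module instead; the sizes here are hypothetical.
vocab_size, dim = 1000, 64
pretrained_emb = nn.Embedding(vocab_size, dim)

# Step 1: save only the embedding weight matrix to disk.
path = os.path.join(tempfile.mkdtemp(), "text_emb_weight.pt")  # hypothetical file name
torch.save(pretrained_emb.weight.detach().clone(), path)

# Step 2: instead of creating a fresh nn.Embedding for text_emb,
# build it from the saved weight. freeze=True keeps it fixed during training.
weight = torch.load(path)
text_emb = nn.Embedding.from_pretrained(weight, freeze=True)

# Quick check: embed a batch of 2 sequences of 16 token ids.
tokens = torch.randint(0, vocab_size, (2, 16))
out = text_emb(tokens)
print(out.shape)
```

Setting `freeze=False` instead would let the loaded embedding be fine-tuned together with the rest of the model.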