lucidrains / DALLE-pytorch

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch
MIT License

Pretrained models #26

Open robvanvolt opened 3 years ago

robvanvolt commented 3 years ago

The image generation takes a good amount of time because of the training, as far as I understand.

When the pretrained models are released, how big will the pretrained model be, how long will image generation take, and how much computing power will it need?

To my understanding, training the model usually takes a long time, but with pretrained models the results would be there more or less "in an instant" - or is there more to it?

Best regards

lucidrains commented 3 years ago

@robvanvolt Hello! It is actually not too bad! The model size is reportedly 13B parameters, so it is roughly on par with some widely used language models out there. Generation won't be instantaneous, but it can be made faster through various tricks. It will be vastly less expensive than training, of course.
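
For a rough idea of the kind of generic PyTorch-level tricks involved (just an illustration with a toy stand-in model, not this repo's exact generation code):

```python
import torch
import torch.nn as nn

# Toy stand-in model; a real run would load the trained DALL-E weights instead.
model = nn.Sequential(nn.Linear(256, 1024), nn.GELU(), nn.Linear(1024, 256))
model.eval()

tokens = torch.randn(1, 256)

# Trick 1: no gradient bookkeeping at inference time (less memory, faster).
with torch.no_grad():
    out = model(tokens)

# Trick 2: half precision on a CUDA GPU roughly halves memory and speeds up matmuls.
if torch.cuda.is_available():
    model_half = model.half().cuda()
    with torch.no_grad():
        out = model_half(tokens.half().cuda())
```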

robvanvolt commented 3 years ago

@lucidrains thank you for your quick response!

Will it then be possible to generate images with the pretrained models without a CUDA GPU, e.g. using only an integrated Intel GPU?

And roughly how many gigabytes of storage space does "the model size is reportedly 13B" correspond to?

batrlatom commented 3 years ago

@robvanvolt That probably refers to the number of parameters.

powderblock commented 3 years ago

> @lucidrains thank you for your quick response!
>
> Will it then be possible to generate images with the pretrained models without a CUDA GPU, e.g. using only an integrated Intel GPU?
>
> And roughly how many gigabytes of storage space does "the model size is reportedly 13B" correspond to?

Model size refers to the number of parameters in the model. Roughly speaking, more parameters means more capacity, and, given enough training data, usually a more accurate model.

DALL-E operates with about 13 billion parameters, which is actually small compared to GPT-3's 175 billion. Just imagine how good DALL-E 2 is gonna be :oooo
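
As a back-of-the-envelope estimate of what that means for storage (assuming the weights are stored as plain fp16 or fp32 floats, with no compression):

```python
# Rough checkpoint size for a 13B-parameter model:
# 2 bytes per parameter in fp16, 4 bytes per parameter in fp32.
params = 13e9
print(f"fp16: ~{params * 2 / 1e9:.0f} GB")  # ~26 GB
print(f"fp32: ~{params * 4 / 1e9:.0f} GB")  # ~52 GB
```

So a 13B-parameter checkpoint would land somewhere around 26-52 GB on disk, depending on precision.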

robvanvolt commented 3 years ago

I created a repository whose sole purpose is to host / collect pretrained models:

https://github.com/robvanvolt/DALLE-models

Here everyone can make their models available, regardless of whether they were trained on a specific or a general dataset. Note that GitHub's maximum file size limits the upload of the bigger models, so you will need to host those yourself (mega.nz, for example, has a free 50 GB tier, which is more than enough). It is a waste of energy if 100 people have to train 100 DALL-E models with the same hyperparameters and the same dataset, so hopefully the collection can give more people access to a broader spectrum of training results! :)

I uploaded an example of my own model here:

https://github.com/robvanvolt/DALLE-models/tree/main/models/taming_transformer/16L_64HD_16H_756I_128T_cc12m_1E
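
If you want to try a downloaded checkpoint without a CUDA GPU, PyTorch can map the weights onto the CPU at load time. A minimal sketch (the file name here is a placeholder, not the actual name in the repository above):

```python
import torch

# Load the checkpoint onto the CPU even if it was saved from a GPU.
# 'dalle_checkpoint.pt' is a placeholder; use the file you downloaded.
checkpoint = torch.load('dalle_checkpoint.pt', map_location=torch.device('cpu'))

# The checkpoint layout depends on how the model was saved; printing the keys
# is a quick way to see what it contains before reconstructing the model.
print(checkpoint.keys() if isinstance(checkpoint, dict) else type(checkpoint))
```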