lucidrains / DALLE-pytorch

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch
MIT License

Aspiring to go from VQ-VAE -> DALLE on Google Conceptual Captions Dataset #354

Open appliedml42 opened 3 years ago

appliedml42 commented 3 years ago

Hi, I recently read this blog and was fascinated by the potential of these generative models. I am hoping to learn the fundamentals, reimplement models, and reproduce results from scratch. As a first step, I found this repository VERY helpful: I can use its code to replicate results while I get familiar with the theory. The blogs that have been useful to me so far:

To make things easy for myself, I am using the models from here but building my own scaffolding around the code. The repository is here. This is very much a work in progress (I work on it when I get time).

At present, I am training the VQ-VAE model on ~2M Google Conceptual Captions images on a system with 2 Titan RTXs. The training progress can be seen here.
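The VQ-VAE stage exists to turn each image into a grid of discrete codebook indices that DALLE can later model as tokens. Below is a minimal, dependency-free sketch of that quantization step. It assumes the plain nearest-neighbour lookup from the original VQ-VAE formulation. The toy codebook and vectors are made up for illustration, and this is not the repository's training code.

```python
def quantize(vectors, codebook):
    """Map each encoder output vector to the index of its nearest codebook entry.

    Illustrative nearest-neighbour lookup only; not this repository's implementation.
    """
    indices = []
    for v in vectors:
        # Squared Euclidean distance from v to every codebook entry.
        dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook]
        # The image "token" is the index of the closest entry.
        indices.append(min(range(len(codebook)), key=dists.__getitem__))
    return indices


# Toy 2-D codebook and a few made-up encoder outputs (one per image patch).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
grid = [[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7], [0.7, 0.4]]
print(quantize(grid, codebook))  # [0, 1, 2, 1]
```

A real encoder would output a grid of such vectors for each image, and the resulting index grid is what the second stage consumes.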

I will kill this training once the images start looking good, and then move on to the DALLE part. That is perhaps where the real fun (pain) will start.
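In the DALLE part, the caption tokens and the image codes are treated as one sequence, and a transformer is trained to predict each next token. The sketch below shows one way such a training sequence could be assembled. It assumes a shared vocabulary in which image codes are offset by the text vocabulary size. `build_sequence`, `pad_id`, and the sizes used are hypothetical and kept tiny for readability, and the repository's exact layout may differ.

```python
def build_sequence(text_tokens, image_tokens, text_seq_len, num_text_tokens, pad_id=0):
    """Hypothetical helper: padded caption tokens followed by offset image codes."""
    # Truncate long captions, then right-pad to a fixed text length.
    text_tokens = list(text_tokens)[:text_seq_len]
    padded = text_tokens + [pad_id] * (text_seq_len - len(text_tokens))
    # Shift image codes past the text vocabulary so the two id ranges never collide.
    shifted = [num_text_tokens + code for code in image_tokens]
    return padded + shifted


# A 3-token caption, padded to 5, followed by 4 image codes from the VQ-VAE.
seq = build_sequence([5, 9, 2], [0, 1, 2, 1], text_seq_len=5, num_text_tokens=10)
print(seq)  # [5, 9, 2, 0, 0, 10, 11, 12, 11]

# Next-token prediction: the model sees inputs and is trained to output targets.
inputs, targets = seq[:-1], seq[1:]
```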

I will try to keep this ticket updated with progress.

appliedml42 commented 3 years ago

Triggered the first DALLE training run on my 2 Titan RTXs. Hopefully it learns something interesting. The run's progress can be seen here.

Also, I upgraded my dataset code to download images directly from the web.
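Downloading images on the fly mostly comes down to reading caption/URL pairs and tolerating the dead links that a web-scraped dataset accumulates. The sketch below uses only the standard library. It assumes the tab-separated, caption-then-URL row layout of the Conceptual Captions release files. `read_pairs`, `fetch_image`, and `iter_samples` are hypothetical names, not the scaffolding repo's dataset code.

```python
import csv
import http.client
import io
import urllib.request


def read_pairs(tsv_text):
    """Parse caption<TAB>url rows (assumed Conceptual Captions layout)."""
    # QUOTE_NONE keeps any quote characters inside captions as literal text.
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [(row[0], row[1]) for row in reader if len(row) >= 2]


def fetch_image(url, timeout=10.0):
    """Download raw image bytes, returning None for dead, slow, or malformed links."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except (OSError, ValueError, http.client.HTTPException):
        # URLError, HTTPError and timeouts are OSError subclasses;
        # an unparseable URL raises ValueError.
        return None


def iter_samples(pairs):
    """Yield (caption, image_bytes), skipping URLs that fail to download."""
    for caption, url in pairs:
        data = fetch_image(url)
        if data is not None:
            yield caption, data


sample = (
    "a dog running on the beach\thttps://example.com/dog.jpg\n"
    "a red car\thttps://example.com/car.jpg\n"
)
pairs = read_pairs(sample)
```

Skipping failures inside the generator keeps a single broken URL from stopping a long training run. The image bytes would still need decoding and resizing before they reach the VAE.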

johnpaulbin commented 3 years ago

Hi! Keep us posted. Also, there are lots of efforts toward building a big dataset (crawling@home) that could improve your training. By the way, your wandb link seems to lead to a 404 (private?).

appliedml42 commented 3 years ago

@johnpaulbin Apologies, I decided to delete that run and start a new vanilla one. I was curious to see what happens with no LR decay and no gradient norm clipping. Here is the updated link, which I will keep running for at least the next 24 hours.
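The two settings switched off in this run are easy to state concretely. The sketch below is in pure Python. It assumes exponential decay as the schedule being disabled, since the thread does not say which schedule the earlier run used. It also assumes global L2-norm clipping as the clipping variant, which is roughly what `torch.nn.utils.clip_grad_norm_` does in PyTorch. Both function names are hypothetical.

```python
import math


def lr_at(step, base_lr, decay_rate=None, decay_every=1000):
    """Learning rate at a step: constant if decay_rate is None, else exponential decay."""
    if decay_rate is None:
        return base_lr  # the "no LR decay" setting
    return base_lr * decay_rate ** (step / decay_every)


def clip_by_global_norm(grads, max_norm):
    """Rescale a flat list of gradients so their global L2 norm is at most max_norm.

    Returns (possibly rescaled gradients, norm before clipping).
    """
    total = math.sqrt(sum(g * g for g in grads))
    if total <= max_norm:
        return list(grads), total  # already small enough; leave untouched
    scale = max_norm / total
    return [g * scale for g in grads], total


grads = [3.0, 4.0]  # global norm is 5.0
clipped, norm = clip_by_global_norm(grads, max_norm=1.0)
print(norm, clipped)  # direction is kept, length is shrunk to 1.0
```

Clipping preserves the gradient direction and only limits the step size, so turning it off mainly exposes the run to occasional very large updates.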

johnpaulbin commented 3 years ago

@appliedml85 Awesome! I'll keep looking there from time to time.

appliedml42 commented 3 years ago

Based on advice on Discord, I started a new run with these parameters:

Wandb run: https://wandb.ai/appliedml85/storyteller

My scaffolding repo is updated too.

appliedml42 commented 3 years ago

Finally getting some interesting results. https://wandb.ai/appliedml85/storyteller/reports/DALLE-Training-In-Progress--Vmlldzo5Nzc2NjU

appliedml42 commented 3 years ago

Finished the first DALLE run. Here is the report from the run: https://wandb.ai/appliedml85/storyteller/reports/DALLE-Training-On-Google-Conceptual-Concepts-3M--Vmlldzo5Nzc2NjU