lucidrains / DALLE-pytorch

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch
MIT License

confusion #29

Open skywo1f opened 3 years ago

skywo1f commented 3 years ago

Forgive my confusion. Is there a guide on how to use this? I am used to systems like darknet where I have images and annotations, then I train on them. Finally I have a cfg and weights file which I can use to find examples in an image.

I ran both scripts (train DALL-E and train VAE) but nothing has changed in my folder. How do I get from here to the point where I can give it a sentence and have it produce an image for me?

powderblock commented 3 years ago

+1

Am also confused on how to use this. Even with existing data... How do I enter input and get a visual output?

powderblock commented 3 years ago

@lucidrains any help on this one? Would love to get started but confused just like this other guy :)

johndpope commented 3 years ago

It seems @deepglugs has made progress with his own training code at https://github.com/deepglugs/dalle, which includes specific, helpful training instructions. It requires a tags file next to each image (foo.txt alongside foo.png). This should probably be a pull request; @lucidrains, maybe this is helpful?

python3 dalle.py --source path/to/images/and/tags/ \
                 --vocab curated_512.vocab \
                 --vae vae.pt \
                 --train_dalle \
                 --dalle dalle.pt \
                 --batch_size=16 \
                 --samples_out samples/dalle/ \
                 --epochs=2

There's a ticket related to training sets - https://github.com/lucidrains/DALLE-pytorch/issues/7

Looking at @josephcappadona's 50 MB sample file, it uses the following format: each image comes with a caption file that shares its base name.

1166-004-E9B61A49.jpg

1166-004-E9B61A49.txt:
The basic design of Hawaii's state seal has been in use since 1894, but the legend now reads “State of Hawaii” rather than “Republic of Hawaii.” The Hawaiian coat of arms is supported by Kamehameha I and the goddess of liberty, with a rising sun behind. The motto “Ua Mau ke Ea o ka Aina i ka Pono” (The Life of the Land Is Perpetuated in Righteousness) is along the bottom edge. Below the shield are various symbols: a phoenix rising from flames, taro leaves, banana foliage, and maidenhair ferns.
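For anyone who wants to check a dataset folder against that layout, here is a minimal sketch that pairs each image with the caption file sharing its base name. It assumes a flat folder and the extensions listed in the code; `pair_images_with_captions` is a made-up helper name, not something from DALLE-pytorch or the deepglugs repo.

```python
from pathlib import Path

# Assumption: these are the image extensions present in the dataset folder.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def pair_images_with_captions(folder):
    """Return (image_path, caption_text) pairs for files sharing a base name.

    Hypothetical helper for illustration; images without a matching .txt
    file are skipped rather than treated as an error.
    """
    pairs = []
    for image_path in sorted(Path(folder).iterdir()):
        if image_path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue
        # foo.png -> foo.txt, 1166-004-E9B61A49.jpg -> 1166-004-E9B61A49.txt
        caption_path = image_path.with_suffix(".txt")
        if not caption_path.is_file():
            continue
        caption = caption_path.read_text(encoding="utf-8").strip()
        pairs.append((image_path, caption))
    return pairs
```

Calling `pair_images_with_captions("path/to/images/and/tags/")` would give a list that can be inspected before any training run, which makes it easy to spot images that are missing their text file.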

I'm thinking the tag format used by @deepglugs may yield better results initially on smaller datasets, for example:

1girl, white_swimsuit, red_hair

There seems to be some guidance around the vocab tokens (.vocab file), roughly 512 tags, which would help things converge, right?
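As a rough illustration of the roughly-512-tag idea, the sketch below counts comma-separated tags across all .txt files in a folder and keeps the most frequent ones. This is only an assumption about how such a vocabulary could be curated; the function names are hypothetical, and the actual layout of curated_512.vocab in the deepglugs repo is not something this sketch relies on or reproduces.

```python
from collections import Counter
from pathlib import Path


def read_tags(tag_file):
    """Split a tag file such as '1girl, white_swimsuit, red_hair' into tags."""
    text = Path(tag_file).read_text(encoding="utf-8")
    return [tag.strip() for tag in text.split(",") if tag.strip()]


def build_vocab(folder, max_tags=512):
    """Return the max_tags most frequent tags found in folder/*.txt.

    Hypothetical helper: ties are broken alphabetically so that the
    result is reproducible between runs.
    """
    counts = Counter()
    for tag_file in sorted(Path(folder).glob("*.txt")):
        counts.update(read_tags(tag_file))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked[:max_tags]]
```

Capping the list at a few hundred frequent tags keeps rare, noisy tags out of the vocabulary, which is presumably the point of a curated vocab on a small dataset.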

We could also consider using Google or AWS image recognition software to generate tags for a large image dataset.