Fix normalization, VRAM usage, and new story feature

lucidrains / deep-daze

A simple command-line tool for text-to-image generation using OpenAI's CLIP and Siren (an implicit neural representation network). The technique was originally created by https://twitter.com/advadnoun

MIT License


Removed the normalization from save_image, so images now appear correctly and are much brighter.
train_step now returns the generated image on every call, which makes it more flexible; save_image reuses that image.
If both an image and a text are passed when instantiating Imagine(), the average of their two embeddings is used, so beginners don't need to work with CLIP encodings directly.
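The averaging step can be sketched as an element-wise mean of the two embeddings. This is a minimal sketch, assuming both embeddings are plain lists of floats with the same dimension; in deep-daze they would be CLIP tensors, and the name average_embedding is hypothetical, not part of the library.

```python
from __future__ import annotations

from typing import Sequence


def average_embedding(text_emb: Sequence[float], img_emb: Sequence[float]) -> list[float]:
    """Hypothetical helper: element-wise mean of a text and an image embedding."""
    # Both encodings must live in the same embedding space (same dimension).
    if len(text_emb) != len(img_emb):
        raise ValueError("embeddings must have the same dimension")
    # Average each coordinate so text and image contribute equally.
    return [(t + i) / 2 for t, i in zip(text_emb, img_emb)]


print(average_embedding([1.0, 0.0], [0.0, 1.0]))  # [0.5, 0.5]
```

Weighting both inputs equally is the simplest reading of "the average embedding of the two"; a weighted mean would be a straightforward variation.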
Big: added a new option, create_story. It splits the input text into words and moves over them in a sliding-window fashion to generate a story from the text. This makes it possible to visualize stories much longer than CLIP's 77-token context length (although there is no memory beyond those 77 tokens). save_progress should be turned on for this feature so that a GIF can be created at the end.

Please give it a try, especially the create_story feature; with save_progress turned on, it can produce nice movies. I set it up so that it optimizes for one episode on the first three words, then adds three more words for the next episode, and so on (old words are dropped once the CLIP context length is reached).
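The sliding-window schedule (three words in the first episode, three more per episode, oldest words dropped once the context is full) can be sketched as follows. This is a minimal sketch, not the code from this PR: story_windows is a hypothetical name, and the default count_tokens simply counts whitespace-separated words, whereas CLIP's 77-token limit is measured in BPE tokens. A real tokenizer-based counter could be passed in instead.

```python
from typing import Callable, List


def story_windows(
    text: str,
    words_per_episode: int = 3,
    max_tokens: int = 77,
    # Assumption: crude word count as a stand-in for CLIP's BPE tokenizer.
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> List[str]:
    """Hypothetical helper: return the prompt optimized in each episode."""
    words = text.split()
    window: List[str] = []
    prompts: List[str] = []
    for start in range(0, len(words), words_per_episode):
        # Each new episode appends the next chunk of words.
        window.extend(words[start:start + words_per_episode])
        # Kick out the oldest words until the prompt fits the context length
        # (a single over-long word is kept rather than emptying the window).
        while len(window) > 1 and count_tokens(" ".join(window)) > max_tokens:
            window.pop(0)
        prompts.append(" ".join(window))
    return prompts


for prompt in story_windows("a b c d e f g", max_tokens=4):
    print(prompt)
```

With max_tokens=4, the loop above prints "a b c", then "c d e f", then "d e f g": each episode adds up to three words and the oldest ones fall out once the limit is hit.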

I'm generating some dream stories at the moment. I could update the README too and put a story in there, along with explanations of the new img feature.

Also, now that the VRAM issue is fixed, I can run a 44-layer network with a batch size of 96 at a resolution of 256 on my 8 GB RTX 2060 Super. I have not yet found a good setup for a resolution of 512.


Fix normalization, VRAM usage, and new story feature #58