lucidrains / deep-daze

Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network). Technique was originally created by https://twitter.com/advadnoun
MIT License

Img encoding #48

Closed NotNANtoN closed 3 years ago

NotNANtoN commented 3 years ago

Added:

Please feel free to tell me what things to change or to adapt it as you like. I just thought other people would appreciate these features too.

NotNANtoN commented 3 years ago

I played around with the following image:

[Image: hot-dog]

I used it as the "img" input to the Imagine model to generate this (does not look amazing, but it works):

https://user-images.githubusercontent.com/19983153/107889810-638b4500-6f15-11eb-9871-2f3f4050eb44.mp4

I used the create_img_encoding and create_text_encoding functions from Imagine to get the encodings for the hot-dog image and the sentence "Yellow", and took their average. I fed this averaged encoding into Imagine to generate this:

https://user-images.githubusercontent.com/19983153/107889841-93d2e380-6f15-11eb-9895-8269f3d39040.mp4

And this with "Pink":

https://user-images.githubusercontent.com/19983153/107889860-abaa6780-6f15-11eb-8445-ea18db181680.mp4

And with something more abstract, "Love is the answer!":

https://user-images.githubusercontent.com/19983153/107889911-19569380-6f16-11eb-8ede-14cc603aa740.mp4
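
For readers who want to see the averaging step spelled out, here is a minimal sketch. It assumes the two encodings are plain vectors of equal length; in deep-daze they would be tensors returned by create_img_encoding and create_text_encoding. The helper name average_encodings and the toy 4-dimensional vectors are hypothetical, used only for illustration. How the result is passed back into Imagine is not shown in this thread, so that step is left out.

```python
from typing import List, Sequence


def average_encodings(img_enc: Sequence[float], text_enc: Sequence[float]) -> List[float]:
    """Element-wise mean of two embeddings of equal length (hypothetical helper)."""
    if len(img_enc) != len(text_enc):
        raise ValueError("encodings must have the same dimensionality")
    return [(a + b) / 2.0 for a, b in zip(img_enc, text_enc)]


# Toy stand-ins for real CLIP embeddings (assumption: real ones are much longer).
img_enc = [1.0, 0.0, 0.0, 0.0]   # stands in for the hot-dog image encoding
text_enc = [0.0, 1.0, 0.0, 0.0]  # stands in for the encoding of "Yellow"

mixed = average_encodings(img_enc, text_enc)
print(mixed)  # [0.5, 0.5, 0.0, 0.0]
```

The averaged vector sits halfway between the image and the text in embedding space, which is presumably why the generated results pick up traits of both.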

afiaka87 commented 3 years ago

Edit: I stand corrected. This is pretty cool!

@NotNANtoN I'm not sure if you're aware of this, but I believe this feature is already implemented, although the saving and manipulating of CLIP embeds is cool stuff!

afiaka87 commented 3 years ago

Oh, I see you've added quite a few more knobs to turn than the original implementation. Apologies.

lucidrains commented 3 years ago

@NotNANtoN looks great! thank you for the contribution :)