Img encoding - GitHub Issues

NotNANtoN commented 3 years ago

Added:

an image can now be fed into Imagine via img (either a PIL Image or the path to a file that can be loaded with PIL.Image.open). I added a transform pipeline that just resizes, takes the center crop, and normalizes.
a custom encoding can be used too (so users can do their own embedding arithmetic and feed in the CLIP feature vector directly)
made the text in Imagine optional (this change might break some things in the CLI; alternatively, we could keep it as a required argument and just pass None when it is not needed)
added a setter so that the encoding of Imagine can be adjusted on the fly: see set_clip_encoding of Imagine (it takes a text, an img, or an encoding as input)
adjusted the naming scheme for textpath: a new create_text_path() function merges the name of the input image with the given text.
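The image preprocessing described above (resize, center crop, normalize) can be sketched in a few lines. This is a minimal NumPy sketch, not the PR's code: it assumes a 224 x 224 input size and CLIP's published per-channel mean and standard deviation, and it uses nearest-neighbour resizing only to stay dependency-free (CLIP's own preprocessing resizes bicubically). In PyTorch the same three steps would usually be written with torchvision's Resize, CenterCrop, and Normalize transforms. The function name preprocess is hypothetical.

```python
import numpy as np

# CLIP's published normalization statistics (per RGB channel).
CLIP_MEAN = np.array([0.48145466, 0.4578275, 0.40821073], dtype=np.float32)
CLIP_STD = np.array([0.26862954, 0.26130258, 0.27577711], dtype=np.float32)


def preprocess(img: np.ndarray, size: int = 224) -> np.ndarray:
    """Hypothetical sketch: resize shorter side to `size`, center-crop, normalize.

    `img` is an H x W x 3 uint8 RGB array; returns a 3 x size x size float32
    array (channels first). Nearest-neighbour resizing is an assumption made
    to avoid extra dependencies.
    """
    h, w = img.shape[:2]
    scale = size / min(h, w)
    new_h = max(size, round(h * scale))
    new_w = max(size, round(w * scale))
    # Nearest-neighbour resize by sampling source rows and columns.
    rows = (np.arange(new_h) * h / new_h).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    resized = img[rows][:, cols]
    # Center crop to size x size.
    top = (new_h - size) // 2
    left = (new_w - size) // 2
    cropped = resized[top:top + size, left:left + size]
    # Scale to [0, 1], normalize per channel, then move channels first.
    x = cropped.astype(np.float32) / 255.0
    x = (x - CLIP_MEAN) / CLIP_STD
    return x.transpose(2, 0, 1)
```

A file on disk could be turned into the expected input with `np.asarray(PIL.Image.open(path).convert("RGB"))`.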
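To illustrate the setter idea, here is a toy class in which the text is optional and the target encoding can be swapped at any time. Everything in it is hypothetical: ToyImagine, the injected encode_text/encode_img callables (stand-ins for CLIP's text and image encoders), and the rule that exactly one of text, img, or encoding is passed are assumptions for the sketch, not Imagine's actual signature.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]


class ToyImagine:
    """Hypothetical stand-in for Imagine that only models the CLIP target.

    encode_text / encode_img are injected stand-ins for CLIP's encoders so
    the sketch runs without a model.
    """

    def __init__(
        self,
        encode_text: Callable[[str], Vector],
        encode_img: Callable[[object], Vector],
        text: Optional[str] = None,  # text is optional
        img: Optional[object] = None,
        clip_encoding: Optional[Sequence[float]] = None,
    ):
        self.encode_text = encode_text
        self.encode_img = encode_img
        self.set_clip_encoding(text=text, img=img, encoding=clip_encoding)

    def set_clip_encoding(
        self,
        text: Optional[str] = None,
        img: Optional[object] = None,
        encoding: Optional[Sequence[float]] = None,
    ) -> None:
        """Replace the optimization target; exactly one input must be given."""
        given = [x is not None for x in (text, img, encoding)]
        if sum(given) != 1:
            raise ValueError("pass exactly one of text, img, or encoding")
        if encoding is not None:
            self.clip_encoding = list(encoding)  # user-supplied vector, used as-is
        elif img is not None:
            self.clip_encoding = self.encode_img(img)
        else:
            self.clip_encoding = self.encode_text(text)
```

Calling `set_clip_encoding(img=...)` between training steps would switch the target without rebuilding the model, which is the "on the fly" behaviour the item describes.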
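A hypothetical version of the naming scheme: the output name is the image file's stem followed by the prompt text, with spaces and slashes replaced by underscores and the result capped at 255 characters. The separator, the order of the parts, and the length cap are assumptions; the PR's create_text_path() may differ. The sketch also assumes img is a file path, since an in-memory PIL Image has no file name to merge.

```python
from pathlib import Path
from typing import Optional


def create_text_path(
    text: Optional[str] = None,
    img: Optional[str] = None,
    separator: str = "_",
    max_len: int = 255,
) -> str:
    """Hypothetical naming scheme: '<image stem><sep><prompt>', either part optional."""
    parts = []
    if img is not None:
        parts.append(Path(img).stem)  # "images/hot-dog.jpg" -> "hot-dog"
    if text is not None:
        # Replace characters that are awkward in file names.
        parts.append(text.replace(" ", separator).replace("/", separator))
    if not parts:
        raise ValueError("need text, img, or both")
    return separator.join(parts)[:max_len]
```

For example, `create_text_path(text="Love is the answer!", img="images/hot-dog.jpg")` gives `"hot-dog_Love_is_the_answer!"`.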

Please feel free to tell me what to change, or to adapt it as you like. I just thought other people would appreciate these features too.

NotNANtoN commented 3 years ago

I played around with the following image:

[Image: hot-dog]

I used it as the "img" input to the Imagine model to generate this (it does not look amazing, but it works):

https://user-images.githubusercontent.com/19983153/107889810-638b4500-6f15-11eb-9871-2f3f4050eb44.mp4

I used the create_img_encoding and create_text_encoding functions from Imagine to get the encodings for the hot-dog image and the sentence "Yellow", and took their average. I fed this encoding into Imagine to generate this:
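The averaging step is simple embedding arithmetic. Below is a minimal NumPy sketch, assuming each encoding is a 1-D feature vector like those returned by create_img_encoding and create_text_encoding, converted to arrays. The helper name average_encodings is hypothetical, and normalizing each input to unit length before averaging is an extra assumption, not necessarily what was done here: without it, whichever vector has the larger norm dominates the direction of the mean.

```python
import numpy as np


def average_encodings(*encodings, normalize_inputs: bool = True) -> np.ndarray:
    """Return the mean of several 1-D CLIP feature vectors.

    normalize_inputs=True first scales each vector to unit length (an assumed
    extra step), so every input contributes equally to the mean's direction.
    """
    stacked = np.stack([np.asarray(e, dtype=np.float32) for e in encodings])
    if normalize_inputs:
        stacked = stacked / np.linalg.norm(stacked, axis=-1, keepdims=True)
    return stacked.mean(axis=0)
```

If the target is compared against generated images by cosine similarity, only the direction of the result matters, so rescaling the mean afterwards would not change the objective.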

https://user-images.githubusercontent.com/19983153/107889841-93d2e380-6f15-11eb-9895-8269f3d39040.mp4

And this with "Pink":

https://user-images.githubusercontent.com/19983153/107889860-abaa6780-6f15-11eb-8445-ea18db181680.mp4

And with something more abstract, "Love is the answer!":

https://user-images.githubusercontent.com/19983153/107889911-19569380-6f16-11eb-8ede-14cc603aa740.mp4

afiaka87 commented 3 years ago

Edit: I stand corrected. This is pretty cool!

@NotNANtoN I'm not sure if you're aware of this, but I believe this feature is already implemented, although the saving and manipulating of CLIP embeds is cool stuff!

afiaka87 commented 3 years ago

Oh, I see you've added quite a few more knobs to turn than the original implementation. Apologies.

lucidrains commented 3 years ago

@NotNANtoN Looks great! Thank you for the contribution :)

lucidrains / deep-daze

Img encoding #48