lucidrains / deep-daze

Simple command-line tool for text-to-image generation using OpenAI's CLIP and Siren (implicit neural representation network). The technique was originally created by https://twitter.com/advadnoun
MIT License
4.37k stars, 327 forks

"Priming" learning rate of 3e-4 not working for more than 16 layers #39

Closed: afiaka87 closed this issue 3 years ago

afiaka87 commented 3 years ago

We discussed this elsewhere, but just to be rigorous:

As it stands, I think priming only works with about 16-20 layers. With more layers than that, the loss gets stuck around 0.08. I found that the loss can escape this 0.08 plateau if the learning rate is lowered.
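The "lower the learning rate when the loss gets stuck" fix can be automated. Below is a minimal sketch of a reduce-on-plateau rule, assuming you record the loss once per step. The function name `lower_lr_on_plateau` and the thresholds (`window`, `min_improvement`, `factor`, `min_lr`) are hypothetical illustrations, not values tested against deep-daze.

```python
def lower_lr_on_plateau(losses, lr, window=10, min_improvement=1e-3,
                        factor=0.5, min_lr=1e-6):
    """Return a (possibly reduced) learning rate.

    Hypothetical helper: if the best loss in the last `window` steps is not
    at least `min_improvement` lower than the best loss seen before that
    window, the loss is treated as stuck and `lr` is multiplied by `factor`
    (never going below `min_lr`).
    """
    if len(losses) <= window:
        return lr  # not enough history to judge a plateau yet
    best_before = min(losses[:-window])
    best_recent = min(losses[-window:])
    if best_before - best_recent < min_improvement:
        return max(lr * factor, min_lr)  # stuck: lower the rate
    return lr  # still improving: keep the current rate
```

In a real PyTorch training loop you would write the new value into each `optimizer.param_groups[i]["lr"]`; PyTorch's built-in `torch.optim.lr_scheduler.ReduceLROnPlateau` implements the same idea and may be the simpler choice.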

What would really be nice is finding good learning rates for specific layer counts. In the meantime, I made the rate tweakable from both the Imagine interface and the CLI. Here's the code:
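As a rough illustration of making one setting tweakable from both a Python constructor and the command line, here is a sketch using `argparse`. The `PrimingConfig` class and the `--priming-lr` and `--num-layers` flags are hypothetical stand-ins; the actual parameter names in deep-daze are whatever the linked PR defines. The only value taken from this issue is the 3e-4 default.

```python
import argparse

DEFAULT_PRIMING_LR = 3e-4  # the priming rate discussed in this issue


class PrimingConfig:
    """Hypothetical stand-in for a constructor that accepts the priming learning rate."""

    def __init__(self, num_layers=16, priming_lr=DEFAULT_PRIMING_LR):
        self.num_layers = num_layers
        self.priming_lr = priming_lr


def build_parser():
    # Hypothetical flags mirroring the constructor arguments above.
    parser = argparse.ArgumentParser(
        description="Sketch: expose the priming learning rate on the CLI")
    parser.add_argument("--num-layers", type=int, default=16)
    parser.add_argument("--priming-lr", type=float, default=DEFAULT_PRIMING_LR)
    return parser


def config_from_cli(argv=None):
    # argparse maps "--priming-lr" to the attribute "priming_lr".
    args = build_parser().parse_args(argv)
    return PrimingConfig(num_layers=args.num_layers, priming_lr=args.priming_lr)
```

For example, `config_from_cli(["--num-layers", "32", "--priming-lr", "1e-5"])` yields a config with 32 layers and a lower priming rate. Keeping the CLI default and the constructor default in one constant means the two entry points cannot drift apart.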

https://github.com/lucidrains/deep-daze/pull/38

afiaka87 commented 3 years ago

Off topic: it's tough to figure out what SIREN + CLIP will "latch onto", but faces are definitely one of those things. It's very good at converting the existing face in an image into the face you describe, at least as long as that face is popular enough. Here's Worf from Star Trek applied to a photo of Mac's mom from the show It's Always Sunny in Philadelphia:

Input: [image: macsmom]

SIREN representation: [image: macsmom]

After 100 iterations of training on "worf": [image: worf_after]

Early output (phrase: "worf"): [image]

afiaka87 commented 3 years ago

@lucidrains thanks!