lucidrains / deep-daze

A simple command-line tool for text-to-image generation using OpenAI's CLIP and SIREN (an implicit neural representation network). The technique was originally created by https://twitter.com/advadnoun
MIT License
4.37k stars, 327 forks

Add Fourier Feature Mapping for Improved Quality #59

Closed: afiaka87 closed this issue 3 years ago

afiaka87 commented 3 years ago

A pal of mine discovered a fantastic notebook by a coder with the GitHub handle eps696. The notebook adds Fourier feature mapping to CLIP+SIREN, to great effect.
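A minimal sketch of Gaussian Fourier feature mapping, assuming the notebook follows the common formulation in which each pixel coordinate v is mapped to [sin(2πBv), cos(2πBv)], with the entries of B drawn from a Gaussian whose standard deviation is the scale. The function name, `num_features=256`, and treating `scale` as the notebook's `fourier_scale` are assumptions, not taken from eps696's code. The resulting features would be fed into the SIREN in place of the raw (x, y) coordinates.

```python
import numpy as np

def fourier_features(coords, num_features=256, scale=4.0, seed=0):
    """Map (N, D) coordinates to (N, 2 * num_features) Fourier features.

    B is a random Gaussian matrix with standard deviation `scale`
    (assumed here to play the role of the notebook's fourier_scale).
    """
    rng = np.random.default_rng(seed)
    B = rng.normal(0.0, scale, size=(coords.shape[-1], num_features))
    proj = 2.0 * np.pi * coords @ B  # (N, num_features) phases
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)

# Pixel coordinates for a 64x64 image in [-1, 1], as a SIREN would receive them.
side = 64
lin = np.linspace(-1.0, 1.0, side)
xx, yy = np.meshgrid(lin, lin, indexing="ij")
coords = np.stack([xx.ravel(), yy.ravel()], axis=-1)  # (4096, 2)

features = fourier_features(coords)
print(features.shape)  # (4096, 512)
```

Because B is sampled once and then fixed, the mapping is deterministic during training; only the SIREN weights are optimized against the CLIP loss.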

Here's the original notebook, and here's my custom version with some fun additions, such as prompts to minimize (subtract) and a prompt for painting finer details. Credit where it's due: those ideas also come from eps696, in another of their notebooks.
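One plausible way a "prompt to minimize" could enter the objective, sketched under the assumption that the loss is built from CLIP cosine similarities: reward similarity to the main prompt and penalize similarity to the subtracted one. `clip_guidance_loss` and `negative_weight` are hypothetical names, not deep-daze's or eps696's API, and the embeddings are plain NumPy vectors standing in for CLIP image and text embeddings.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def clip_guidance_loss(image_emb, text_emb, negative_emb=None, negative_weight=1.0):
    """Hypothetical loss (lower is better), not the notebook's actual code."""
    loss = -cosine(image_emb, text_emb)  # pull the image toward the main prompt
    if negative_emb is not None:
        # Push the image away from the subtracted prompt.
        loss += negative_weight * cosine(image_emb, negative_emb)
    return loss

# Random vectors standing in for CLIP embeddings (512-d, as in ViT-B/32).
rng = np.random.default_rng(0)
image_emb, text_emb, negative_emb = rng.normal(size=(3, 512))
print(clip_guidance_loss(image_emb, text_emb, negative_emb, negative_weight=0.5))
```

Keeping `negative_weight` separate lets the subtracted prompt act as a gentle nudge rather than competing equally with the main prompt.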

Anyway, it definitely seems to help. Using as few as 16 SIREN layers, I've gotten this output:

cosmic love and attention: https://user-images.githubusercontent.com/3994972/109033636-77a91200-768c-11eb-8d5c-265f745cf496.mp4

mist over green hills: [image: vlcsnap-2021-02-24-11h49m43s175]

mist over green hills, with fine_details="trees": [image: mist_fine_detail]

The drawbacks seem to be that the learning rate is much more fickle and may even need to be adjusted slightly on a per-phrase basis (if you get really unlucky). Increasing the number of SIREN layers beyond 20 also causes lots of issues, and I'm unclear as to why; I can't find a stable learning rate for those depths. Also, there is a fourier_scale parameter, which eps696 left at 4, but I've found that 2 gives better results.
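If `fourier_scale` is the standard deviation of the Gaussian frequency matrix (an assumption; the notebook's code isn't reproduced in this thread), then lowering it from 4 to 2 halves the typical frequency the SIREN sees at its input, which would tend to favor smoother, lower-frequency structure. This sketch, with hypothetical helper names, measures how many sine periods a typical feature completes across the image at each setting.

```python
import numpy as np

def sample_frequencies(scale, in_dim=2, num_features=256, seed=0):
    """Draw a Gaussian frequency matrix B with standard deviation `scale`."""
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, scale, size=(in_dim, num_features))

def median_cycles(scale):
    """Median number of sine periods a feature completes along one image axis.

    Coordinates span [-1, 1] (width 2), so sin(2*pi*b*x) completes 2*|b| cycles.
    """
    B = sample_frequencies(scale)
    return float(np.median(2.0 * np.abs(B)))

for scale in (2.0, 4.0):
    print(f"fourier_scale={scale}: median cycles per axis = {median_cycles(scale):.2f}")
```

With the same seed, the scale-4 frequencies are exactly twice the scale-2 ones, so the two printed medians differ by a factor of two.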

NotNANtoN commented 3 years ago

Can you explain precisely what it does? Looks nice! It would be interesting to compare it against generations from the new size schedule in #61, and to try combining the two.

afiaka87 commented 3 years ago

I cannot! Unfortunately, my machine learning expertise is quite limited. I just found the notebook and saw that the generations were indeed quite nice. Looking at your output, though, I think the main contributor to the quality of the latter image is simply that I have the fine_details prompt enabled. It's probably worth implementing that feature as well.

I'll do some comparisons when I get time, but for now I think you've "solved" this issue through other means.