lucidrains / deep-daze

Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network). Technique was originally created by https://twitter.com/advadnoun
MIT License

[Discussion] New augmentations #66

Open NotNANtoN opened 3 years ago

NotNANtoN commented 3 years ago

Hey,

since we had the long discussion on the random cutout size sampling in #61, I played around with it a bit more. My issue with the current state of affairs is that for a prompt such as "A llama wearing a scarf and glasses, sitting in a cozy cafe." many llamas appear, and the general texture of llama fur gets placed everywhere. That makes sense: we optimize the SIREN network to maximize the CLIP similarity at randomly sampled square cutouts with sizes down to 10% of the original image size, so every small cutout is pushed to match the whole prompt. As it is not possible to train without these augmentations (it would only generate a weird-looking adversarial example), I thought of using other augmentations.

I tried adding Gaussian noise to the image - it is better than not adding any augmentations, but not good.

Then I tried sampling the cutout size from a normal distribution with a mean of 0.6 and a std of 0.2 instead of a uniform distribution between 0.1 and 1.0. The images look quite different: the background looks less interesting, as the low-level textures get less weight. But I can't say they definitely look "better".
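To make the difference between the two sampling schemes concrete, here is a minimal sketch in plain Python. It is not the code from the repository: the function names are made up for illustration, and clamping the normal samples to the range [0.1, 1.0] is an assumption, since the post does not say how out-of-range samples are handled.

```python
import random


def sample_uniform_size(lower=0.1, upper=1.0):
    """Cutout side length as a fraction of the image side, drawn uniformly."""
    return random.uniform(lower, upper)


def sample_gaussian_size(mean=0.6, std=0.2, lower=0.1, upper=1.0):
    """Cutout side length drawn from a normal distribution.

    Clamping to [lower, upper] is an assumption made for this sketch.
    """
    return min(upper, max(lower, random.gauss(mean, std)))


def random_cutout_box(image_size, frac):
    """Return (x, y, side) of a random square cutout inside a square image."""
    side = max(1, int(round(frac * image_size)))
    x = random.randint(0, image_size - side)
    y = random.randint(0, image_size - side)
    return x, y, side


random.seed(0)
uniform_sizes = [sample_uniform_size() for _ in range(10_000)]
gaussian_sizes = [sample_gaussian_size() for _ in range(10_000)]

# Share of "small" cutouts (side below 30% of the image) under each scheme.
small_uniform = sum(s < 0.3 for s in uniform_sizes) / len(uniform_sizes)
small_gaussian = sum(s < 0.3 for s in gaussian_sizes) / len(gaussian_sizes)
print(f"small cutouts, uniform:  {small_uniform:.3f}")
print(f"small cutouts, gaussian: {small_gaussian:.3f}")
```

Under these assumptions the normal distribution produces far fewer small cutouts than the uniform one, which fits the observation that low-level textures get less weight.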

I tested the prompts: Depression, Consciousness, Schizophrenia, "A psychedelic experience on LSD", Demon, and "A llama wearing a scarf and glasses, reading a book in a cozy cafe.".

The results for the Gaussian size sampling:

[Images: depression_gauss_fixed, Consciousness_gauss_fixed, Schizophrenia_gauss_fixed, lsd_gauss_fixed, demon_gauss_fixed, lama_gauss_fixed]

Lastly, I tried a simple change. Instead of averaging the loss over all random cutouts, I averaged the features of all cutouts and calculated a single loss from that average. Here's where it gets interesting. The generated images look quite different now, often depicting clear scenes of locations. What is strange is that some symbols appear repeatedly across different prompts: a red and a green blob right next to each other, and some kind of company logo, show up quite regularly.
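The difference between the two loss formulations can be sketched in a few lines of plain Python. This is only an illustration with toy 2-D vectors, not the repository code: the function names are hypothetical, negative cosine similarity stands in for the CLIP similarity loss, and averaging the raw (unnormalized) cutout features is an assumption.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity(a, b):
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def loss_averaged_over_cutouts(cutout_features, text_feature):
    """One loss per cutout, then the mean of those losses."""
    losses = [-cosine_similarity(f, text_feature) for f in cutout_features]
    return sum(losses) / len(losses)


def loss_on_averaged_features(cutout_features, text_feature):
    """Mean of the cutout features first, then a single loss."""
    n = len(cutout_features)
    mean_feature = [sum(column) / n for column in zip(*cutout_features)]
    return -cosine_similarity(mean_feature, text_feature)


# Toy example: neither cutout matches the text on its own,
# but their average points exactly in the text direction.
text_feature = [1.0, 0.0]
cutout_features = [[1.0, 1.0], [1.0, -1.0]]

per_cutout = loss_averaged_over_cutouts(cutout_features, text_feature)
averaged = loss_on_averaged_features(cutout_features, text_feature)
print(per_cutout)  # about -0.707
print(averaged)    # -1.0
```

In this toy case the averaged-feature loss is already at its minimum although no single cutout matches the text. That is one possible reading of why the images change: individual cutouts no longer each have to contain the whole prompt.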

These are the results for the current version:

[Images: depression_uniform, consciousness_uniform, schizophrenia_uniform, lsd_uniform, demon_uniform, lama_uniform]

And these are the results for averaged features:

[Images: depression_averaged, consciousness_averaged, schizophrenia_averaged, lsd_averaged, demon_averaged, lama_averaged]

I also merged the Gaussian sampling with the feature averaging:

[Images: depression_gauss_fixed_averaged, consciousness_gauss_fixed_averaged, schizophrenia_gauss_fixed_averaged, lsd_gauss_fixed_averaged, demon_gauss_fixed_averaged, lama_gauss_fixed_averaged]

Let me know what you think or if you have any other ideas! If you can tell me a nice way to put images side-by-side here I can format it a bit better to make it easier to visually compare the results.

I was thinking of potentially including the feature-averaging approach, but I would first experiment with averaging only the features of cutouts of certain sizes. Furthermore, I need to experiment with feature averaging under different lower_bound_cutout values.

NotNANtoN commented 3 years ago

@dginev @afiaka87, you might be interested in this!

I'm working on this branch atm: https://github.com/NotNANtoN/deep-daze/tree/new_augmentations So far, I have only added the feature averaging as an additional parameter to tweak; the other changes need to be made within the code.

dginev commented 3 years ago

| Category | depression | consciousness | schizophrenia | lsd | demon | llama |
| --- | --- | --- | --- | --- | --- | --- |
| Gaussian | depression_gauss_fixed | Consciousness_gauss_fixed | Schizophrenia_gauss_fixed | lsd_gauss_fixed | demon_gauss_fixed | lama_gauss_fixed |
| current version | depression_uniform | consciousness_uniform | schizophrenia_uniform | lsd_uniform | demon_uniform | lama_uniform |
| averaged features | depression_averaged | consciousness_averaged | schizophrenia_averaged | lsd_averaged | demon_averaged | lama_averaged |
| Gaussian + feature averaging | depression_gauss_fixed_averaged | consciousness_gauss_fixed_averaged | schizophrenia_gauss_fixed_averaged | lsd_gauss_fixed_averaged | demon_gauss_fixed_averaged | lama_gauss_fixed_averaged |

Easy reply first - you can use a markdown table with markdown images. Or, if you want to fine-tune the widths, you can write an HTML table with `<img>` elements; both work on GitHub. Here's a quick scrape of your links, added into a markdown table.
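For anyone who wants to build such a table from a list of image links, a small helper along these lines could work. It is only a sketch: the function name is made up, and the example.com URLs are placeholders, not the real image links from this thread.

```python
def markdown_image_table(columns, rows):
    """Build a markdown table whose cells are markdown images.

    columns: list of column titles (one per prompt).
    rows: list of (row_label, [(alt_text, url), ...]) tuples.
    """
    header = "| Category | " + " | ".join(columns) + " |"
    divider = "| --- " * (len(columns) + 1) + "|"
    lines = [header, divider]
    for label, images in rows:
        cells = " | ".join(f"![{alt}]({url})" for alt, url in images)
        lines.append(f"| {label} | {cells} |")
    return "\n".join(lines)


# Placeholder URLs for illustration only.
table = markdown_image_table(
    columns=["depression", "llama"],
    rows=[
        (
            "current version",
            [
                ("depression_uniform", "https://example.com/depression_uniform.png"),
                ("lama_uniform", "https://example.com/lama_uniform.png"),
            ],
        ),
    ],
)
print(table)
```

Pasting the printed text into a comment renders one row per variant and one column per prompt, which makes side-by-side comparison easier.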