NotNANtoN opened this issue 3 years ago
@dginev @afiaka87, you might be interested in this!
I'm working on this branch atm: https://github.com/NotNANtoN/deep-daze/tree/new_augmentations. So far I have only added the feature averaging as an additional parameter to tweak; the other changes still need to be made in the code.
Easy reply first - you can use a markdown table with the markdown images. Or, if you want to fine-tune widths, you can write an HTML table with `<img>` elements; both work on GitHub. Here's a quick scrape of your links, added into a markdown table:

| Category | depression | consciousness | schizophrenia | lsd | demon | llama |
|---|---|---|---|---|---|---|
| Gaussian | | | | | | |
| current version | | | | | | |
| averaged features | | | | | | |
| Gaussian + feature averaging | | | | | | |

(result images omitted)
Hey,
since we had the long discussion on random cutout size sampling in #61, I played around with it a bit more. My issue with the current state of affairs is that, for a prompt like "A llama wearing a scarf and glasses, sitting in a cozy cafe.", many llamas appear and the general texture of llama fur gets placed everywhere. That makes sense: we optimize the SIREN network to maximize the CLIP similarity of randomly sampled square cutouts, with sizes going down to 10% of the original image. Since it is not possible to train without these augmentations (that would only generate a weird-looking adversarial example), I thought about using other augmentations.
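For reference, here is a rough sketch of what the current cutout scheme amounts to - not the actual deep-daze code, the function names and defaults are made up for illustration: square crops with side lengths drawn uniformly between 10% and 100% of the image size, each resized to CLIP's input resolution.

```python
import torch
import torch.nn.functional as F

def uniform_frac(lower=0.1):
    # current scheme: cutout side length is a uniform fraction of the image size
    return torch.empty(1).uniform_(lower, 1.0).item()

def sample_cutouts(image, frac_fn=uniform_frac, n_cutouts=64, clip_size=224):
    # image: (1, 3, H, W) tensor rendered by the SIREN network
    _, _, h, w = image.shape
    cutouts = []
    for _ in range(n_cutouts):
        size = max(int(frac_fn() * min(h, w)), 1)
        y = torch.randint(0, h - size + 1, (1,)).item()
        x = torch.randint(0, w - size + 1, (1,)).item()
        crop = image[:, :, y:y + size, x:x + size]
        # every cutout is resized to CLIP's input resolution before encoding
        cutouts.append(F.interpolate(crop, (clip_size, clip_size),
                                     mode='bilinear', align_corners=False))
    return torch.cat(cutouts, dim=0)
```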
I tried adding Gaussian noise to the image - it is better than adding no augmentations at all, but the results are still not good.
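In case it helps to pin down what I mean, the noise augmentation is roughly this (the sigma of 0.1 is an arbitrary choice for illustration):

```python
# additive Gaussian noise on the rendered image before feeding it to CLIP
noisy = (image + 0.1 * torch.randn_like(image)).clamp(0, 1)
```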
Then I tried sampling the cutout sizes from a normal distribution with mean 0.6 and std 0.2 instead of a uniform distribution between 0.1 and 1.0. The images look quite different - the backgrounds look less interesting because the low-level textures get less weight. But I can't say they definitely look "better".
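Reusing the hypothetical `sample_cutouts` helper from the sketch above, the only change is how the size fraction is drawn (clamped so the crop neither vanishes nor exceeds the image):

```python
def gaussian_frac(mean=0.6, std=0.2):
    # size fraction drawn from N(0.6, 0.2) instead of Uniform(0.1, 1.0)
    return torch.randn(1).mul(std).add(mean).clamp(0.1, 1.0).item()

cutouts = sample_cutouts(image, frac_fn=gaussian_frac)
```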
I tested the prompts: Depression, Consciousness, Schizophrenia, "A psychedelic experience on LSD", Demon, and "A llama wearing a scarf and glasses, reading a book in a cozy cafe.".
The results for the Gaussian size sampling:
Lastly, I tried a simple change: instead of averaging the losses of all random cutouts, I averaged the features of all cutouts and calculated a single loss. Here's where it gets interesting. The generated images look quite different now, often depicting clear scenes of locations. What is strange is that some symbols appear repeatedly across different prompts - a red and a green blob right next to each other, and some kind of company logo that shows up quite regularly.
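To make the difference concrete, here is a hedged sketch of the two variants (using the standard CLIP `encode_image` call; again, not the exact deep-daze code):

```python
import torch.nn.functional as F

def loss_averaged_losses(clip_model, cutouts, text_features):
    # current behaviour: one similarity per cutout, the losses are averaged
    img = F.normalize(clip_model.encode_image(cutouts).float(), dim=-1)
    txt = F.normalize(text_features.float(), dim=-1)
    return -(img @ txt.T).mean()

def loss_averaged_features(clip_model, cutouts, text_features):
    # the variant described here: average the cutout features first, then one loss
    img = F.normalize(clip_model.encode_image(cutouts).float(), dim=-1)
    img = F.normalize(img.mean(dim=0, keepdim=True), dim=-1)  # mean feature, renormalized
    txt = F.normalize(text_features.float(), dim=-1)
    return -(img @ txt.T).mean()
```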
These are the results for the current version:
And these are the results for averaged features:
I also merged the Gaussian sampling with the feature averaging:
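In terms of the hypothetical helpers sketched above, the combination is just:

```python
cutouts = sample_cutouts(image, frac_fn=gaussian_frac)
loss = loss_averaged_features(clip_model, cutouts, text_features)
```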
Let me know what you think or if you have any other ideas! If you can tell me a nice way to put images side by side here, I can format this a bit better to make the results easier to compare visually.
I was thinking of potentially including the feature-averaging approach - but I would experiment with averaging the features only of cutouts of certain sizes. Furthermore, I need to experiment with how the feature averaging behaves for different lower_bound_cutout values.
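One possible way to structure that, purely as a sketch of the idea (the `size_threshold`, the `fracs` bookkeeping, and the way the two terms are mixed are all made up for illustration):

```python
import torch
import torch.nn.functional as F

def mixed_loss(clip_model, cutouts, fracs, text_features, size_threshold=0.5):
    # fracs: tensor holding the size fraction that was used for each cutout
    img = F.normalize(clip_model.encode_image(cutouts).float(), dim=-1)
    txt = F.normalize(text_features.float(), dim=-1)
    large = fracs >= size_threshold
    losses = []
    if large.any():
        # average the features only for the larger cutouts
        avg = F.normalize(img[large].mean(dim=0, keepdim=True), dim=-1)
        losses.append(-(avg @ txt.T).mean())
    if (~large).any():
        # keep per-cutout losses for the small cutouts
        losses.append(-(img[~large] @ txt.T).mean())
    return torch.stack(losses).mean()
```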