lucidrains / deep-daze

Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network). Technique was originally created by https://twitter.com/advadnoun
MIT License

Faulty normalization?! #56

Closed NotNANtoN closed 3 years ago

NotNANtoN commented 3 years ago

Sooooo I was working on returning the image from every train_step. So far, the images have been saved using save_image, where the image is calculated with the following lines:

https://github.com/lucidrains/deep-daze/blob/79a991eb952166e3c8118c84422037223461bd7c/deep_daze/deep_daze.py#L382-L383

But normalize_image, which is used in that line, is defined here:

https://github.com/lucidrains/deep-daze/blob/79a991eb952166e3c8118c84422037223461bd7c/deep_daze/deep_daze.py#L39

That means the returned image has been normalized for use as input to CLIP, at least as far as I understand. That does not make any sense. Surely, the output of the SIREN net needs to be normalized before extracting the image features to calculate the loss, but that should not happen to the final image.

I'd suggest simply removing the normalize_image call there. I'm playing around with this, and the generated images are now brighter - which makes sense given that, before, we unnecessarily subtracted 0.34-something per image.

That seems to be a major bug (although I kind of like the darker images too) @lucidrains @afiaka87
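To illustrate the point, here is a minimal NumPy sketch of the separation being proposed: the CLIP-style channel normalization is applied only to the tensor fed into the loss, while the raw SIREN output is what gets saved. The constants and names here are illustrative assumptions, not the exact values or identifiers from deep_daze.py.

```python
import numpy as np

# Illustrative CLIP-style per-channel constants (not the exact values
# used in deep_daze.py) with shape (C, 1, 1) for broadcasting.
CLIP_MEAN = np.array([0.48, 0.46, 0.41]).reshape(3, 1, 1)
CLIP_STD = np.array([0.27, 0.26, 0.28]).reshape(3, 1, 1)

def normalize_for_clip(img):
    """Normalize a [0, 1] image only for feeding CLIP, not for saving."""
    return (img - CLIP_MEAN) / CLIP_STD

# Stand-in for the SIREN output: an image with values in [0, 1].
siren_out = np.random.default_rng(0).random((3, 8, 8))

clip_input = normalize_for_clip(siren_out)  # use this for the CLIP loss only
image_to_save = siren_out                   # save the raw, un-normalized output

# Subtracting the mean shifts pixel values down, which is why saving the
# normalized tensor produced darker images.
assert image_to_save.mean() > clip_input.mean()
```

The bug amounts to saving `clip_input` instead of `image_to_save`; the mean subtraction darkens every pixel of the stored result.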

NotNANtoN commented 3 years ago

As I mentioned before, I have some changes in the works anyway. If you agree with me, I will add this change to the others.

NotNANtoN commented 3 years ago

I made a PR to fix this in #58

NotNANtoN commented 3 years ago

The PR was merged, so the issue is fixed. Keep in mind that this drastically changes the generated images - they will now start from a white-greyish image instead of a blue one, and will be brighter in general.