openai / guided-diffusion

MIT License

After training the sample pictures get some weird color tints #81

Open benjamin-bertram opened 1 year ago

benjamin-bertram commented 1 year ago

Sometimes during training I get weird color tints in my samples, while the original data has no tints at all. Is there a reason for this, and how could I avoid it?

Original data is like: 40SMA0013_1Y5_01

And the output is: 8c0b5bd2-c316-4c06-968c-5d4d2b0607b3

smy1999 commented 1 year ago

Have you solved the problem? I've got the same one.

zengxianyu commented 1 year ago

I also found degradation in image quality after fine-tuning on the same dataset (I'm using LSUN horse at 256×256 resolution).

stsavian commented 1 year ago

I am also facing a similar problem!

osmanmusa commented 1 year ago

Same here ...

stsavian commented 1 year ago

Hello, I have found that by predicting the target ($x_0$) instead of the noise ($\epsilon$), the phenomenon is dramatically reduced. Have you tried setting predict_xstart to true?

Looking forward to your feedback, Stefano
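One way to see why this helps (a toy scalar sketch under my own assumptions, not the repo's code): when the model predicts the noise, $x_0$ is derived as $x_0 = (x_t - \sqrt{1-\bar\alpha_t}\,\epsilon)/\sqrt{\bar\alpha_t}$, so any error in the predicted noise is amplified by $\sqrt{1-\bar\alpha_t}/\sqrt{\bar\alpha_t}$, which is large at high-noise timesteps:

```python
import math

# Toy sketch (assumed names): error amplification when deriving x_0
# from a predicted noise eps instead of predicting x_0 directly.

def x0_from_eps(x_t, eps, alpha_bar):
    """Derive x_0 from a noisy sample x_t and the (predicted) noise."""
    return (x_t - math.sqrt(1 - alpha_bar) * eps) / math.sqrt(alpha_bar)

x0_true, eps_true, delta = 0.5, 0.3, 0.01  # small error in predicted noise
for alpha_bar in (0.99, 0.5, 0.01):        # late -> early (noisier) timestep
    x_t = math.sqrt(alpha_bar) * x0_true + math.sqrt(1 - alpha_bar) * eps_true
    err = abs(x0_from_eps(x_t, eps_true + delta, alpha_bar) - x0_true)
    # error grows as sqrt(1 - alpha_bar) / sqrt(alpha_bar):
    # roughly 0.001, 0.01, and 0.0995 for the three alpha_bar values
    print(f"alpha_bar={alpha_bar}: x0 error = {err:.4f}")
```

A constant per-channel bias in the derived $x_0$ at high-noise steps is exactly the kind of thing that would show up as a global color tint.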

ONobody commented 1 year ago

@zengxianyu how do you fine-tune? Thanks.

tobiasbrinker commented 1 year ago

Hi, for me, predicting only the mean (instead of the mean + variance) by setting learn_sigma=False solved the problem.
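For context, here is a hedged sketch of what learn_sigma changes (names are my own, not the repo's code): with learn_sigma=True the network output has twice the image channels, one half for the predicted mean/noise and one half parameterizing the variance; learn_sigma=False halves the output so the model predicts the mean only:

```python
# Toy sketch (assumed names): how learn_sigma affects the model output.

def unet_out_channels(image_channels: int, learn_sigma: bool) -> int:
    """Output channels double when the variance is also learned."""
    return image_channels * (2 if learn_sigma else 1)

def split_model_output(output, image_channels, learn_sigma):
    """Split a flat per-pixel channel list into (mean, variance_part)."""
    if not learn_sigma:
        return output, None
    return output[:image_channels], output[image_channels:]

assert unet_out_channels(3, True) == 6   # mean + variance channels
assert unet_out_channels(3, False) == 3  # mean only
mean, var_part = split_model_output([1, 2, 3, 4, 5, 6], 3, True)
assert mean == [1, 2, 3] and var_part == [4, 5, 6]
```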

Walleeeda commented 10 months ago

Just train longer.

sibasmarak commented 10 months ago

@stsavian @Walleeeda

For me, training longer and predict_xstart=True have not solved the problem (I am using the LSUN Church Outdoor dataset). I am training with learn_sigma=False now, although I had kept it for last since the paper shows that predicting the variance should help.

Update: None of the suggested solutions here work for me; I am getting weird tints always.

Are there any additional tricks to use while sampling from models that have been trained with predict_xstart=True? Currently, the samples are just pitch-black images. Also, it is worth mentioning that in this case the loss $q_0 \ll q_3$ (which is reversed in the default case of predict_xstart=False).

MitcML commented 10 months ago

Same here; I am using the LSUN bedroom model.

sibasmarak commented 10 months ago

Hi, I have solved the problem (technically, @stsavian's idea, but I will try to put forth my observations).

TL;DR The solution is to predict $x_0$ (predict_xstart=True) along with trying out several hyperparameters (notably, image_size, num_channels, num_head_channels). Also, for me, rescale_learned_sigmas=False worked better.
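As a concrete illustration, these flags map onto guided-diffusion's training script roughly as follows; the flags themselves exist in the repo, but the paths and the specific values below are placeholders I chose, not settings from this thread:

```shell
# Hypothetical invocation; adjust the data path and sizes to your setup.
python scripts/image_train.py --data_dir /path/to/dataset \
    --image_size 64 --num_channels 128 --num_head_channels 64 \
    --predict_xstart True --learn_sigma True --rescale_learned_sigmas False \
    --diffusion_steps 1000 --noise_schedule linear --lr 1e-4 --batch_size 32
```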


Some prior context: my custom dataset has a black background in its samples, with the content in different colours (imagine the MNIST dataset, but the digits are differently coloured and the images have three channels).

The sampling process calls q_posterior_mean_variance(), which requires $x_0$ (or $x_{\text{start}}$). The default training setting derives $x_0$ from the predicted noise (see here), which is thus not that accurate (i.e., I observed that, from the noise, it predicts an $x_0$ with a uniform background but cannot get the exact background colour right). However, this default setting might work well on a dataset with enough background diversity when trained for more steps with proper hyperparameter settings.
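The dependency described above can be sketched like this (toy scalars with assumed names, not the repo's code): each sampling step needs the posterior mean $\mu_t(x_t, x_0)$, and $x_0$ comes either from a direct prediction (predict_xstart=True) or is derived from the predicted noise:

```python
import math

# Toy sketch (assumed names): the reverse-process posterior mean
# q(x_{t-1} | x_t, x_0) needs x_0, which is either predicted directly
# or derived from a predicted noise eps.

def x0_from_eps(x_t, eps, alpha_bar_t):
    return (x_t - math.sqrt(1 - alpha_bar_t) * eps) / math.sqrt(alpha_bar_t)

def q_posterior_mean(x0, x_t, alpha_t, alpha_bar_t, alpha_bar_prev):
    """Standard DDPM posterior mean, as in Ho et al.'s formulation."""
    beta_t = 1.0 - alpha_t
    coef_x0 = math.sqrt(alpha_bar_prev) * beta_t / (1 - alpha_bar_t)
    coef_xt = math.sqrt(alpha_t) * (1 - alpha_bar_prev) / (1 - alpha_bar_t)
    return coef_x0 * x0 + coef_xt * x_t

alpha_t, alpha_bar_prev = 0.9, 0.5
alpha_bar_t = alpha_t * alpha_bar_prev  # 0.45
x0, eps = 0.4, -0.7
x_t = math.sqrt(alpha_bar_t) * x0 + math.sqrt(1 - alpha_bar_t) * eps

# With a *perfect* noise prediction the derived x_0 is exact; the tint
# problem comes from the real prediction being imperfect.
assert abs(x0_from_eps(x_t, eps, alpha_bar_t) - x0) < 1e-12
mu = q_posterior_mean(x0, x_t, alpha_t, alpha_bar_t, alpha_bar_prev)
```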

Another training setting (predict_xstart=True) attempts to predict $x_0$ directly instead of the noise, and is hence better at providing $x_0$ during sampling. However, there can be training instability (e.g., limited model expressivity and NaN losses). For me, incorrect hyperparameter settings caused a complete collapse into all-black samples with no content.