Y-ichen opened this issue 1 year ago
Hi,
You can read more info about the outputs here: https://github.com/rinongal/textual_inversion/issues/19, https://github.com/rinongal/textual_inversion/issues/34
tl;dr: The difference is that samples_scaled_gs uses a guidance scale of 5.0, while samples_gs uses a guidance scale of 1.0 (effectively unguided). You only care about samples_scaled_gs, since you're going to use high guidance scales at inference anyhow.
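For reference, the guidance scale typically works by blending the unconditional and conditional noise predictions at each sampling step. A minimal sketch (the function name and numpy stand-in are mine, not from the repo):

```python
import numpy as np

def guided_noise_pred(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: push the prediction away from the
    unconditional output, toward (and past) the conditional one.

    guidance_scale = 1.0 reduces to the plain conditional prediction,
    which is why the samples_gs outputs look unguided.
    """
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```

With scale 1.0 you recover the conditional prediction exactly; with scale 5.0 the conditional signal is amplified, which is what makes the scaled samples adhere to the prompt much more strongly.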
However, if your unscaled samples look too much like your concept, that's a good sign of overfitting.
Also - if you are trying to reproduce - keep in mind the paper predates Stable Diffusion and uses the original LDM. You're not going to get similar results with SD.
Thanks! That's a good explanation! I'm actually using Stable Diffusion 1.5 for some experiments, and SD works well for me. If I want to add my own loss, where in the code should I add it? I'm not familiar with pytorch_lightning and am having trouble finding where the loss is computed. The loss I want to add compares the generated image against a ground-truth image (gt_image) that I provide.
You can add losses inside this function: https://github.com/rinongal/textual_inversion/blob/26ed44fb62c00d6a39d26212a0510466cccebd59/ldm/models/diffusion/ddpm.py#L1053
You'll probably have to work out how to pipe your data into that function, however.
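The usual pattern inside a loss function like that one is to add a weighted extra term on top of the standard diffusion loss. A rough sketch of the shape, using numpy as a stand-in for torch (the function, the `img_weight` parameter, and the idea of comparing a decoded image to a user-supplied gt_image are all my assumptions, not code from the repo):

```python
import numpy as np

def combined_loss(model_output, target, decoded_image, gt_image, img_weight=0.1):
    """Sketch of adding a custom image-space loss to the diffusion loss.

    In the actual p_losses, model_output and target would be torch tensors
    (predicted vs. true noise), and decoded_image would come from decoding
    the predicted latent; here everything is numpy for illustration.
    """
    # Standard diffusion objective: MSE between predicted and target noise
    diffusion_loss = np.mean((model_output - target) ** 2)
    # Hypothetical extra term: MSE against a user-supplied ground-truth image
    image_loss = np.mean((decoded_image - gt_image) ** 2)
    # Weight the new term so it doesn't overwhelm the original objective
    return diffusion_loss + img_weight * image_loss
```

In the real code you'd keep everything as torch tensors so gradients flow through both terms, and you'd likely pass gt_image in via the batch dict that the training_step hands to the loss.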
When I was reproducing the results from the paper, I found that files named samples_gs* and samples_scaled_gs* are produced at the same time. The results in samples_scaled_gs* look good, but I don't know what the samples_gs* files mean, so I'd like to understand the difference between them. Also, what makes samples_scaled_gs* so much better than samples_gs*? Does the input image set play a role in producing samples_scaled_gs*? (For example, is the diffusion model's start_code initialized from one of the images in the given image set?)