IDKiro closed this issue 6 months ago.
Long story short, we never tried the configuration you proposed.
We started with a simple ViT discriminator for the ImageNet experiments. It worked well, but I was worried that the FID improvement came from overfitting to classification features, so we switched to this Diffusion-GAN formulation.
My original intuition is that using the fake denoiser makes classification easier, since the generated images might be out of distribution for the frozen real diffusion model. On the other hand, because it is a Diffusion-GAN and the fake model is also trained for score estimation, it is not easy for it to collapse.
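To make the setup concrete, here is a minimal PyTorch sketch of a classification head attached to intermediate features of the trainable fake denoiser. The feature dimension and layer choices are illustrative assumptions, not the exact architecture from the paper:

```python
# Hypothetical discriminator head on denoiser features (illustrative only).
import torch
import torch.nn as nn

class DiscriminatorHead(nn.Module):
    """Lightweight real/fake classifier over intermediate denoiser features."""

    def __init__(self, feat_dim: int = 512):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(feat_dim, feat_dim, kernel_size=3, padding=1),
            nn.GroupNorm(32, feat_dim),
            nn.SiLU(),
            nn.AdaptiveAvgPool2d(1),   # pool spatial dims to 1x1
            nn.Flatten(),
            nn.Linear(feat_dim, 1),    # single real/fake logit
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, feat_dim, H, W) intermediate activations
        return self.head(feats)
```

Because the fake denoiser keeps training on score estimation, the features this head sees stay adapted to the generated images, which is the intuition above about easier classification.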
However, the configuration you proposed might also work. It sounds similar to SD3-turbo's approach.
That sounds like a reasonable assumption; I'll try to do a comparative analysis. I hope your paper is accepted at NeurIPS, and I don't think that will be difficult for DMD2.
Just adding that if we change the GAN setup a bit, the GAN weight might need to be adjusted too, and this has a large impact on the final performance. For instance, if we put the head on the real branch, then because the real UNet is frozen, I imagine the GAN classification loss for the generator might be smaller, indicating that we might need a larger GAN weight to balance against the DMD gradient. Anyway, the point is that the GAN loss weight is an important hyperparameter to tune for the exact setup.
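The balancing described above can be sketched as a weighted sum; the function name and the default weight are assumptions for illustration, not values from the paper:

```python
# Hypothetical generator objective combining the DMD distribution-matching
# loss with a weighted GAN classification loss (illustrative names).
import torch

def generator_loss(dmd_loss: torch.Tensor,
                   gan_cls_loss: torch.Tensor,
                   gan_weight: float = 1e-2) -> torch.Tensor:
    # gan_weight must be re-tuned whenever the GAN setup changes,
    # e.g. moving the head to the frozen real branch, where the
    # classification loss magnitude may shrink.
    return dmd_loss + gan_weight * gan_cls_loss
```

The design point is simply that `gan_weight` controls how strongly the GAN gradient competes with the DMD gradient, so any change that shifts the scale of `gan_cls_loss` shifts the effective balance.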
Really excellent work; I learned a lot from your paper.
I would like to ask a question: why add the discriminator head on the fake denoiser instead of the real denoiser with frozen parameters? Shouldn't the latter be more stable?