tianweiy / DMD2

(NeurIPS 2024 Oral 🔥) Improved Distribution Matching Distillation for Fast Image Synthesis

Why add the discriminator head on the fake denoiser and not on the real denoiser, where the parameters are fixed? #7

Closed IDKiro closed 4 months ago

IDKiro commented 4 months ago

Excellent work; I learned a lot from your paper.

I would like to ask a question: why add the discriminator head to the fake denoiser instead of the real denoiser, whose parameters are fixed? Shouldn't the latter be more stable?

tianweiy commented 4 months ago

Long story short: we never tried the configuration you proposed.

We started with a simple ViT discriminator for the ImageNet experiments. It worked well, but I was worried that the FID improvement was due to overfitting to classification features, so we switched to the Diffusion-GAN formulation.

My original intuition is that using the fake denoiser makes the classification easier, since the generated images might be out of distribution for the frozen real diffusion model. On the other hand, because it is a Diffusion-GAN and because the fake model is also trained for score estimation, it is not prone to collapse.

However, the configuration you proposed might also work. It sounds similar to SD3-turbo's approach.
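To make the placement concrete, here is a minimal sketch of the idea being discussed: a small classification head sits on top of the fake score network's features, and both real and generated images are noised before classification (the Diffusion-GAN formulation). All module and function names here are illustrative stand-ins, not the actual DMD2 code.

```python
import torch
import torch.nn as nn

class FakeDenoiserWithGANHead(nn.Module):
    """Illustrative stand-in for the fake (trained) denoiser plus a GAN head."""
    def __init__(self, feat_dim=64):
        super().__init__()
        # stand-in for the fake diffusion model's feature extractor
        self.backbone = nn.Sequential(nn.Conv2d(3, feat_dim, 3, padding=1), nn.SiLU())
        # discriminator head: pooled features -> single real/fake logit
        self.cls_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(feat_dim, 1)
        )

    def forward(self, x_noisy):
        return self.cls_head(self.backbone(x_noisy))

def diffusion_gan_logit(model, x, sigma):
    # Diffusion-GAN style: perturb the input with noise before classifying,
    # so the discriminator sees the same corrupted distribution the score
    # network is trained on.
    x_noisy = x + sigma * torch.randn_like(x)
    return model(x_noisy)

model = FakeDenoiserWithGANHead()
real = torch.randn(2, 3, 8, 8)
fake = torch.randn(2, 3, 8, 8)
logit_real = diffusion_gan_logit(model, real, sigma=0.5)
logit_fake = diffusion_gan_logit(model, fake, sigma=0.5)
# a common hinge-style discriminator loss (illustrative choice)
d_loss = torch.relu(1 - logit_real).mean() + torch.relu(1 + logit_fake).mean()
```

The configuration asked about in this issue would instead attach `cls_head` to the frozen real denoiser's features, training only the head.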

IDKiro commented 4 months ago

Sounds like a reasonable assumption; I'll try to do a comparative analysis. I hope your paper is accepted at NeurIPS, and I don't think that will be difficult for DMD2.

tianweiy commented 4 months ago

Just adding that if we change the GAN setup a bit, the GAN weight might need to be adjusted too, and this has a large impact on the final performance. For instance, if we put the head on the real branch, then because the real UNet is frozen, I imagine the GAN classification loss for the generator might be smaller, indicating that we might need a larger GAN weight to balance against the DMD gradient. Anyway, the point is that the GAN loss weight is an important hyperparameter to tune for the exact setup.
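The balancing argument above can be sketched in a few lines. This is not DMD2's actual training loop; the function name, the loss magnitudes, and the weights below are all hypothetical, chosen only to show how a smaller raw GAN loss calls for a larger weight to keep the two gradient sources in balance.

```python
def generator_loss(dmd_loss, gan_cls_loss, gan_weight):
    # Total generator objective: DMD distribution-matching term plus a
    # weighted GAN term. gan_weight must be retuned whenever the GAN setup
    # changes (e.g. which branch hosts the head), because the raw magnitude
    # of gan_cls_loss changes with it.
    return dmd_loss + gan_weight * gan_cls_loss

dmd = 1.0
# Hypothetical magnitudes: a head on the frozen real branch might yield a
# smaller generator-side GAN loss than a head on the trained fake branch.
gan_on_real_branch = 0.25
gan_on_fake_branch = 0.5

# A larger weight on the smaller loss restores the same overall balance.
loss_real_setup = generator_loss(dmd, gan_on_real_branch, gan_weight=4.0)  # -> 2.0
loss_fake_setup = generator_loss(dmd, gan_on_fake_branch, gan_weight=2.0)  # -> 2.0
```

In practice the relevant quantity is the gradient magnitude each term contributes to the generator, not the scalar loss value, but the tuning intuition is the same.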