phyllispeng123 opened 5 months ago
Since the output is a mix of blurry and sharp portions, it looks like the scale of your adversarial loss / gradients is probably too low compared to the reconstruction (MSE/MAE) loss / gradients.
You could try just scaling up the discriminator loss (e.g. 10 * distance(disc(real).mean(), disc(fake).mean()) or similar), or using the automatic scaling from https://github.com/CompVis/latent-diffusion/blob/main/ldm/modules/losses/contperceptual.py#L32.
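For reference, that automatic scaling works by comparing the gradient norms of the two losses at the decoder's last layer and scaling the adversarial term so its gradients are comparable to the reconstruction gradients. A rough PyTorch sketch of that idea (function and variable names here are illustrative, not the exact contperceptual.py code):

```python
import torch

def adaptive_adv_weight(rec_loss, adv_loss, last_layer_weight, disc_weight=1.0):
    # Gradients of each loss w.r.t. the decoder's final layer weights.
    rec_grads = torch.autograd.grad(rec_loss, last_layer_weight, retain_graph=True)[0]
    adv_grads = torch.autograd.grad(adv_loss, last_layer_weight, retain_graph=True)[0]
    # Scale the adversarial term so its gradient norm roughly matches the
    # reconstruction gradient norm, clamped for stability.
    weight = torch.norm(rec_grads) / (torch.norm(adv_grads) + 1e-4)
    return torch.clamp(weight, 0.0, 1e4).detach() * disc_weight

# The generator loss would then be roughly:
#   loss = rec_loss + adaptive_adv_weight(rec_loss, adv_loss, last_layer.weight) * adv_loss
```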
I also posted a simplified example training notebook using only adversarial loss here https://github.com/madebyollin/seraena/blob/main/TAESDXL_Training_Example.ipynb (which also does some automatic gradient scaling); you could try taking the adversarial loss from that and combining it with your reconstruction losses.
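If it's useful, here's a minimal sketch of how an adversarial term of that form could be combined with reconstruction losses in the generator step (assuming a PyTorch decoder and a discriminator returning per-sample scores; names are illustrative):

```python
import torch.nn.functional as F

def generator_loss(decoder, disc, latents, targets, adv_scale=10.0):
    fake = decoder(latents)
    # Whatever reconstruction terms you're already using (MSE / MAE / LPIPS ...).
    rec_loss = F.mse_loss(fake, targets) + F.l1_loss(fake, targets)
    # Mean-matching adversarial term: pull the discriminator's mean score on
    # fakes toward its mean score on reals (real score treated as a constant).
    adv_loss = (disc(targets).mean().detach() - disc(fake).mean()).abs()
    # Fixed scale shown here; the adaptive weight above could be used instead.
    return rec_loss + adv_scale * adv_loss
```

The main thing is just that the adversarial gradients end up large enough relative to the reconstruction gradients to remove the blur.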
Hi, this is Phyllis. I really appreciate your marvelous work on SD-VAE compression and acceleration! I am currently training an AutoencoderTiny decoder from scratch using the LDM training structure, but I find that the generated images are not very clear. The training is exactly the same as LDM training apart from your extra loss
distance(disc(real).mean(), disc(fake).mean())
in my decoder's generator step (the extra loss indeed helped with stability and FID, many thanks!). I train the decoder using SD1.5 encoder outputs as input for 190k steps with batch_size=4 and lr=1e-4.
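For reference, one training iteration in my setup looks roughly like the sketch below (simplified, with illustrative names; the real run also includes the usual LDM perceptual loss):

```python
import torch
import torch.nn.functional as F

def train_step(decoder, disc, opt_g, opt_d, latents, images):
    # Decoder / generator update: reconstruction + the extra mean-distance term.
    fake = decoder(latents)
    rec_loss = F.l1_loss(fake, images)
    adv_loss = (disc(images).mean().detach() - disc(fake).mean()).abs()
    g_loss = rec_loss + adv_loss
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

    # Discriminator update: standard hinge loss, as in the LDM losses.
    logits_real = disc(images)
    logits_fake = disc(fake.detach())
    d_loss = 0.5 * (F.relu(1.0 - logits_real).mean() + F.relu(1.0 + logits_fake).mean())
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()
```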
The generated images are still not clear, though. Would you mind giving some hints (losses used, number of training steps, any finetuning stage?) on how to align with your TAESD results?
[image: TAESD result]
[image: my result]