sahilqure closed this 5 months ago.
In generic diffusion-based text-to-image generation models, the VAE is typically trained independently, using a combination of reconstruction loss and GAN loss. We do not currently have a corresponding implementation or configuration files for this, and contributions are welcome.
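A minimal sketch of how the reconstruction and GAN loss terms mentioned above are commonly combined. It assumes a KL-regularized autoencoder and a patch discriminator that outputs logits, in the style of latent-diffusion first-stage models. It uses NumPy on precomputed arrays rather than a training framework. It omits the perceptual (LPIPS) term and the adaptive GAN weighting that such setups often include. The weights `kl_weight` and `gan_weight` and the flag `gan_active` are hypothetical placeholders, not values or options from this repository.

```python
import numpy as np


def reconstruction_loss(x, x_rec):
    """Mean absolute (L1) pixel error between input and reconstruction."""
    return float(np.mean(np.abs(x - x_rec)))


def kl_loss(mean, logvar):
    """KL(N(mean, exp(logvar)) || N(0, I)), summed over latent dims, averaged over the batch."""
    per_sample = 0.5 * np.sum(
        mean**2 + np.exp(logvar) - 1.0 - logvar,
        axis=tuple(range(1, mean.ndim)),
    )
    return float(np.mean(per_sample))


def generator_gan_loss(logits_fake):
    """Adversarial term for the autoencoder: -E[D(x_rec)], pushing reconstructions to look real."""
    return float(-np.mean(logits_fake))


def discriminator_hinge_loss(logits_real, logits_fake):
    """Hinge loss for the discriminator: 0.5 * (E[relu(1 - D(x))] + E[relu(1 + D(x_rec))])."""
    loss_real = np.mean(np.maximum(0.0, 1.0 - logits_real))
    loss_fake = np.mean(np.maximum(0.0, 1.0 + logits_fake))
    return float(0.5 * (loss_real + loss_fake))


def autoencoder_loss(x, x_rec, mean, logvar, logits_fake,
                     kl_weight=1e-6, gan_weight=0.5, gan_active=True):
    """Total autoencoder loss. The weights and gan_active flag are hypothetical placeholders."""
    rec = reconstruction_loss(x, x_rec)
    kl = kl_loss(mean, logvar)
    # The GAN term is often switched on only after a warm-up period of pure reconstruction.
    gan = generator_gan_loss(logits_fake) if gan_active else 0.0
    return rec + kl_weight * kl + gan_weight * gan
```

For example, with `x` all zeros, `x_rec` all 0.5, a zero-mean/zero-logvar posterior and discriminator logits of `-1.0` on the reconstruction, the total is `0.5 + 0.5 * 1.0 = 1.0`. In a real training loop the autoencoder and the discriminator are updated in alternating steps, each minimizing its own loss.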
I understand that fully finetuning SDXL on natural images without retraining the first stage works well, but for images from other domains (CT, MRI, etc.), first-stage training of the autoencoder is required.