huggingface / diffusers


Combined loss term for VQ-VAE (`diffusers.VQModel`) #7787


asy51 commented 2 months ago

For training the VQ-VAE component of a latent diffusion model à la CompVis/ldm-celebahq-256 (which uses `diffusers.VQModel`), is there a combined loss term covering the losses described by the authors: reconstruction loss, VQ (codebook) loss, and commitment loss?

I see the VQ loss term is computed in `VectorQuantizer`, but it does not seem to be used anywhere else: https://github.com/huggingface/diffusers/blob/ebc99a77aad647c5d33eb36a33c23f7b3949cb40/src/diffusers/models/autoencoders/vae.py#L726-L730

I'm also open to alternatives to `VQModel`, like `AutoencoderKL`, if they expose the loss terms more easily.
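For reference, here is a minimal sketch of assembling the combined loss manually, assuming the `CompVis/ldm-celebahq-256` repo keeps its VQ-VAE in a `vqvae` subfolder and that `VQModel.quantize` (the `VectorQuantizer` linked above) returns the loss alongside the quantized latents; the batch is a placeholder:

```python
import torch
import torch.nn.functional as F
from diffusers import VQModel

model = VQModel.from_pretrained("CompVis/ldm-celebahq-256", subfolder="vqvae")
images = torch.randn(2, 3, 256, 256)  # placeholder batch

# Encode to continuous latents, then quantize explicitly so the VQ loss
# (commitment + codebook terms, as computed in the linked VectorQuantizer)
# is returned to us instead of being discarded inside decode().
h = model.encode(images).latents
z_q, vq_loss, _ = model.quantize(h)

# Decode the already-quantized latents; force_not_quantize avoids quantizing twice.
recon = model.decode(z_q, force_not_quantize=True).sample

recon_loss = F.mse_loss(recon, images)
loss = recon_loss + vq_loss  # combined objective (minus the perceptual/GAN terms of the paper)
loss.backward()
```

For `AutoencoderKL` the analogous terms would be a reconstruction loss on `decode(...).sample` plus the KL term from `encode(images).latent_dist.kl()`, since `encode` returns a `DiagonalGaussianDistribution` that exposes a `kl()` method.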

Thank you!

asy51 commented 2 months ago

For example, `Seq2SeqQuestionAnsweringModelOutput` has a `loss` attribute (https://github.com/huggingface/transformers/blob/9fe3f585bb4ea29f209dc705d269fbe292e1128f/src/transformers/modeling_outputs.py#L1169), which can be used to train `transformers.T5...`. I'm looking for something similar in `VQModel`, or any other VAE for that matter.
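To illustrate that pattern with a runnable sketch (`T5ForConditionalGeneration` and `t5-small` are stand-ins here; its output is a `Seq2SeqLMOutput` rather than the question-answering variant linked above, but the `loss` attribute works the same way):

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

inputs = tokenizer("translate English to German: Hello.", return_tensors="pt")
labels = tokenizer("Hallo.", return_tensors="pt").input_ids

# Passing `labels` makes the forward pass compute the loss and attach it to
# the output dataclass, so no hand-rolled criterion is needed for training.
outputs = model(**inputs, labels=labels)
outputs.loss.backward()
```

Something equivalent on `VQModel`'s forward or decode path would make the VQ-VAE trainable the same way.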

bghira commented 2 months ago

Yes, in the deep-floyd/IF project we see these: https://github.com/deep-floyd/IF/blob/develop/deepfloyd_if/model/gaussian_diffusion.py#L739. But I can't remember seeing them anywhere in the diffusers project.