zsyOAOA / ResShift

ResShift: Efficient Diffusion Model for Image Super-resolution by Residual Shifting (NeurIPS 2023 Spotlight)

Dear Author, have you tried ResShift on a VAE instead of VQGAN? #57

Open toummHus opened 4 months ago

toummHus commented 4 months ago

ResShift is highly similar to the Latent Diffusion Model, so I wonder: did you try a VAE to construct the latent space? Did it work well? Why choose VQGAN? Looking forward to your reply!

zsyOAOA commented 4 months ago

Strictly speaking, I used the autoencoder model (KL divergence, not codebook) within the Latent Diffusion Model. Autoencoders with either KL or codebook regularization both work well.
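For readers unfamiliar with the distinction, the two regularization schemes differ in how the latent is formed: a KL autoencoder keeps a continuous latent sampled via the reparameterization trick, while a codebook (VQ) autoencoder snaps each latent vector to its nearest codebook entry. A minimal NumPy sketch of the two latent constructions (toy shapes, not the actual LDM implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def kl_latent(mean, logvar, rng):
    # KL autoencoder: encoder predicts mean and log-variance; the latent
    # is a continuous sample via the reparameterization trick.
    return mean + np.exp(0.5 * logvar) * rng.standard_normal(mean.shape)

def vq_latent(z, codebook):
    # VQ autoencoder: each latent vector (n, d) is replaced by its
    # nearest entry in a discrete codebook (k, d).
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return codebook[d2.argmin(axis=1)]

z = rng.standard_normal((4, 8))
codebook = rng.standard_normal((16, 8))
z_kl = kl_latent(z, np.zeros_like(z), rng)   # continuous latent
z_vq = vq_latent(z, codebook)                # discrete latent
```

Both variants produce a latent of the same shape as the encoder output; the diffusion model downstream is largely agnostic to which one is used, which matches the author's observation that both work well.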

toummHus commented 4 months ago

> Strictly speaking, I used the autoencoder model (KL divergence, not codebook) within the Latent Diffusion Model. Autoencoders with either KL or codebook regularization both work well.

Thanks for the reply. So did you choose VQGAN because it's better? Which one do you recommend?

zsyOAOA commented 4 months ago

No, I just followed the SR model from the Latent Diffusion Model paper. I recommend the KL version, since most recent Stable Diffusion models employ that choice.

Feynman1999 commented 3 months ago

> Strictly speaking, I used the autoencoder model (KL divergence, not codebook) within the Latent Diffusion Model. Autoencoders with either KL or codebook regularization both work well.

Is this VAE the same as the one used in SD 1.5, given that the VAE used by SD 2.1 and SDXL has changed?

louis4work commented 2 months ago

> Strictly speaking, I used the autoencoder model (KL divergence, not codebook) within the Latent Diffusion Model. Autoencoders with either KL or codebook regularization both work well.

From my understanding, all the released models in this repo are using a VQ autoencoder, such as:
https://github.com/zsyOAOA/ResShift/blob/989803abe8315c7e8c17d7d4b2c6541dc6e763f9/configs/realsr_swinunet_realesrgan256_journal.yaml#L4-L6
https://github.com/zsyOAOA/ResShift/blob/989803abe8315c7e8c17d7d4b2c6541dc6e763f9/ldm/models/autoencoder.py#L12-L50

So @zsyOAOA, would you mind clarifying one more time which type of autoencoder was actually used in the paper? Thanks!
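One rough way to settle this kind of question without reading the config is to inspect the checkpoint's state-dict keys: VQ autoencoders in the taming-transformers/LDM lineage carry their codebook under a `quantize.` prefix (e.g. `quantize.embedding.weight`), which KL autoencoders lack. A heuristic sketch (toy key-only dicts stand in for real checkpoints; the prefix convention is an assumption about that codebase):

```python
def autoencoder_kind(state_dict):
    """Heuristic: VQ autoencoders from taming-transformers/LDM carry
    codebook weights under a 'quantize.' prefix; KL autoencoders do not."""
    return "vq" if any(k.startswith("quantize.") for k in state_dict) else "kl"

# Toy state dicts standing in for real checkpoints (keys only).
vq_sd = {"encoder.conv_in.weight": None, "quantize.embedding.weight": None}
kl_sd = {"encoder.conv_in.weight": None, "quant_conv.weight": None}
kind_vq = autoencoder_kind(vq_sd)
kind_kl = autoencoder_kind(kl_sd)
```

In practice you would load the real checkpoint with `torch.load(..., map_location="cpu")` and pass its state dict to a check like this.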

zsyOAOA commented 2 months ago

I extracted the VQGAN checkpoint from the latent diffusion model for image super-resolution, accessible from this link. I rewrote the model class in plain PyTorch, i.e., VQModelTorch, as the original model is based on PyTorch Lightning. @louis4work
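The extraction step described above can be sketched as filtering the Lightning checkpoint's state dict down to the autoencoder submodule and stripping its prefix, so the weights load into a standalone module such as `VQModelTorch`. The prefix `first_stage_model.` is what the LDM Lightning module uses for its autoencoder attribute; a toy dict stands in for the real checkpoint here:

```python
def extract_first_stage(full_state_dict, prefix="first_stage_model."):
    # Keep only the autoencoder weights from the full LDM checkpoint and
    # strip the submodule prefix so the result matches a standalone model.
    return {k[len(prefix):]: v for k, v in full_state_dict.items()
            if k.startswith(prefix)}

# Toy checkpoint: autoencoder weights mixed with diffusion-UNet weights.
full = {
    "first_stage_model.encoder.conv_in.weight": 1,
    "first_stage_model.quantize.embedding.weight": 2,
    "model.diffusion_model.input_blocks.0.weight": 3,
}
ae_sd = extract_first_stage(full)
```

With a real checkpoint you would load it via `torch.load(path, map_location="cpu")["state_dict"]` first, then call `model.load_state_dict(ae_sd)` on the rewritten PyTorch class.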

louis4work commented 2 months ago

> I extracted the VQGAN checkpoint from the latent diffusion model for image super-resolution, accessible from this link. I rewrote the model class in plain PyTorch, i.e., VQModelTorch, as the original model is based on PyTorch Lightning. @louis4work

Thanks!

maxmars1 commented 1 month ago

> I extracted the VQGAN checkpoint from the latent diffusion model for image super-resolution, accessible from this link. I rewrote the model class in plain PyTorch, i.e., VQModelTorch, as the original model is based on PyTorch Lightning. @louis4work

Hello. You rewrote VQModelTorch from PyTorch Lightning; did you also rewrite all the latent diffusion code from PyTorch Lightning to plain PyTorch and retrain the pretrained autoencoder model from scratch?

zsyOAOA commented 1 month ago

No, I did not retrain the autoencoder model. In addition, the diffusion-model codebase I used is based on guided-diffusion, which is not written in PyTorch Lightning. @maxmars1

maxmars1 commented 1 month ago

I successfully extracted the VQGAN autoencoder model (from the PyTorch Lightning checkpoint) and reloaded it into ResShift. Thank you so much. @zsyOAOA