microsoft / VQ-Diffusion

Official implementation of VQ-Diffusion
MIT License

Difference from LDM #35

Open Tsingularity opened 1 year ago

Tsingularity commented 1 year ago

Hi, thanks for the great work!

I just noticed that your paper is concurrent with LDM (both appeared at the same conference!). What is the main difference between the two works in terms of method? I took a quick pass, and the two papers seem to propose essentially the same technique.

Thanks!

createrfang commented 1 year ago

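I am currently learning about both works as well. They are indeed quite similar, but there are also differences. You may notice that LDM (Latent Diffusion Models) injects its conditioning into a UNet through cross-attention, whereas here the denoising network is a transformer and the timestep is injected through adaptive normalization (the paper describes this as AdaLN, Adaptive Layer Normalization, rather than AdaIN).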
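To make the adaptive-normalization idea concrete, here is a minimal pure-Python sketch, not the repository's code. It normalizes one feature vector and then modulates it with a scale and shift. The function name `adaptive_norm`, the `scale` and `shift` arguments (which I assume would come from a small network fed the timestep embedding), and the `(1 + scale)` convention are all my own assumptions for illustration.

```python
import math


def adaptive_norm(features, scale, shift, eps=1e-5):
    """Normalize a feature vector, then modulate it with a conditioning signal.

    `scale` and `shift` are hypothetical scalars standing in for the outputs
    of a small network applied to the timestep embedding (an assumption).
    """
    mean = sum(features) / len(features)
    var = sum((f - mean) ** 2 for f in features) / len(features)
    # Plain normalization: zero mean, (approximately) unit variance.
    normed = [(f - mean) / math.sqrt(var + eps) for f in features]
    # Conditioning enters only through the scale and shift.
    return [n * (1.0 + scale) + shift for n in normed]


plain = adaptive_norm([1.0, 2.0, 3.0], scale=0.0, shift=0.0)
shifted = adaptive_norm([1.0, 2.0, 3.0], scale=0.0, shift=5.0)
```

The contrast with cross-attention is that the conditioning here only rescales and shifts normalized features, while cross-attention lets every position attend over a whole sequence of conditioning tokens.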

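Additionally, VQ-Diffusion modifies the forward diffusion process: it operates on the discrete tokens of a VQ-VAE codebook rather than on continuous latents. In that sense it takes over the role of the learned prior over tokens, which the original VQ-VAE fills with an autoregressive PixelCNN.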
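As a rough illustration of what a forward process over discrete tokens can look like, here is a minimal sketch of one mask-and-replace style corruption step, as I understand it from the paper. It is not the repository's implementation. The function name, the choice of `num_codes` as the id of the [MASK] token, and the per-step parameters `gamma_t` (mask probability) and `beta_t` (per-code replace probability) are assumptions for illustration.

```python
import random


def mask_and_replace_step(tokens, num_codes, gamma_t, beta_t, rng=random):
    """Apply one forward corruption step to a list of discrete token ids.

    Ordinary tokens are ids in [0, num_codes); the [MASK] token is assumed
    to have id `num_codes`. Each ordinary token is masked with probability
    gamma_t, resampled uniformly with probability num_codes * beta_t, and
    otherwise kept. A masked token always stays masked.
    """
    mask_id = num_codes
    resample_p = num_codes * beta_t
    if gamma_t < 0 or beta_t < 0 or gamma_t + resample_p > 1.0:
        raise ValueError("probabilities must be non-negative and sum to <= 1")
    out = []
    for tok in tokens:
        if tok == mask_id:
            out.append(mask_id)  # absorbing state
            continue
        u = rng.random()
        if u < gamma_t:
            out.append(mask_id)
        elif u < gamma_t + resample_p:
            out.append(rng.randrange(num_codes))  # uniform over the codebook
        else:
            out.append(tok)
    return out


rng = random.Random(0)
noisy = mask_and_replace_step([3, 1, 4, 1, 5], num_codes=8,
                              gamma_t=0.2, beta_t=0.01, rng=rng)
```

Repeating such steps with growing `gamma_t` drives every token toward [MASK], which plays the role that pure Gaussian noise plays in a continuous diffusion model such as LDM.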