sihyun-yu / PVDM

Official PyTorch implementation of Video Probabilistic Diffusion Models in Projected Latent Space (CVPR 2023).
https://sihyun.me/PVDM
MIT License
287 stars, 15 forks

Question about the autoencoder design #38

Closed. Darius-H closed this issue 2 weeks ago.

Darius-H commented 1 month ago

Q1: Latent diffusion uses a VAE, so why did you change the structure to a plain autoencoder? Is it because the VAE performed poorly?

Q2: Why design a bottleneck structure here? https://github.com/sihyun-yu/PVDM/blob/17699659148423469c0d1ccdca5e466933b943e1/models/autoencoder/autoencoder_vit.py#L180C1-L190C34

meetcfd commented 2 weeks ago

Hello! Were you able to understand the author's motivation for using a Transformer-based autoencoder?

Thanks in advance!

Darius-H commented 2 weeks ago

I tested this autoencoder and it works really badly; the reconstruction quality is very low.
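
For anyone who wants to put a number on "reconstruction quality", here is a minimal sketch of PSNR, a common metric for comparing an input frame with its reconstruction. It is not code from the PVDM repository: the function name `psnr` is hypothetical, and it assumes pixel values are flattened into plain Python sequences and normalized to [0, 1]. In practice you would compute the same quantity on tensors.

```python
import math


def psnr(original, reconstructed, max_value=1.0):
    """Peak signal-to-noise ratio (dB) between two equal-length pixel sequences.

    Assumption: values lie in [0, max_value]; higher PSNR means a closer match.
    """
    if len(original) != len(reconstructed) or len(original) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values.
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical inputs: no reconstruction error
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    frame = [0.0, 0.5, 1.0, 0.25]          # toy "input" pixels
    recon = [0.1, 0.4, 0.9, 0.35]          # toy "reconstruction"
    print(f"PSNR: {psnr(frame, recon):.2f} dB")
```

Reporting a figure like this alongside the training setup (dataset, resolution, number of iterations) makes it easier for others to tell whether the low quality comes from the architecture or from the configuration.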