ziqihuangg / Collaborative-Diffusion

[CVPR 2023] Collaborative Diffusion
https://ziqihuangg.github.io/projects/collaborative-diffusion.html

About training time #13

Closed: diamond0910 closed this issue 1 year ago

diamond0910 commented 1 year ago

Hi,

Thank you for your nice work.

I would like to know how long training takes for each component: the VAE, the uni-modal models, and the dynamic model.

I trained the VAE for about 3 hours on 4 GPUs, but the sampled images still look poor.

image

The reconstructed image looks OK.

image

ziqihuangg commented 1 year ago

Hi, you should sample using the diffusion UNet instead of sampling directly from the VAE latents. The VAE here is mainly for spatial compression. The diffusion UNet is the component that models the image distribution.
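To illustrate the point above, here is a minimal toy sketch, not the repository's code. It assumes a 1-D "image" distribution, a linear `encode`/`decode` pair standing in for the VAE, and a Gaussian fitted to the latents standing in for the diffusion UNet; all names (`encode`, `decode`, `run_demo`) are hypothetical. It shows how reconstructions can be accurate while decoding plain noise lands far from the data.

```python
import random
import statistics


def encode(x):
    # Toy "VAE" encoder: a fixed linear map standing in for spatial compression.
    return 2.0 * x


def decode(z):
    # Toy decoder: the exact inverse of the encoder, so reconstructions are accurate.
    return z / 2.0


def run_demo(n=1000, seed=0):
    rng = random.Random(seed)
    data = [rng.gauss(5.0, 0.1) for _ in range(n)]  # toy stand-in for images
    latents = [encode(x) for x in data]

    # 1) Reconstructions: encode real data, then decode it again.
    recon_error = max(abs(decode(z) - x) for z, x in zip(latents, data))

    # 2) Samples without a model of the latents: decode plain Gaussian noise.
    naive = [decode(rng.gauss(0.0, 1.0)) for _ in range(n)]

    # 3) Samples from a model fitted to the latents
    #    (assumption: a simple Gaussian fit stands in for the diffusion UNet).
    mu, sigma = statistics.mean(latents), statistics.stdev(latents)
    modeled = [decode(rng.gauss(mu, sigma)) for _ in range(n)]

    return {
        "recon_error": recon_error,
        "data_mean": statistics.mean(data),
        "naive_mean": statistics.mean(naive),
        "modeled_mean": statistics.mean(modeled),
    }
```

In this toy setup, the decoder is only as good as the latents it is given: the encoder and decoder can be well trained while noise that does not follow the latent distribution still decodes to off-distribution outputs.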

diamond0910 commented 1 year ago

Do you mean these default outputs are useless?

image

ziqihuangg commented 1 year ago

For the VAE here, you should look at reconstructions_gs-xxxxxx to assess training progress. You don't need to look at samples_gs-xxxxx, but it can give you a sense of what images look like when you sample directly from the VAE latents without using the diffusion UNet.
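As a small convenience, the advice above can be sketched as a helper that picks the most recent reconstruction image from a list of logged file names. This is only a sketch: it assumes the names start with `reconstructions_gs-` or `samples_gs-` followed by the global step as digits, as in the names quoted above, and the function name `latest_logged_image` and the example file names are hypothetical.

```python
import re

# Assumed naming: "<kind>_gs-<global step digits>..." (anything may follow the digits).
_LOG_NAME = re.compile(r"^(reconstructions|samples)_gs-(\d+)")


def latest_logged_image(filenames, kind="reconstructions"):
    """Return the file name of the given kind with the highest global step, or None."""
    best = None
    for name in filenames:
        match = _LOG_NAME.match(name)
        if match and match.group(1) == kind:
            step = int(match.group(2))
            if best is None or step > best[0]:
                best = (step, name)
    return None if best is None else best[1]
```

For example, given hypothetical names such as `reconstructions_gs-000500.png`, `reconstructions_gs-002000.png`, and `samples_gs-002000.png`, calling `latest_logged_image(names)` returns the reconstruction with the highest step and ignores the `samples_gs-` file.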