nv-tlabs / LION

Latent Point Diffusion Models for 3D Shape Generation

Why is the reconstruction so similar to the ground truth even in the early training stage of the VAE? #52

Closed OswaldoBornemann closed 12 months ago

OswaldoBornemann commented 1 year ago

Why is the reconstruction so similar to the ground truth even in the early training stage of the VAE?

OswaldoBornemann commented 1 year ago

@ZENGXH I know that you mentioned something similar in an issue here. So I am very curious how you evaluate the reconstruction performance in your paper appendix, shown in Table 23 and Table 24. Do you add some noise to the input? Otherwise, I think the EMD and CD values would be much lower.
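(For context, CD here is the Chamfer distance between the reconstruction and the ground truth. A minimal sketch, assuming the two point clouds are `(N, 3)` and `(M, 3)` torch tensors; this is illustrative, not the repo's actual evaluation code:)

```python
import torch

def chamfer_distance(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Pairwise squared Euclidean distances between all point pairs: (N, M)
    d = torch.cdist(x, y, p=2) ** 2
    # Average nearest-neighbor distance in both directions
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()
```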

OswaldoBornemann commented 1 year ago

@ZENGXH I would also like to ask another question. It seems that the input_pts in pointflow_datasets.py is the same as tr_out. So when you define input_pts as noisy_input, it is actually the same as val_x? I am not sure whether I am right. https://github.com/nv-tlabs/LION/blob/ca8129d8c00bb314e30e51992c3abfe002c625d9/trainers/base_trainer.py#L754

OswaldoBornemann commented 1 year ago

@ZENGXH Which epoch of the VAE did you use in the autoencoding experiment? Is that trained VAE then used for the diffusion training?

ZENGXH commented 12 months ago

> evaluate the reconstruction performance in your paper appendix, which is shown in Table 23 and Table 24

When we evaluate the reconstruction (we evaluate the last VAE checkpoint), we sample from the posterior, meaning the latent is sampled from N(network_output_mu, network_output_logsigma).
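(A minimal sketch of this posterior sampling via the reparameterization trick; the function and variable names are illustrative, not the repo's exact API:)

```python
import torch

def sample_posterior(mu: torch.Tensor, log_sigma: torch.Tensor) -> torch.Tensor:
    # z ~ N(mu, sigma^2), drawn with the reparameterization trick
    eps = torch.randn_like(mu)               # standard normal noise
    return mu + torch.exp(log_sigma) * eps   # scale by sigma, shift by mu
```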

> define input_pts as noisy_input

Yes. The noisy_input in regular VAE training is the same as the input; I name it "noisy" because we use different input points in the experiments "Encoder Fine-tuning for Voxel-Conditioned Synthesis and Denoising" under paper Section 3.1 (see the sketch below).
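(A hedged sketch of the distinction: in regular training the encoder input is just the clean points, while the denoising fine-tuning experiment would perturb them; the helper name and noise level here are illustrative assumptions, not the repo's code:)

```python
import torch

def make_encoder_input(input_pts: torch.Tensor, denoising: bool = False,
                       noise_std: float = 0.02) -> torch.Tensor:
    if not denoising:
        return input_pts  # regular VAE training: noisy_input == input
    # denoising fine-tuning: perturb the input points
    return input_pts + noise_std * torch.randn_like(input_pts)
```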

> epoch of vae

I train the VAE for 8000 epochs; the last checkpoint is used for diffusion training.

OswaldoBornemann commented 12 months ago

Thank you very much.