thu-ml / CRM

[ECCV 2024] Single Image to 3D Textured Mesh in 10 seconds with Convolutional Reconstruction Model.
https://ml.cs.tsinghua.edu.cn/~zhengyi/CRM/
MIT License
588 stars 48 forks

Seeking Clarification on VAE and CRM Performance in CCM Diffusion #21

Open LTT-O opened 6 months ago

LTT-O commented 6 months ago

Outstanding work! I have a question about the performance of the VAE in your CCM Diffusion model. As far as I understand, VAEs typically struggle to reconstruct CCMs precisely. Since the VAE's reconstruction quality sets the upper limit for the quality of the CCM Diffusion output, the CCMs produced by the diffusion process might not be perfectly accurate.

However, I noticed that the CRM module manages to output accurate meshes. This raises a couple of questions:

  1. Is there a specific trick or method you used during training to enable the CRM to refine and correct the inaccuracies in the CCM?
  2. How does the CRM achieve such high accuracy in the final mesh outputs despite the initial limitations of the VAE?

I would greatly appreciate any insights or details you could provide on these points. Thank you for your time and for sharing your work with the community.

thuwzy commented 5 months ago

Thank you for your interest in our work!

You're correct that Variational Autoencoders (VAEs) often struggle with precise reconstruction, which introduces minor numerical disturbances in their outputs. To address this, we employed a specific strategy during both the training and testing phases: we added small noise perturbations to the input images and CCMs. This makes our model more resilient to minor inconsistencies.
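
For readers who want a concrete picture, here is a minimal sketch of this kind of perturbation. It is not taken from the CRM codebase: the Gaussian noise model, the `sigma=0.02` scale, the `perturb` helper name, the `[0, 1]` value range, and the tensor shapes are all assumptions made for illustration.

```python
import numpy as np

def perturb(x, sigma=0.02, rng=None):
    """Add zero-mean Gaussian noise to an array in [0, 1], then clip.

    Hypothetical helper: the noise type and scale are assumptions,
    not the values used by the CRM authors.
    """
    rng = np.random.default_rng() if rng is None else rng
    noisy = x + rng.normal(0.0, sigma, size=x.shape)
    # Keep values in the valid range and preserve the input dtype.
    return np.clip(noisy, 0.0, 1.0).astype(x.dtype)

# Stand-in data: a batch of multi-view images and their CCMs
# (shape is an assumption: views x channels x height x width).
rng = np.random.default_rng(0)
images = rng.random((6, 3, 64, 64), dtype=np.float32)
ccms = rng.random((6, 3, 64, 64), dtype=np.float32)

# Apply the same kind of perturbation to both inputs before they
# are fed to the reconstruction model.
noisy_images = perturb(images, rng=rng)
noisy_ccms = perturb(ccms, rng=rng)
```

The idea is that a model which sees slightly corrupted inputs during training has less reason to rely on pixel-exact CCM values, so small VAE reconstruction errors are less likely to carry over into the mesh.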