Seeking Clarification on VAE and CRM Performance in CCM Diffusion

thu-ml / CRM

[ECCV 2024] Single Image to 3D Textured Mesh in 10 seconds with Convolutional Reconstruction Model.

MIT License


Outstanding work! I have a question about the performance of the VAE in your CCM diffusion model. As far as I understand, VAEs typically struggle to reconstruct CCMs precisely. Since the VAE's reconstruction quality sets an upper bound on the quality of the CCM diffusion output, the CCMs produced by the diffusion process are unlikely to be perfectly accurate.
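To make the concern concrete, here is a toy sketch of what I mean by the reconstruction error acting as a ceiling. It is not CRM's code: `make_ccm`, `lossy_round_trip`, and `mean_coord_error` are hypothetical names, the CCM is a synthetic map where each pixel stores a 3D coordinate, and the average-pool-then-upsample step is only an assumed stand-in for a VAE encode/decode round trip.

```python
import math

def make_ccm(size):
    """Toy CCM: each pixel stores an (x, y, z) coordinate in [0, 1]."""
    ccm = []
    for i in range(size):
        row = []
        for j in range(size):
            x = j / (size - 1)
            y = i / (size - 1)
            z = 0.5 + 0.5 * math.sin(2 * math.pi * x) * math.cos(2 * math.pi * y)
            row.append((x, y, z))
        ccm.append(row)
    return ccm

def lossy_round_trip(ccm, factor):
    """Assumed stand-in for VAE encode/decode: average-pool each
    factor x factor block, then copy the block mean back to every pixel."""
    size = len(ccm)
    out = [[None] * size for _ in range(size)]
    for bi in range(0, size, factor):
        for bj in range(0, size, factor):
            rows = range(bi, min(bi + factor, size))
            cols = range(bj, min(bj + factor, size))
            block = [ccm[i][j] for i in rows for j in cols]
            mean = tuple(sum(p[c] for p in block) / len(block) for c in range(3))
            for i in rows:
                for j in cols:
                    out[i][j] = mean
    return out

def mean_coord_error(a, b):
    """Mean Euclidean distance between corresponding 3D coordinates."""
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for p, q in zip(row_a, row_b):
            total += math.dist(p, q)
            count += 1
    return total / count

if __name__ == "__main__":
    ccm = make_ccm(64)
    for factor in (1, 2, 8):
        err = mean_coord_error(ccm, lossy_round_trip(ccm, factor))
        print(f"compression factor {factor}: mean coordinate error {err:.4f}")
```

In this toy setting, any model trained to generate in the compressed space can at best match the round-trip output, so its per-pixel coordinate error cannot fall below the figure printed for that factor. My question is how this kind of error is handled in the real pipeline.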

However, I noticed that the CRM module manages to output accurate meshes. This raises a couple of questions:

Is there a specific trick or method you used during training to enable the CRM to refine and correct the inaccuracies in the CCM?
How does the CRM achieve such high accuracy in the final mesh outputs despite the initial limitations of the VAE?

I would greatly appreciate any insights or details you could provide on these points. Thank you for your time and for sharing your work with the community.

Seeking Clarification on VAE and CRM Performance in CCM Diffusion #21