Training convergence problem of shape diffusion model

wyysf-98 / CraftsMan

CraftsMan: High-fidelity Mesh Generation with 3D Native Diffusion and Interactive Geometry Refiner

https://craftsman3d.github.io/

Training convergence problem of shape diffusion model #19

Status: Closed (waterbearbee closed this issue 3 months ago)

waterbearbee commented 4 months ago

Thanks for your amazing work!

As you mentioned in the Appendix, "For the conditional Latent Set Diffusion Model (LSDM), we train our model on 32x A800 GPUs with a batch size of 32 per GPU for 7 days."

I am replicating your work. I have trained the model on 32x A800 GPUs for 1 day, and the results are still bad. How long does the model need to train before the results improve?

Thank you!
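
For scale, the figures quoted above can be turned into a rough compute comparison. This is only a back-of-the-envelope sketch: it assumes no gradient accumulation and the same per-GPU batch size in the replication run, neither of which is stated in the thread.

```python
# Figures quoted from the paper's appendix in the post above.
num_gpus = 32
batch_per_gpu = 32
paper_days = 7

# Training time reported in this issue.
run_days = 1

# Samples per optimizer step, assuming no gradient accumulation (assumption).
global_batch = num_gpus * batch_per_gpu

# Compute budget expressed in GPU-days.
paper_gpu_days = num_gpus * paper_days
run_gpu_days = num_gpus * run_days
fraction = run_gpu_days / paper_gpu_days

print(f"global batch size: {global_batch}")
print(f"GPU-days used: {run_gpu_days} of {paper_gpu_days} ({fraction:.0%})")
```

One day on the same hardware is one seventh of the reported schedule, so some gap in sample quality at that point would not be surprising by itself.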

jinnan-chen commented 4 months ago

Hi, I have a similar issue. Are you using the obj-mix rendered images for conditional diffusion training?

waterbearbee commented 4 months ago

Yes. I have checked the obj-mix rendered image, and it looks fine.

waterbearbee commented 3 months ago

Another observation: the training MSE loss drops to about 0.08 after the first epoch, but the visualized results at inference time are still poor.
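
A single averaged MSE value can hide large differences between noise levels. If the model is trained with a noise-prediction objective (an assumption; the thread does not say), high-noise timesteps are typically much easier to fit and can pull the mean down while the low-noise steps that determine fine detail remain poorly fit. The sketch below groups per-sample losses by timestep so that this can be checked. The function name, the 1000-step schedule, and the bucket count are all hypothetical and not taken from the CraftsMan code.

```python
from collections import defaultdict


def bucket_losses(timesteps, losses, num_train_timesteps=1000, num_buckets=10):
    """Average per-sample losses inside equal-width timestep buckets.

    timesteps: integer diffusion timestep sampled for each training example
    losses:    unreduced MSE for each training example
    """
    if len(timesteps) != len(losses):
        raise ValueError("timesteps and losses must have the same length")
    width = num_train_timesteps / num_buckets
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, loss in zip(timesteps, losses):
        # Clamp so that t == num_train_timesteps - 1 lands in the last bucket.
        bucket = min(int(t // width), num_buckets - 1)
        sums[bucket] += loss
        counts[bucket] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}


# Toy values for illustration only; these are not measurements from this issue.
toy_timesteps = [10, 50, 500, 950, 990]
toy_losses = [0.30, 0.20, 0.05, 0.01, 0.01]
per_bucket = bucket_losses(toy_timesteps, toy_losses)
for bucket, mean_loss in per_bucket.items():
    print(f"bucket {bucket}: mean loss {mean_loss:.3f}")
```

Logging these bucketed values alongside the scalar loss makes it easier to tell whether a plateau near 0.08 reflects uniform progress or is dominated by the easy timesteps.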

wyysf-98 commented 3 months ago

Hi, can you provide more details about the training, such as which VAE config you used? I did not test the released config in detail, so I can take some time to figure out the cause and fix the config if possible.

jinnan-chen commented 3 months ago

Should VAE sample_posterior be False during Diffusion training?
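
For readers unfamiliar with the flag, the sketch below shows what an option with this name usually controls in a VAE encoder: whether the latent handed to the diffusion model is a random draw from the posterior or its deterministic mean. Only the flag name comes from the question above; the function and its signature are a hypothetical illustration, not the CraftsMan implementation.

```python
import math
import random


def encode_latent(mean, logvar, sample_posterior=True, rng=None):
    """Return a latent from a diagonal Gaussian posterior N(mean, exp(logvar)).

    sample_posterior=True  -> stochastic draw via the reparameterization trick
    sample_posterior=False -> deterministic posterior mean (also its mode)
    """
    if not sample_posterior:
        return list(mean)
    rng = rng or random.Random()
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mean, logvar)
    ]


# Toy posterior parameters for a 2-dimensional latent (illustrative values).
toy_mean = [0.5, -1.0]
toy_logvar = [0.0, -2.0]

deterministic = encode_latent(toy_mean, toy_logvar, sample_posterior=False)
sampled = encode_latent(toy_mean, toy_logvar, rng=random.Random(0))
print("mean latent:   ", deterministic)
print("sampled latent:", sampled)
```

With sampling enabled, the diffusion targets for the same shape change from step to step; with it disabled, they are fixed. Which setting the released checkpoints expect is exactly what the question asks, and the thread does not answer it.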

waterbearbee commented 3 months ago

Thank you very much for your reply. I've partially solved this problem by using a single image as the condition and not introducing camera parameters. In addition, I found that the camera parameters provided in the Objaverse-MIX dataset are inconsistent with those in the code, which is probably the cause.

rfeinman commented 2 months ago

@waterbearbee

+1 for the concern about camera parameters not matching Objaverse-MIX. Is there any way to clarify this? It seems important for diffusion model training.

wyysf-98 commented 2 months ago

Hi, we did not use the camera parameters in Objaverse-MIX; we only use its geometry. We rendered the images ourselves, in the same format as the provided sample. I will add some clarification on the data generation later.

Moondok commented 2 months ago

@waterbearbee @wyysf-98 Hi, I ran into a similar problem. I am training the shape diffusion model with a single image as the condition and without camera embeddings. After about one day of training the MSE loss has decreased to about 0.08, but the visual quality and the CD loss are still bad.