Hi, I have a similar issue. Are you using the obj-mix rendered images for condition diffusion training?
Yes. I have checked the obj-mix rendered image, and it looks fine.
Another phenomenon is that the training MSE loss becomes very small after the first epoch, about 0.08, but the visualization results are still poor at inference.
Hi, can you provide more details about the training? Which VAE config did you use for training? I can take some time to figure out the reason and fix the released config if possible, as I did not test that config in detail.
Should the VAE's sample_posterior be False during diffusion training?
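For context, in KL-regularized VAEs like the ones typically used for latent diffusion, sample_posterior usually controls whether the latent fed to the diffusion model is a stochastic sample from the posterior or its deterministic mean. A minimal sketch of the distinction, assuming a diagonal-Gaussian posterior (the names here are illustrative, not the exact ones in this repo):

```python
import torch

class DiagonalGaussian:
    """Posterior q(z|x) = N(mean, exp(logvar)) with diagonal covariance."""
    def __init__(self, mean: torch.Tensor, logvar: torch.Tensor):
        self.mean = mean
        self.std = torch.exp(0.5 * logvar)

    def sample(self) -> torch.Tensor:
        # Stochastic latent: z = mu + sigma * eps, different every pass
        return self.mean + self.std * torch.randn_like(self.mean)

    def mode(self) -> torch.Tensor:
        # Deterministic latent: just the posterior mean
        return self.mean

def encode_latents(posterior: DiagonalGaussian, sample_posterior: bool) -> torch.Tensor:
    # sample_posterior=True  -> noisy latents (extra stochasticity on top of diffusion noise)
    # sample_posterior=False -> deterministic latents (posterior mean)
    return posterior.sample() if sample_posterior else posterior.mode()
```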
Thank you very much for your reply. I've partially solved this problem by using a single image as the condition and not introducing camera parameters. In addition, I found that the camera parameters provided in the Objaverse-MIX dataset were inconsistent with those in the code, which is probably the reason.
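In case it helps others hitting the same mismatch: one quick sanity check is to project a known world-space point (e.g. the origin, assuming the object is roughly centered) with the dataset's camera parameters and see where it lands in the corresponding rendered image. A minimal sketch, assuming a 4x4 camera-to-world matrix and an OpenCV-style pinhole intrinsic; the whole point of the check is that the dataset or code may use a different convention:

```python
import numpy as np

def project_point(c2w: np.ndarray, K: np.ndarray, xyz_world: np.ndarray) -> np.ndarray:
    """Project a world-space point to pixel coordinates.

    Assumptions (these are exactly what the check is meant to verify):
    * c2w: 4x4 camera-to-world matrix, OpenCV convention (x right, y down, z forward)
    * K:   3x3 pinhole intrinsic matrix
    """
    w2c = np.linalg.inv(c2w)
    xyz_cam = (w2c @ np.append(xyz_world, 1.0))[:3]
    uv = K @ xyz_cam
    return uv[:2] / uv[2]

# Hypothetical usage: the object center should land near the image center
# if the conventions match and the camera looks at a centered object.
# c2w, K = load_camera(...)  # hypothetical loader for the dataset's camera files
# print(project_point(c2w, K, np.zeros(3)))
```

If the projected point lands far outside the image or has negative depth, the cameras are probably stored in a different convention (e.g. OpenGL/Blender, where the camera looks down -z), which would explain the inconsistency described above.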
@waterbearbee
+1 for the concern about camera parameters not matching objaverse-mix. Is there any way to disambiguate/clarify this? It seems important for diffusion model training.
Hi, we did not use the camera parameters in Objaverse-MIX; we only use its geometry part. We rendered the images ourselves, in the same format as the provided sample. I will add some clarification on the data generation later.
@waterbearbee @wyysf-98 Hi, I met a similar problem. I am using a single image and not introducing a camera embedding to train the shape diffusion model, but after about one day of training, with the MSE loss down to about 0.08, the visual results and CD loss are still bad.
Thanks for your amazing work!
As you mentioned in the appendix, "For the conditional Latent Set Diffusion Model (LSDM), we train our model on 32x A800 GPUs with a batch size of 32 per GPU for 7 days."
I am replicating your work, but after training the model on 32x A800 GPUs for 1 day the results are still bad. I would like to ask how long training needs to run before the results become reasonable.
Thank you!
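(For reference, the quoted setup corresponds to an effective batch size of 32 GPUs × 32 per GPU = 1024, and 1 day is only about 1/7 of the stated 7-day schedule, so poor results at that point may simply mean the model is still undertrained.)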