Closed by KerwenX 1 year ago
I have a similar question. I am training a VQ-VAE with the DDP framework on 4 NVIDIA GeForce RTX 3090 GPUs. I am using the entire ShapeNet dataset, which includes all classes, but the estimated training time is as long as 40,000 hours. What dataset did you use, and how many epochs did you run, when you trained a similar model? Also, how long did the training take? Thanks.
Thank you for getting back to me, and I apologize for the delay in my response. However, I have not yet started training on this project, so I am unable to provide an answer to your question at this time.
You can monitor training through TensorBoard and stop it once the loss converges or the reconstructions look decent to you.
It takes around 7-10 days to train the AE, and 10-14 days to train the diffusion models on one GPU. Training should converge faster on multiple GPUs with DDP.
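For anyone who wants to automate the "stop when the loss converges" check instead of watching the curves by hand, here is a minimal sketch. It is not taken from the SDFusion code: the function name `has_converged` and the parameters `window` and `rel_tol` are hypothetical, and the thresholds are assumptions you would need to tune for your own run. It only compares the average of the most recent losses against the average of the losses just before them.

```python
def has_converged(losses, window=5, rel_tol=0.01):
    """Heuristic plateau check (hypothetical helper, not part of SDFusion).

    Returns True when the mean of the last `window` losses improved by less
    than `rel_tol` (as a fraction) over the mean of the `window` losses
    before them. A loss that got worse also counts as "no improvement".
    """
    if len(losses) < 2 * window:
        return False  # not enough history to compare two windows

    previous = sum(losses[-2 * window:-window]) / window
    recent = sum(losses[-window:]) / window

    if previous == 0:
        return True  # loss was already zero, nothing left to improve

    improvement = (previous - recent) / abs(previous)
    return improvement < rel_tol


# Usage sketch: append one validation loss per epoch, then check.
history = []
for epoch_loss in [1.00, 0.60, 0.40, 0.30, 0.25, 0.24, 0.24, 0.24, 0.24, 0.24]:
    history.append(epoch_loss)
    if has_converged(history, window=3, rel_tol=0.01):
        print(f"Loss plateaued after {len(history)} epochs")
        break
```

A check like this only looks at the loss, so it is still worth inspecting the reconstructions before stopping for good.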
Thanks a lot!
Hello @yccyenchicheng,
I apologize for the inconvenience and hope this message finds you well. I find your research to be highly intriguing and I have some inquiries regarding the experimental details.
Specifically, I would like to ask which device(s) you used to train `Sdfusion-mm2shape` and how long the training took. I would also like to kindly ask whether you could release the training code for `Sdfusion-mm2shape` as soon as possible.
Thank you very much for your time and consideration. I appreciate any assistance you can provide!
Best regards