eagles1812 opened this issue 2 years ago
Thanks for the great paper, dataset, and code!

I tried to train the model on the prepared data with a single GPU, and it took roughly half a day. So I added a distributed training component; the training time decreased, but so did the AP/AR/IoU values. Have you tested distributed training? How do you set the parameters to get both a shorter training time and correct AP/AR/IoU values?

Thank you!

Hi, can you please explain what you mean by a distributed training component?

Julien

Thanks for your reply. Your code suite uses one GPU for training, which takes a very long time on a dataset as large as yours. I have multiple GPUs and wanted to reduce the training time, so I modified your training code to add a distributed training component, following articles such as https://towardsdatascience.com/how-to-scale-training-on-multiple-gpus-dae1041f49d2. That is when I ran into the problem described in my previous post. Thanks!
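For readers hitting the same symptom: the linked article covers PyTorch's DistributedDataParallel (DDP), and the two most common reasons metrics drop after a naive DDP port are (a) the effective batch size grows by the number of GPUs while the learning rate stays at its single-GPU value, and (b) the DataLoader is not given a DistributedSampler (with `set_epoch` called each epoch), so every GPU processes the entire dataset instead of a disjoint shard. Below is a minimal, self-contained sketch illustrating both fixes; the tiny linear model, random TensorDataset, and `base_lr` value are placeholders standing in for this repository's actual model, data, and hyperparameters.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE in the environment.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    world_size = dist.get_world_size()

    # Placeholders: the repo's real model and dataset would go here.
    model = torch.nn.Linear(128, 10).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    dataset = TensorDataset(torch.randn(1024, 128),
                            torch.randint(0, 10, (1024,)))

    # Without a DistributedSampler every rank iterates the full dataset,
    # silently multiplying the number of gradient steps per epoch.
    sampler = DistributedSampler(dataset, shuffle=True)
    loader = DataLoader(dataset, batch_size=8, sampler=sampler)

    # Effective batch size = per-GPU batch * world_size, so a common
    # heuristic is to scale the single-GPU learning rate linearly.
    base_lr = 0.01  # placeholder: whatever the single-GPU run used
    optimizer = torch.optim.SGD(model.parameters(),
                                lr=base_lr * world_size, momentum=0.9)
    criterion = torch.nn.CrossEntropyLoss()

    for epoch in range(3):
        sampler.set_epoch(epoch)  # reshuffle differently each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            criterion(model(x), y).backward()
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launch with `torchrun --nproc_per_node=<num_gpus> train_ddp.py` (the script name is illustrative). The linear learning-rate scaling is the heuristic from Goyal et al., "Accurate, Large Minibatch SGD", and often needs a short warmup at large GPU counts; if AP/AR/IoU still differ from the single-GPU run, also check that evaluation runs over the full validation set rather than one rank's shard.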