@Programming-Music Thanks for your feedback! I appreciate it. I can certainly add distributed data parallel (DDP) to the codebase as future work and will make this improvement in my free time. In the meantime, if you want to train the model on multiple GPUs, you can easily use data parallel (DP) with just a couple of lines of code changed, although it will not be as efficient as DDP.
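For reference, here is a minimal sketch of that DP change. It assumes the training script already builds a `model` and parses a list of GPU ids (the names `model`, `gpu_ids`, and the toy network below are hypothetical placeholders, not the repo's actual code):

```python
import torch
import torch.nn as nn

# Hypothetical placeholders: swap in the model and GPU ids from your own training script.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
gpu_ids = [0, 1]  # e.g. parsed from a --gpu_ids 0,1 option

device = torch.device(f"cuda:{gpu_ids[0]}" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Wrap the model with DataParallel; each forward pass splits the batch across the listed GPUs.
if torch.cuda.is_available() and len(gpu_ids) > 1:
    model = nn.DataParallel(model, device_ids=gpu_ids)

# The training loop stays unchanged: inputs go to the primary device,
# and DataParallel scatters them to the other GPUs.
x = torch.randn(8, 128).to(device)  # batch_size must be > 1 so it can be split
out = model(x)
```

Note that DP only splits the batch across GPUs on a single machine, which is why the batch size needs to be greater than 1 for it to have any effect.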
Yeah, your answer is very helpful. When I set gpu_ids to 0,1 and use a batch_size greater than 1 (with the custom img_size set to 1024, the batch_size often had to be set to 1 just to get it to run), the model runs successfully on multiple GPUs!
@Programming-Music Great! Thanks for confirming!
The model and the code provided are amazing! They still perform well when migrated to our own dataset. On the downside, the framework doesn't take distributed training into account, and I wonder whether the authors have a good way to train it across multiple machines, or whether this will be added in a future release!