pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
https://pytorch.org

What changes do we need to make to metrics calculation and visualization when using Distributed Data Parallel for distributed training? #60343

Open Bilal-Yousaf opened 3 years ago

Bilal-Yousaf commented 3 years ago

❓ Questions and Help

I am updating my training script to use Distributed Data Parallel for multi-GPU training. I am done with most of the steps described in the PyTorch guidelines, but I am confused about how to handle metrics calculation and visualization.

For example, suppose I need to calculate accuracy and I have 4 samples in total and 2 GPUs. When I run testing, each process will have predictions and ground truths for 2 of the samples. To calculate accuracy, do I need to call dist.reduce, or is that not needed and I can calculate accuracy directly in the rank 0 process?

Also, what if I am storing per-sample information in a separate dictionary in each process, and after passing through all samples I need that dictionary, with the information from all samples, to create a combined visualization of all 4 samples? How do I reduce that dictionary so it holds the information from all processes, and then generate the final visualization on rank 0?

cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @agolynski @SciPioneer @H-Huang @mrzzd @cbalioglu @gcramer23

gcramer23 commented 3 years ago

@Bilal-Yousaf I think I answered your question on the forums https://discuss.pytorch.org/t/what-changes-we-need-to-make-in-metrics-calculation-when-we-are-using-distributed-data-parallel-for-multi-gpu-training/124584. Do you still need help?

Bilal-Yousaf commented 3 years ago

Thanks a lot for the quick reply, it was really helpful. I now understand how to use the reduce functions to solve the issue, and I get the same correct metrics that I was getting on a single GPU.
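
Roughly, the idea is to sum the per-rank correct/total counts before dividing. A minimal sketch (assuming the process group is already initialized and that each rank only holds the predictions and labels for its own shard of the test set; the function name is just illustrative):

```python
import torch
import torch.distributed as dist

def global_accuracy(preds, labels):
    # Per-rank counts: each process only sees its own shard of the
    # test set (e.g. 2 of the 4 samples in the example above).
    correct = (preds == labels).sum()
    total = torch.tensor(preds.numel(), device=preds.device)

    # Sum the counts across all processes. After all_reduce every rank
    # holds the global values, so the accuracy matches the single-GPU run.
    dist.all_reduce(correct, op=dist.ReduceOp.SUM)
    dist.all_reduce(total, op=dist.ReduceOp.SUM)

    return correct.item() / total.item()
```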

Could you please share whether there is any way to gather the dictionaries created on the multiple processes onto the rank 0 process? For now, I think all such reduce/gather functions work only for tensors.
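
For reference, PyTorch 1.8+ also provides object collectives such as torch.distributed.all_gather_object, which work on arbitrary picklable Python objects rather than only tensors. One way the per-rank dictionaries might be gathered and merged on rank 0 is sketched below (assuming an initialized process group; gather_info_dicts and per_sample_info are illustrative names):

```python
import torch.distributed as dist

def gather_info_dicts(per_sample_info):
    # per_sample_info is this rank's dict, e.g.
    # {sample_id: {"pred": ..., "gt": ...}} for its shard of the data.
    world_size = dist.get_world_size()

    # all_gather_object handles arbitrary picklable Python objects, not
    # just tensors. With the NCCL backend the current CUDA device must
    # be set first (e.g. torch.cuda.set_device(local_rank)).
    gathered = [None] * world_size
    dist.all_gather_object(gathered, per_sample_info)

    if dist.get_rank() == 0:
        # Merge the per-rank dicts into one covering all samples and use
        # it on rank 0 to build the combined visualization.
        merged = {}
        for d in gathered:
            merged.update(d)
        return merged
    return None
```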