zvadaszi opened this issue 3 years ago (status: Open)
First of all, thank you for your work and for your repo.
Environment:
PyTorch 1.5.1, CUDA 10.2, cuDNN 7.6.5, mmdetection 2.3.0, 4x V100 16GB
My config file is based on vfnet_r50_fpn_mstrain_2x, modified for a custom dataset with large images (2560x1440) and mainly small objects (10-60 px).
When training with multiple GPUs and samples_per_gpu = 1, workers_per_gpu = 1, training hangs at the very beginning with GPU utilization at 100% on all GPUs.
When training with multiple GPUs and samples_per_gpu = 2, workers_per_gpu = 2 (and a smaller image size), training runs fine.
This looks somewhat similar to issue 2193.
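For reference, the two runs differ in the data-loading settings (and in image size). Below is a minimal sketch of those settings, assuming the Python dict style used by mmdetection 2.x configs, where they normally live under a dict named `data`. The names `hanging_data`, `working_data`, and `effective_batch_size` are made up for illustration and are not part of mmdetection.

```python
# Illustrative only: the two data-loading settings described above.
# In a real mmdetection 2.x config these keys would sit inside `data = dict(...)`.

# Setting that hangs at the start of multi-GPU training in this report.
hanging_data = dict(samples_per_gpu=1, workers_per_gpu=1)

# Setting that trains normally (used together with a smaller image size).
working_data = dict(samples_per_gpu=2, workers_per_gpu=2)

NUM_GPUS = 4  # 4x V100, per the environment listed above


def effective_batch_size(data_cfg, num_gpus):
    """Total images per iteration across all GPUs (hypothetical helper)."""
    return data_cfg["samples_per_gpu"] * num_gpus


if __name__ == "__main__":
    print("hanging:", effective_batch_size(hanging_data, NUM_GPUS))
    print("working:", effective_batch_size(working_data, NUM_GPUS))
```

With one image per GPU, each process sees a single large image with mostly small objects per step, which may be relevant when comparing against the ATSS-related reports.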
Hi @zvadaszi, thank you for the information. I think this bug is most likely related to the issues reported about ATSS. I have updated the repo according to those fixes, so please try again.