anastasia-spb opened this issue 2 months ago
Hello, I have encountered the same problem as https://github.com/open-mmlab/mmdetection/issues/10761.
I am launching the following script:
./mmdetection/tools/dist_train.sh ./mmdetection/configs/mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py 4
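For context on what I am varying: the per-GPU worker count comes from `num_workers` in the dataloader section of the config. A minimal local override (filename and batch split here are illustrative; field names follow the MMDetection 3.x / MMEngine config convention) would look roughly like:

```python
# my_mask_rcnn.py -- hypothetical local config inheriting the stock one
_base_ = './mask-rcnn_r50_fpn_1x_coco.py'

# Per-rank dataloader settings; with 4 GPUs the global batch size is 4 * batch_size.
train_dataloader = dict(
    batch_size=5,             # 4 GPUs * 5 = global batch of 20, as reported below
    num_workers=2,            # CPU worker processes per GPU/rank
    persistent_workers=True,  # keep workers alive between epochs
)
```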
Conda env summary:
Train batch size: 20
Hardware setup (`lscpu`):

```
Architecture:            x86_64
CPU op-mode(s):          32-bit, 64-bit
Address sizes:           46 bits physical, 57 bits virtual
Byte Order:              Little Endian
CPU(s):                  48
On-line CPU(s) list:     0-47
Vendor ID:               GenuineIntel
Model name:              Intel(R) Xeon(R) Gold 5317 CPU @ 3.00GHz
CPU family:              6
Model:                   106
Thread(s) per core:      2
Core(s) per socket:      12
Socket(s):               2
Stepping:                6
CPU max MHz:             3600,0000
CPU min MHz:             800,0000
BogoMIPS:                6000.00
Virtualization:          VT-x
Caches (sum of all):
  L1d:                   1,1 MiB (24 instances)
  L1i:                   768 KiB (24 instances)
  L2:                    30 MiB (24 instances)
  L3:                    36 MiB (2 instances)
NUMA:
  NUMA node(s):          2
  NUMA node0 CPU(s):     0-11,24-35
  NUMA node1 CPU(s):     12-23,36-47
Vulnerabilities:
  Gather data sampling:  Mitigation; Microcode
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Mitigation; Clear CPU buffers; SMT vulnerable
  Retbleed:              Not affected
  Spec rstack overflow:  Not affected
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Enhanced / Automatic IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
  Srbds:                 Not affected
  Tsx async abort:       Not affected
```
`nvcc --version`:

```
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
```

GPUs: 4× NVIDIA RTX 6000 Ada Generation (49140 MiB each), Driver Version: 535.104.05, CUDA Driver Version: 12.2
The fewer workers I use, the faster training goes, and GPU utilization is more stable.
With many workers:
![Screenshot from 2024-05-03 14-42-20](https://github.com/open-mmlab/mmdetection/assets/45384777/785041cd-9b72-42cd-874e-43ff20d0d905)
With only 2 workers:
![Screenshot from 2024-05-03 15-31-08](https://github.com/open-mmlab/mmdetection/assets/45384777/56a88644-7d33-4518-8bc4-3fac5ce8e1b9)
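To check whether the input pipeline itself scales with worker count on this machine, independent of the model, a standalone PyTorch sketch could time one pass over a synthetic dataset at different `num_workers` values (dataset size, tensor shapes, and the busy-work loop are illustrative stand-ins, not MMDetection's actual pipeline):

```python
import time

import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, Dataset


class SyntheticDataset(Dataset):
    """Emulates a CPU-bound augmentation pipeline with some per-sample work."""

    def __len__(self):
        return 256

    def __getitem__(self, idx):
        x = torch.randn(3, 64, 64)
        for _ in range(3):  # stand-in for augmentation cost
            x = F.avg_pool2d(x.unsqueeze(0), 3, 1, 1).squeeze(0)
        return x


def throughput(num_workers):
    """Samples per second for one full pass over the synthetic dataset."""
    loader = DataLoader(
        SyntheticDataset(),
        batch_size=32,
        num_workers=num_workers,
        persistent_workers=num_workers > 0,  # must be False when workers == 0
    )
    start = time.perf_counter()
    for _ in loader:
        pass
    return len(loader.dataset) / (time.perf_counter() - start)


if __name__ == "__main__":
    for n in (0, 2, 4, 8):
        print(f"num_workers={n}: {throughput(n):.1f} samples/s")
```

If this toy loader also slows down as workers increase, the problem is likely in process scheduling or CPU/NUMA placement rather than in MMDetection itself.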
Using the NVIDIA Nsight Systems profiler, I can see that many CPU cores are simply not utilized.
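One thing that might be worth ruling out is that the training processes are restricted to a subset of cores or threads, e.g. by the launcher, a container, or `taskset`; distributed launchers commonly export `OMP_NUM_THREADS=1` when it is unset. A quick Linux-only check, run from inside a training process, is:

```python
import os

# Linux-only: the set of logical CPUs this process may be scheduled on.
allowed = os.sched_getaffinity(0)
print(f"allowed CPUs: {len(allowed)} of {os.cpu_count()}: {sorted(allowed)}")
print("OMP_NUM_THREADS =", os.environ.get("OMP_NUM_THREADS", "<unset>"))
```

If `allowed` is much smaller than 48, worker processes are competing for the same few cores, which would match the idle CPUs seen in Nsight Systems.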
I have conducted the same experiment on another hardware setup, where increasing the number of workers does increase training speed as expected.
Could you give me any advice? Should I update any drivers?