Closed kevinmcmahon42 closed 1 year ago
Interesting. A different system, with similar nodes, shows a more balanced load. Looks like a system config issue for me to track down.
This issue can be closed. Thanks
System 1: unbalanced loading
# gpu  pwr  gtemp  mtemp   sm  mem  enc  dec  mclk  pclk
# Idx    W      C      C    %    %    %    %   MHz   MHz
    0  632     74     76  100   80    0    0  2619  1200
    1  272     57     86  100   15    0    0  2619   345
    2  266     69     92  100   11    0    0  2619   172
    3  235     62     84  100   12    0    0  2619   345
System 2: balanced loading
# gpu  pwr  gtemp  mtemp   sm  mem  enc  dec  mclk  pclk
# Idx    W      C      C    %    %    %    %   MHz   MHz
    0  532     66     94  100   75    0    0  2619  1140
    1  622     73     79  100   71    0    0  2619  1260
    2  602     74     89  100   87    0    0  2619  1275
    3  706     72     95  100   81    0    0  2619  1320
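In case it helps anyone comparing nodes the same way, here is a rough Python sketch for spotting the odd GPU out in a dmon sample. It assumes the ten columns are the usual `nvidia-smi dmon -s puc` layout (gpu, pwr, gtemp, mtemp, sm, mem, enc, dec, mclk, pclk). The function names `parse_dmon` and `outliers` and the 50% tolerance are made up for illustration and are not part of gpu_burn or nvidia-smi.

```python
from statistics import median

# Assumed column layout for `nvidia-smi dmon -s puc`; adjust if your
# driver version prints extra columns.
COLUMNS = ["gpu", "pwr", "gtemp", "mtemp", "sm", "mem", "enc", "dec", "mclk", "pclk"]


def parse_dmon(text):
    """Turn dmon rows into dicts, skipping headers and malformed lines."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or dmon header line
        fields = line.split()
        if len(fields) != len(COLUMNS):
            continue  # unexpected column count, not a sample this sketch understands
        try:
            values = [int(f) for f in fields]
        except ValueError:
            continue  # non-numeric field (for example "-" for an unsupported metric)
        rows.append(dict(zip(COLUMNS, values)))
    return rows


def outliers(rows, column, tolerance=0.5):
    """Return GPU indices whose value differs from the median by more than
    `tolerance` (as a fraction of the median). The threshold is arbitrary."""
    if not rows:
        return []
    mid = median(r[column] for r in rows)
    flagged = []
    for r in rows:
        if mid == 0:
            is_outlier = r[column] != 0
        else:
            is_outlier = abs(r[column] - mid) / mid > tolerance
        if is_outlier:
            flagged.append(r["gpu"])
    return flagged


# One sample from each system, copied from the output above.
SYSTEM_1 = """\
0 632 74 76 100 80 0 0 2619 1200
1 272 57 86 100 15 0 0 2619 345
2 266 69 92 100 11 0 0 2619 172
3 235 62 84 100 12 0 0 2619 345
"""
SYSTEM_2 = """\
0 532 66 94 100 75 0 0 2619 1140
1 622 73 79 100 71 0 0 2619 1260
2 602 74 89 100 87 0 0 2619 1275
3 706 72 95 100 81 0 0 2619 1320
"""

if __name__ == "__main__":
    print(outliers(parse_dmon(SYSTEM_1), "mem"))  # [0]
    print(outliers(parse_dmon(SYSTEM_2), "mem"))  # []
```

This only looks at a single sample per GPU. For a full one-hour log you would want to group the rows by timestamp or average per GPU before comparing.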
I'm running:
./gpu_burn -m 90% -d -tc 3600 1> gpu_burn.x8000c1s0b1n0.3600s.out.3 2>&1 & tail -f gpu_burn.x8000c1s0b1n0.3600s.out.3
nvidia-smi dmon -s puc 1> nvidia-smi_dmon_s_puc.gpu_burn.x8000c1s0b1n0.3600s.out.3 2>&1 & tail -f nvidia-smi_dmon_s_puc.gpu_burn.x8000c1s0b1n0.3600s.out.3
This issue can be closed
With NVIDIA GPUs, I'm seeing a full load (100% utilization, full power, 90% memory) on GPU 0, but only 15% memory at full utilization on GPUs 1, 2 and 3. Is there anything configured specially for GPU 0 that might explain this behavior?
I see the same results whether I run on all 4 GPUs at once or on each GPU individually. GPU 0 uses 90% memory and 600+ W at 100% utilization, while GPUs 1, 2 and 3 use only 15% memory and about 300 W, yet also show 100% utilization.
Thanks