wilicc / gpu-burn

Multi-GPU CUDA stress test
BSD 2-Clause "Simplified" License

nonuniform loading of multiple gpus #86

Closed by kevinmcmahon42 1 year ago

kevinmcmahon42 commented 1 year ago

With NVIDIA GPUs, I'm seeing a full load on GPU 0 (100% utilization, full power, 90% memory), but only 15% memory at full utilization on GPUs 1, 2, and 3. Is there anything configured specially for GPU 0 that might explain this behavior?

I see the same results whether I run on all 4 GPUs or on individual GPUs: GPU 0 uses 90% memory at 100% utilization and draws 600+ W, while GPUs 1, 2, and 3 use only 15% memory and 300 W, and also show 100% utilization.

Thanks

kevinmcmahon42 commented 1 year ago

Interesting. A different system, with similar nodes, shows a more balanced load. Looks like a system config issue for me to track down.

This issue can be closed. Thanks

System 1: unbalanced loading

gpu    pwr  gtemp  mtemp     sm    mem    enc    dec   mclk   pclk
Idx      W      C      C      %      %      %      %    MHz    MHz
0      632     74     76    100     80      0      0   2619   1200
1      272     57     86    100     15      0      0   2619    345
2      266     69     92    100     11      0      0   2619    172
3      235     62     84    100     12      0      0   2619    345

System 2: balanced loading

gpu    pwr  gtemp  mtemp     sm    mem    enc    dec   mclk   pclk
Idx      W      C      C      %      %      %      %    MHz    MHz
0      532     66     94    100     75      0      0   2619   1140
1      622     73     79    100     71      0      0   2619   1260
2      602     74     89    100     87      0      0   2619   1275
3      706     72     95    100     81      0      0   2619   1320
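
One way to compare the two systems is to scan the pclk column for GPUs that run far below the fastest one. The sketch below is only an illustration: `flag_slow_gpus` is a hypothetical helper, not part of gpu-burn or nvidia-smi, and the 50% cutoff is an arbitrary assumption. It expects rows laid out like the dmon samples in this thread, with pclk in the 10th column.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of gpu-burn or nvidia-smi).
# Reads dmon-style rows on stdin and prints GPUs whose pclk (10th column)
# is below half of the highest pclk seen. The 50% cutoff is an assumption.
flag_slow_gpus() {
  awk '
    $1 !~ /^[0-9]+$/ || NF < 10 { next }   # skip header and incomplete rows
    {
      rows++
      idx[rows] = $1
      pclk[rows] = $10 + 0
      if (pclk[rows] > max + 0) max = pclk[rows]
    }
    END {
      for (i = 1; i <= rows; i++)
        if (pclk[i] < 0.5 * max)
          printf "gpu %s: pclk %d MHz (max %d MHz)\n", idx[i], pclk[i], max
    }
  '
}

# System 1 rows from this thread.
flag_slow_gpus <<'EOF'
0    632     74     76    100     80      0      0   2619   1200
1    272     57     86    100     15      0      0   2619    345
2    266     69     92    100     11      0      0   2619    172
3    235     62     84    100     12      0      0   2619    345
EOF
```

With the System 1 rows this flags GPUs 1, 2, and 3. With the System 2 rows it prints nothing. It only highlights the difference in clocks and says nothing about the cause.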

I'm running:

./gpu_burn -m 90% -d -tc 3600 1> gpu_burn.x8000c1s0b1n0.3600s.out.3 2>&1 & tail -f gpu_burn.x8000c1s0b1n0.3600s.out.3

nvidia-smi dmon -s puc 1> nvidia-smi_dmon_s_puc.gpu_burn.x8000c1s0b1n0.3600s.out.3 2>&1 & tail -f nvidia-smi_dmon_s_puc.gpu_burn.x8000c1s0b1n0.3600s.out.3
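
The two commands above could also be wrapped in one script, so that monitoring starts before the burn and stops when it finishes. This is a rough sketch that reuses the same flags and log naming. `log_name` and `run_burn` are hypothetical helpers, and it assumes `./gpu_burn` is in the current directory and `nvidia-smi` is on the PATH.

```shell
#!/usr/bin/env bash
# Sketch only: log_name and run_burn are hypothetical helpers,
# not part of gpu-burn or nvidia-smi.

# Build a log name of the form "<prefix>.<host>.<seconds>s.out.<run>".
log_name() {
  printf '%s.%s.%ss.out.%s\n' "$1" "$2" "$3" "$4"
}

# Run gpu_burn for $1 seconds (run number $2) while nvidia-smi dmon
# samples power, utilization, and clocks in the background.
run_burn() {
  local secs="$1" run="$2" host burn_log dmon_log dmon_pid status
  host="$(hostname -s)"
  burn_log="$(log_name gpu_burn "$host" "$secs" "$run")"
  dmon_log="$(log_name nvidia-smi_dmon_s_puc.gpu_burn "$host" "$secs" "$run")"

  # Start the monitor first so the log covers the whole burn.
  nvidia-smi dmon -s puc > "$dmon_log" 2>&1 &
  dmon_pid=$!

  # Same gpu_burn flags as in the thread; this call blocks until the burn ends.
  ./gpu_burn -m 90% -d -tc "$secs" > "$burn_log" 2>&1
  status=$?

  # Stop the monitor and pass on gpu_burn's exit status.
  kill "$dmon_pid" 2>/dev/null
  wait "$dmon_pid" 2>/dev/null
  return "$status"
}
```

For example, `run_burn 3600 3` would write logs named like the ones above, using the local short hostname in place of `x8000c1s0b1n0`.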

kevinmcmahon42 commented 1 year ago

This issue can be closed