On a DGX-A100 with 8 GPUs and nvidia-smi 535.154.05, this script does not behave as expected.
The script expects nvidia-smi to print information for only 5 GPUs, but this is no longer the case:
nvidia-smi dmon -c 1 -s pucvmet | grep -v ^# | sed 's/^ *//' | sed 's/  */,/g' | sed 's/-/0/g'
0,65,32,48,0,0,0,0,0,0,1593,210,0,0,0,1,0,0,0,0,0,6,
1,64,31,44,0,0,0,0,0,0,1593,210,0,0,0,1,0,0,0,0,0,0,
2,65,32,46,0,0,0,0,0,0,1593,210,0,0,0,1,0,0,0,0,2,3,
3,62,31,47,0,0,0,0,0,0,1593,210,0,0,0,1,0,0,0,0,0,0,
4,65,36,51,0,0,0,0,0,0,1593,210,0,0,0,1,0,0,0,0,0,5,
5,66,35,49,0,0,0,0,0,0,1593,210,0,0,0,1,0,0,0,0,0,0,
6,67,36,49,0,0,0,0,0,0,1593,210,0,0,0,1,0,0,0,0,0,0,
7,67,35,49,0,0,0,0,0,0,1593,210,0,0,0,1,0,0,0,0,0,0,
The issue is that the script starts its loop at 5 (the 6th GPU), which creates duplicate entries.
My fix addresses this by starting the loop at a variable index, taken from the number of lines returned by the first command, instead of the hard-coded 5.
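As a rough illustration of the idea (not the actual patch), the start index can be derived by counting the rows the first command returns. In this sketch, `dmon_rows` is a hypothetical stand-in that replays the sample output above so it runs without a GPU, and `start` is an illustrative variable name; on a real host the function body would be the nvidia-smi pipeline itself.

```shell
#!/bin/sh
# Hypothetical stand-in for the nvidia-smi pipeline: replays the sample
# rows from the DGX-A100 output so the sketch runs on any machine.
dmon_rows() {
cat <<'EOF'
0,65,32,48,0,0,0,0,0,0,1593,210,0,0,0,1,0,0,0,0,0,6,
1,64,31,44,0,0,0,0,0,0,1593,210,0,0,0,1,0,0,0,0,0,0,
2,65,32,46,0,0,0,0,0,0,1593,210,0,0,0,1,0,0,0,0,2,3,
3,62,31,47,0,0,0,0,0,0,1593,210,0,0,0,1,0,0,0,0,0,0,
4,65,36,51,0,0,0,0,0,0,1593,210,0,0,0,1,0,0,0,0,0,5,
5,66,35,49,0,0,0,0,0,0,1593,210,0,0,0,1,0,0,0,0,0,0,
6,67,36,49,0,0,0,0,0,0,1593,210,0,0,0,1,0,0,0,0,0,0,
7,67,35,49,0,0,0,0,0,0,1593,210,0,0,0,1,0,0,0,0,0,0,
EOF
}

# Count the rows that begin with a GPU index instead of assuming 5.
start=$(dmon_rows | grep -c '^[0-9]')

echo "loop starts at index $start"
```

With the 8 rows shown above, this prints `loop starts at index 8`, so a loop starting there no longer overlaps the indices already reported by nvidia-smi.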