...
/bin/bash: line 1: scontrol: command not found
[codecarbon WARNING @ 18:32:33] Error running `scontrol show job $SLURM_JOB_ID` to retrieve SLURM-available RAM.Using the machine's total RAM.
[codecarbon INFO @ 18:32:33] Energy consumed for RAM : 0.001575 kWh. RAM Power : 377.87676858901983 W
/bin/bash: line 1: scontrol: command not found
[codecarbon WARNING @ 18:32:33] Error running `scontrol show job $SLURM_JOB_ID` to retrieve SLURM-available RAM.Using the machine's total RAM.
[codecarbon INFO @ 18:32:33] Energy consumed for all GPUs : 0.000324 kWh. Total GPU Power : 77.71202001205326 W
[codecarbon INFO @ 18:32:33] Energy consumed for all CPUs : 0.000469 kWh. Total CPU Power : 112.5 W
[codecarbon INFO @ 18:32:33] 0.002369 kWh of electricity used since the beginning.
[2024-06-28 18:32:35,231][__main__][INFO] - Train Epoch: [0][0/2503] Time 8.592 (8.592) Loss 6.9893 (6.9893) Prec@1 0.195 (0.195) Prec@5 0.781 (0.781) LR 0.000
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 32768.0
[2024-06-28 18:32:42,599][__main__][INFO] - Train Epoch: [0][10/2503] Time 0.737 (1.451) Loss 6.9683 (6.9871) Prec@1 0.000 (0.107) Prec@5 0.391 (0.408) LR 0.002
/bin/bash: line 1: scontrol: command not found
[codecarbon WARNING @ 18:32:48] Error running `scontrol show job $SLURM_JOB_ID` to retrieve SLURM-available RAM.Using the machine's total RAM.
[codecarbon INFO @ 18:32:48] Energy consumed for RAM : 0.003147 kWh. RAM Power : 377.87676858901983 W
/bin/bash: line 1: scontrol: command not found
[codecarbon WARNING @ 18:32:48] Error running `scontrol show job $SLURM_JOB_ID` to retrieve SLURM-available RAM.Using the machine's total RAM.
[codecarbon INFO @ 18:32:48] Energy consumed for all GPUs : 0.001271 kWh. Total GPU Power : 227.225096755562 W
[codecarbon INFO @ 18:32:48] Energy consumed for all CPUs : 0.000938 kWh. Total CPU Power : 112.5 W
[codecarbon INFO @ 18:32:48] 0.005356 kWh of electricity used since the beginning.
[2024-06-28 18:32:49,984][__main__][INFO] - Train Epoch: [0][20/2503] Time 0.740 (1.112) Loss 6.9804 (6.9801) Prec@1 0.195 (0.121) Prec@5 0.391 (0.409) LR 0.003
[2024-06-28 18:32:57,397][__main__][INFO] - Train Epoch: [0][30/2503] Time 0.745 (0.992) Loss 6.9760 (6.9813) Prec@1 0.000 (0.113) Prec@5 0.391 (0.403) LR 0.005
/bin/bash: line 1: scontrol: command not found
[codecarbon WARNING @ 18:33:03] Error running `scontrol show job $SLURM_JOB_ID` to retrieve SLURM-available RAM.Using the machine's total RAM.
[codecarbon INFO @ 18:33:03] Energy consumed for RAM : 0.004720 kWh. RAM Power : 377.87676858901983 W
/bin/bash: line 1: scontrol: command not found
[codecarbon WARNING @ 18:33:03] Error running `scontrol show job $SLURM_JOB_ID` to retrieve SLURM-available RAM.Using the machine's total RAM.
[codecarbon INFO @ 18:33:03] Energy consumed for all GPUs : 0.002218 kWh. Total GPU Power : 227.31828920761552 W
[codecarbon INFO @ 18:33:03] Energy consumed for all CPUs : 0.001407 kWh. Total CPU Power : 112.5 W
[codecarbon INFO @ 18:33:03] 0.008345 kWh of electricity used since the beginning.
[2024-06-28 18:33:04,848][__main__][INFO] - Train Epoch: [0][40/2503] Time 0.745 (0.932) Loss 7.0290 (6.9850) Prec@1 0.195 (0.119) Prec@5 0.195 (0.405) LR 0.007
[2024-06-28 18:33:12,488][__main__][INFO] - Train Epoch: [0][50/2503] Time 0.738 (0.899) Loss 6.9633 (6.9860) Prec@1 0.000 (0.119) Prec@5 0.391 (0.406) LR 0.008
...
Why does it print the scontrol error multiple times while training? It also prints the "Energy consumed" messages all the time.
Description
I got the output shown above.
Why does codecarbon print the scontrol error multiple times during training? The "Energy consumed" messages also repeat about every 15 seconds (18:32:33, 18:32:48, 18:33:03 in the log above).
What I Did
I just wanted to run this code:
The make command comes from this GitHub repository: poloclub/robust-principles
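For reference, here is a minimal sketch of the kind of workaround I am looking for. It assumes codecarbon emits its `[codecarbon INFO @ ...]` and `[codecarbon WARNING @ ...]` lines through Python's standard `logging` module under a logger named `codecarbon`. That name is my guess from the log prefix, not something I have confirmed. `MinLevelFilter` and `quiet_logger` are hypothetical helper names, not part of codecarbon or robust-principles.

```python
import logging


class MinLevelFilter(logging.Filter):
    """Drop every log record below a minimum severity level."""

    def __init__(self, min_level):
        super().__init__()
        self.min_level = min_level

    def filter(self, record):
        # Returning False tells the logger to discard the record.
        return record.levelno >= self.min_level


def quiet_logger(name="codecarbon", min_level=logging.ERROR):
    """Attach a severity filter to the named logger.

    The default name "codecarbon" is an assumption based on the
    "[codecarbon INFO @ ...]" prefix seen in the log output.
    """
    logger = logging.getLogger(name)
    logger.addFilter(MinLevelFilter(min_level))
    return logger
```

Calling `quiet_logger()` once before training starts should, under that assumption, hide the INFO and WARNING lines. It would not hide `/bin/bash: line 1: scontrol: command not found`, since that line appears to be written by the shell itself rather than through Python logging.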