mlcommons / training

Reference implementations of MLPerf™ training benchmarks
https://mlcommons.org/en/groups/training
Apache License 2.0

Running Llama training on GPU #775

Closed mahmoodn closed 3 weeks ago

mahmoodn commented 3 weeks ago

Is there an option to get verbose output for Llama training? I ran the script on a single A100, and after one hour nvidia-smi still showed no process on the GPU, although the CPU was busy. Is this a preprocessing phase, or should work be offloaded to the GPU shortly after launch? Since GPU time is costly, I want to be sure I am on the right track. Any ideas?
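For reference, the kind of check described above can be automated from a second terminal. The following is a minimal sketch, not part of the reference implementation: it assumes `nvidia-smi` is on the PATH, and the helper names (`parse_gpu_stats`, `gpu_is_active`, `poll`) and the activity thresholds are hypothetical choices for illustration.

```python
import subprocess
import time

# Real nvidia-smi query flags: per-GPU utilization (%) and used memory (MiB), as bare CSV.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def _to_int(field):
    """Convert a CSV field to int; non-numeric values such as '[N/A]' count as 0."""
    field = field.strip()
    return int(field) if field.isdigit() else 0


def parse_gpu_stats(text):
    """Parse lines like '37, 20480' into (utilization %, memory MiB) tuples, one per GPU."""
    stats = []
    for line in text.strip().splitlines():
        util, mem = line.split(",")
        stats.append((_to_int(util), _to_int(mem)))
    return stats


def gpu_is_active(stats, min_util=1, min_mem_mib=1024):
    """Heuristic (assumed thresholds): any GPU with nonzero utilization or >= 1 GiB in use."""
    return any(util >= min_util or mem >= min_mem_mib for util, mem in stats)


def poll(interval_s=10, samples=6):
    """Print a timestamped sample every interval_s seconds while training runs."""
    for _ in range(samples):
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
        stats = parse_gpu_stats(out)
        state = "active" if gpu_is_active(stats) else "idle"
        print(time.strftime("%H:%M:%S"), stats, state)
        time.sleep(interval_s)
```

Calling `poll()` while the training script runs prints one line per sample. A long run of `idle` samples alongside a busy CPU would be consistent with a CPU-side phase, though it does not by itself say what the script is doing.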