mlcommons / training

Reference implementations of MLPerf™ training benchmarks
https://mlcommons.org/en/groups/training
Apache License 2.0

Shrinking Llama training to suit one GPU #777

Open mahmoodn opened 1 week ago

mahmoodn commented 1 week ago

Hi, is it possible to run Llama training on a single GPU for a test? I have tried a smaller sequence length and a batch size of 1, but it seems that because the configuration uses DeepSpeed (`distributed_type: DEEPSPEED`), it has to be a multi-node setup. I cannot find any option other than DEEPSPEED.
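For context, this is roughly the kind of single-node, single-GPU setup I am trying to express with an Accelerate-style config (a minimal sketch only; the `zero_stage` and offload values are my own assumptions, not what the reference implementation ships with):

```yaml
# Minimal single-node, single-GPU Accelerate config sketch (assumed values).
compute_environment: LOCAL_MACHINE
distributed_type: DEEPSPEED        # the only distributed_type I can find for this benchmark
deepspeed_config:
  gradient_accumulation_steps: 1
  zero_stage: 3                    # assumption: ZeRO-3 to fit the model on one GPU
  offload_optimizer_device: cpu    # assumption: CPU offload to reduce GPU memory
  offload_param_device: cpu
  zero3_init_flag: true
mixed_precision: bf16
num_machines: 1                    # single node
num_processes: 1                   # single GPU
machine_rank: 0
rdzv_backend: static
same_network: true
```

I would then launch the training script with `accelerate launch --config_file <that file> ...`, but I am not sure whether the reference implementation is meant to work in this shape at all.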

Any idea about that?