mlcommons / training

Reference implementations of MLPerf™ training benchmarks
https://mlcommons.org/en/groups/training
Apache License 2.0
1.57k stars 548 forks

Unable to run unit tests of distributed checkpointing in Megatron-LM #676

Open MingjiHan99 opened 11 months ago

MingjiHan99 commented 11 months ago

dist_checkpointing.config.add_argparse_args does not exist.
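
One way to confirm a report like this is to check whether the module actually exposes the attribute in the installed version. The sketch below is a minimal illustration and not part of the original report: the helper name `module_has_attr` is hypothetical, and the full module path `megatron.core.dist_checkpointing.config` is an assumption about where `dist_checkpointing.config` lives in a Megatron-LM install.

```python
import importlib


def module_has_attr(module_name: str, attr_name: str) -> bool:
    """Return True if the module imports and exposes attr_name.

    Hypothetical helper for illustration; returns False when the
    module itself cannot be imported.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attr_name)


if __name__ == "__main__":
    # Assumed module path; adjust it to match the installed Megatron-LM.
    target = "megatron.core.dist_checkpointing.config"
    print(module_has_attr(target, "add_argparse_args"))
```

If this prints `False`, either the module is not importable in the current environment or the attribute is absent from that version, which would be consistent with the error described above.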