mlcommons / training

Reference implementations of MLPerf™ training benchmarks
https://mlcommons.org/en/groups/training
Apache License 2.0

Llama2 - LoRA Reference Implementation #727

Open rgandikota opened 2 months ago

rgandikota commented 2 months ago
  1. The README points to an eval.py, which is missing from the llama2 scripts folder.
  2. Instructions to run this reference implementation on multiple nodes would be helpful for anyone looking to use it as-is.
  3. It would also help new submitters if the training time for the reference run were documented.
itayhubara commented 2 months ago
  1. I'll delete it - it is just confusing
  2. @michal2409 can you open a PR to update the README with multi-node instructions?
  3. I can add one log but we don't usually do that - @nv-rborkar what do you think
rgandikota commented 1 month ago

@itayhubara Could you please let us know if we can use this dataset from Hugging Face instead of the parquet files from Google Drive? From the README instructions, it looks like the parquet files were derived from this dataset: https://huggingface.co/datasets/tau/scrolls/blob/main/gov_report.zip