togethercomputer / RedPajama-Data

The RedPajama-Data repository contains code for preparing large datasets for training large language models.
Apache License 2.0
4.57k stars, 350 forks

Single machine download script and downloaded files check #60

Status: Open · opened by MIracleyin 1 year ago

MIracleyin commented 1 year ago
  1. A script that downloads the files on a single machine, without Slurm.
  2. A check of the downloaded files that detects failed downloads, so that only those files need to be fetched again, saving AWS bandwidth costs.
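A minimal sketch of both requests follows. It assumes the files to fetch are listed one URL per line in a plain-text file, and that the server answers HEAD requests with a `Content-Length` header. The script name and every function name (`fetch`, `is_complete`, `download`, `local_path`), along with the output layout, are hypothetical and not part of the RedPajama-Data repository. Completeness is judged by file size only, not by checksum.

```python
"""Hypothetical single-machine downloader: no Slurm, skips files already complete."""
import os
import shutil
import sys
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse


def remote_size(url: str) -> Optional[int]:
    """Return the Content-Length reported by a HEAD request, or None if absent."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        value = resp.headers.get("Content-Length")
    return int(value) if value is not None else None


def is_complete(path: Path, url: str) -> bool:
    """A local file counts as complete if it exists and matches the remote size."""
    if not path.exists():
        return False
    expected = remote_size(url)
    # Assumption: if the server reports no size, trust the existing file.
    return expected is None or path.stat().st_size == expected


def download(url: str, dest: Path) -> None:
    """Stream url into a .part file, then rename it, so dest is never left truncated."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(url) as resp, open(tmp, "wb") as out:
        shutil.copyfileobj(resp, out, length=1 << 20)  # copy in 1 MiB chunks
    os.replace(tmp, dest)


def local_path(url: str, out_dir: Path) -> Path:
    """Mirror the URL's path under out_dir (hypothetical layout)."""
    return out_dir / urlparse(url).path.lstrip("/")


def fetch(url: str, out_dir: Path) -> str:
    """Download url unless a complete copy already exists; verify size afterwards."""
    dest = local_path(url, out_dir)
    if is_complete(dest, url):
        return "skipped"  # no bytes transferred, so no bandwidth spent
    download(url, dest)
    if not is_complete(dest, url):
        raise IOError(f"size mismatch after download: {url}")
    return "downloaded"


def main(url_list: str, out_dir: str, workers: int = 8) -> None:
    """Fetch every URL in url_list with a small thread pool on one machine."""
    with open(url_list) as f:
        urls = [line.strip() for line in f if line.strip()]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetch, u, Path(out_dir)): u for u in urls}
        for fut, url in futures.items():
            try:
                print(fut.result(), url)
            except Exception as exc:  # report and continue; a re-run retries it
                print("failed", url, exc, file=sys.stderr)


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Usage would be something like `python download.py urls.txt data/`. Re-running the same command skips files whose size already matches and re-fetches only missing or truncated ones. Writing to a `.part` file and renaming it with `os.replace` means an interrupted transfer never leaves behind a file under the final name; the size check still catches truncated files produced by other tools.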