togethercomputer / RedPajama-Data

The RedPajama-Data repository contains code for preparing large datasets for training large language models.
Apache License 2.0
4.53k stars 346 forks

No file named github-prepare-local-dedup.sh #53

Closed by feverdreamy 1 year ago

feverdreamy commented 1 year ago

Hey, I've been trying to process and clean the dataset for a while now, and I keep getting this error. I can't find the file anywhere in the repo. Any help would be greatly appreciated. Here is the error message: bash: scripts/github-prepare-local-dedup.sh: No such file or directory

mauriceweber commented 1 year ago

Hi @feverdreamy, thanks for bringing this up! This is a typo in the README; the command should be bash scripts/github-prepare-clean.sh. I just fixed it in d174968.
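
For anyone who hits the same "No such file or directory" error, here is a minimal sketch of a guard that checks whether a script exists before running it and, if not, lists what is actually in that directory. The function name run_script is hypothetical and not part of the repository; the only script path taken from this thread is scripts/github-prepare-clean.sh.

```shell
# Hypothetical helper (not part of RedPajama-Data): run a script only if it
# exists; otherwise print the files in its directory so a renamed or
# mistyped script name is easy to spot.
run_script() {
  script="$1"
  if [ -f "$script" ]; then
    bash "$script"
  else
    dir=$(dirname "$script")
    echo "Not found: $script" >&2
    echo "Files in $dir:" >&2
    ls "$dir" >&2
    return 1
  fi
}
```

Run from the repository root, it would be called as run_script scripts/github-prepare-clean.sh.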