Script fixes in data_prep/github

First of all, thank you for your great work on this project. I didn't have access to a Slurm workload manager, but I was able to use these scripts to preprocess a sample of the GitHub dataset from BigQuery (which was exactly what I wanted to do!). Here are a few points that would improve the scripts for the next person:

- The script scripts/github-prepare-download.sh mentioned in the README.md seems to be missing from the scripts directory.
- The TARGET_DIR variable in the github-global-dedup-slurm.sbatch script should probably be ./data/github/processed_deduped instead of ./data/github_scratch/processed_deduped.
- Similarly, the TARGET_DIR and DEDUPED_DIR variables in the github-run-filter-slurm.sbatch script should use github instead of github_scratch.
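In the meantime, the two path fixes above can be applied locally with something like the following. This is a minimal sketch, not part of the repository: it assumes GNU sed (for `-i` without a backup suffix) and that the variables are assigned on their own lines as `TARGET_DIR=...` or `DEDUPED_DIR=...` (optionally prefixed with `export`). The helper name `fix_dirs` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite github_scratch -> github, but ONLY on lines that
# assign TARGET_DIR or DEDUPED_DIR, leaving every other line of the file untouched.
# Assumes GNU sed; on macOS/BSD sed use `sed -i ''` instead of `sed -i`.
fix_dirs() {
  sed -i -E '/^[[:space:]]*(export[[:space:]]+)?(TARGET_DIR|DEDUPED_DIR)=/ s#github_scratch#github#g' "$@"
}
```

Usage, run from wherever the two sbatch files live (their location is assumed here): `fix_dirs github-global-dedup-slurm.sbatch github-run-filter-slurm.sbatch`. Restricting the substitution to the assignment lines avoids changing any other reference to github_scratch that may exist in the scripts.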

Thanks again for your work on this project.

togethercomputer / RedPajama-Data, issue #25