togethercomputer / RedPajama-Data

The RedPajama-Data repository contains code for preparing large datasets for training large language models.
Apache License 2.0
4.43k stars 335 forks

Issue on book datasets download #74

Open beccabai opened 8 months ago

beccabai commented 8 months ago

When running download.py in the current 'book' directory, an error occurs:

[image: screenshot of the error]

It seems like this is because this dataset is defunct:

[image: screenshot showing that the dataset is defunct]

hicotton02 commented 8 months ago

That is correct. If you look hard on the internet, you can still find it. Then you can edit the Python script to load the local file instead of downloading it. I am being vague on purpose; I'm not completely sure what I am allowed to say.
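
A minimal sketch of what "editing the script to use a local file" could look like, assuming the local copy is a JSON Lines file with one record per line. The function names, the output file name, and the paths are hypothetical and are not taken from the repository's download.py. Only the Python standard library is used.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_local_records(path: Path) -> Iterator[dict]:
    """Yield one dict per non-empty line of a local JSON Lines file."""
    with path.open("r", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def export_local_split(src: Path, out_dir: Path, name: str = "books-train") -> Path:
    """Write the records from src to out_dir/<name>.jsonl and return that path.

    The default name is an assumption; adjust it to whatever the rest of
    the pipeline expects.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    dst = out_dir / f"{name}.jsonl"
    with dst.open("w", encoding="utf-8") as out:
        for record in iter_local_records(src):
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
    return dst
```

For example, `export_local_split(Path("/path/to/local.jsonl"), Path("./data/book"))` would write `./data/book/books-train.jsonl`. Both paths are placeholders.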

beccabai commented 7 months ago

Hahaha... Thank you for your reply!