togethercomputer / RedPajama-Data

The RedPajama-Data repository contains code for preparing large datasets for training large language models.
Apache License 2.0

Issue on book datasets download #74

Open beccabai opened 1 year ago

beccabai commented 1 year ago

When running download.py in the current 'book' directory, an error occurs:

[image: screenshot of the error]

It seems like this is because this dataset is defunct:

[image: screenshot indicating the dataset is defunct]

hicotton02 commented 1 year ago

That is correct. If you look hard on the internet, you can still find it. Then you can edit the Python script to bring the local file into the fold. I am being vague on purpose; I'm not completely sure what I am allowed to say.
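
A rough sketch of what "bringing the local file into the fold" could look like, assuming the local copy is a JSON Lines file (one JSON object per line, optionally gzip-compressed) with a `text` field. The function name `iter_local_books`, the file layout, and the field name are all hypothetical; none of them are taken from the repository's download.py, so adjust to whatever the script actually expects:

```python
import gzip
import json
from pathlib import Path
from typing import Iterator


def iter_local_books(path: str, text_field: str = "text") -> Iterator[dict]:
    """Yield records from a local JSON Lines file (hypothetical layout).

    Assumes one JSON object per line; a ".gz" suffix is read as gzip.
    """
    p = Path(path)
    # Pick the opener from the file suffix; both accept text mode "rt".
    opener = gzip.open if p.suffix == ".gz" else open
    with opener(p, "rt", encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            if text_field not in record:
                raise KeyError(f"line {line_no}: missing field {text_field!r}")
            yield record
```

The records from this generator could then be passed to whatever the script does after its download step, in place of the remote dataset that no longer resolves.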

beccabai commented 1 year ago

Hahaha... Thank you for your reply!