Closed HUIYINXUE closed 2 years ago
The host of this copy of the dataset (https://the-eye.eu) is down and has been for some time, potentially months.
Finding this dataset is a little esoteric, as the original authors took down the official BookCorpus dataset some time ago.
There are community-created versions of BookCorpus, such as the files hosted at https://battle.shawwn.com/sdb/bookcorpus/
And more discussion here: https://github.com/soskek/bookcorpus
Do we want to remove this dataset entirely? There's a fair argument for this, given that the official BookCorpus dataset was taken down by the authors. If not, perhaps a PR could be opened with the link to the community-created tar above and an updated dataset description.
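To check whether a given host currently responds, a HEAD request from the standard library is usually enough. This is only a minimal sketch: the helper name `is_reachable` and the 10-second default timeout are assumptions made for illustration and are not part of `datasets`.

```python
import urllib.request


def is_reachable(url, timeout=10.0):
    """Return True if a HEAD request to `url` completes with a non-error status."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):
        # URLError, HTTPError and socket timeouts are OSError subclasses;
        # ValueError is raised for malformed URLs.
        return False
```

It could be called as `is_reachable("https://the-eye.eu/public/AI/pile_preliminary_components/books1.tar.gz")`. Some servers reject HEAD requests, so a `False` result is a hint rather than proof that the file is gone.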
Hi! The bookcorpusopen dataset is not working for the same reason as explained in this comment: https://github.com/huggingface/datasets/issues/3504#issuecomment-1004564980
Hi @HUIYINXUE, it should work now that the data owners created a mirror server with all data, and we updated the URL in our library.
Describe the bug
Cannot load 'bookcorpusopen'
Steps to reproduce the bug
or
Actual results
ConnectionError: Couldn't reach https://the-eye.eu/public/AI/pile_preliminary_components/books1.tar.gz
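The failure can be reported without a full traceback by catching the error around the load call. The sketch below assumes the error is Python's built-in `ConnectionError`, as the message above suggests. `try_load` and its `loader` parameter are hypothetical names introduced here so the behaviour can be exercised without network access.

```python
def try_load(name, loader=None):
    """Return (dataset, None) on success, or (None, message) on ConnectionError."""
    if loader is None:
        # Imported lazily so the helper can be tested without `datasets` installed.
        from datasets import load_dataset
        loader = load_dataset
    try:
        return loader(name), None
    except ConnectionError as err:
        # Assumption: the download failure is raised as the built-in ConnectionError.
        return None, str(err)
```

Calling `try_load("bookcorpusopen")` while the host is unreachable would then be expected to return `None` together with the "Couldn't reach ..." message instead of raising.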
Environment info
datasets version: 1.9.0