Closed chenmoneygithub closed 2 years ago
I would suggest using the "file by Shawn Presser" linked at the top of the README. That avoids having to run the download code to recreate the dataset.
Another option would be Hugging Face Datasets, though some work would be needed to get the books out of that format.
I'll update the README to include a few different sources. We probably shouldn't list just one, as there's no official source anymore.
Describe the bug
Downloading bookcorpus via the [repo mentioned in the BERT instructions](https://github.com/soskek/bookcorpus/blob/master/README.md) hit an error:
HTTPError: HTTP Error 503: Service Temporarily Unavailable
Failed to open https://www.smashwords.com/books/download/459173/6/latest/0/0/imperfect-chemistry.txt
This might be transient since the error code is 503, but it needs further checking.
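One way to check whether the 503 is transient is to retry the request with a growing delay. The following is only a minimal sketch using the Python standard library, not what the bookcorpus repo does: `fetch_with_retry`, its parameters, and the retry counts are hypothetical names and values chosen for illustration.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, opener=urllib.request.urlopen, max_attempts=4,
                     base_delay=1.0, sleep=time.sleep):
    """Fetch a URL, retrying only on HTTP 503 with exponential backoff.

    Hypothetical helper: `opener` and `sleep` are injectable so the retry
    logic can be exercised without network access.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Give up on any non-503 error, or once the attempts run out.
            if err.code != 503 or attempt == max_attempts:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

If every attempt still returns 503, the error is probably not transient (for example, the server may be rejecting automated downloads), and one of the alternative sources is likely the better route.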
To Reproduce
Expected behavior
Should be able to download the bookcorpus dataset.