SummmerSnow closed this issue 5 years ago.
I have tried using this repository to download the books. The file I downloaded contains more than 70 million sentences. I had the same issues as you, so I modified a couple of lines in the script download_list.py. I will make a pull request to this repo. If this is urgent, please contact me at abhinavkashyap92 at gmail dot com so I can send you the modified scripts.
Thanks~ It was urgent, so I got the data with the help of a friend. I also found that the error is caused by the format conversion from epub to txt, not by the network. I look forward to your modified scripts.
Thanks again for your reply and everything~
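The thread does not show how the repository converts epub to txt, so the following is only a minimal sketch using the Python standard library, not the repository's code. It assumes a standard EPUB layout (META-INF/container.xml pointing to an OPF package file whose spine lists the chapters in reading order) and returns None for a malformed book instead of raising, so one bad file does not stop a whole batch. The names `epub_to_text` and `_TextExtractor` are hypothetical.

```python
import io
import posixpath
import xml.etree.ElementTree as ET
import zipfile
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import unquote

CONTAINER_NS = "{urn:oasis:names:tc:opendocument:xmlns:container}"
OPF_NS = "{http://www.idpf.org/2007/opf}"
SKIP_TAGS = {"head", "script", "style"}  # content that is not book text
BLOCK_TAGS = {"p", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class _TextExtractor(HTMLParser):
    """Collects visible text from one XHTML chapter, one line per block."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag == "br" and not self._skip:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip:
            self._skip -= 1
        elif tag in BLOCK_TAGS and not self._skip:
            self.parts.append("\n")  # end of a paragraph-like block

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)

    def text(self) -> str:
        # Collapse runs of whitespace inside each line and drop empty lines.
        lines = (" ".join(line.split()) for line in "".join(self.parts).splitlines())
        return "\n".join(line for line in lines if line)


def epub_to_text(data: bytes) -> Optional[str]:
    """Return the book's text in spine order, or None if the EPUB is malformed."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            container = ET.fromstring(zf.read("META-INF/container.xml"))
            opf_path = container.find(f".//{CONTAINER_NS}rootfile").get("full-path")
            opf = ET.fromstring(zf.read(opf_path))
            base = posixpath.dirname(opf_path)  # manifest hrefs are relative to the OPF
            hrefs = {item.get("id"): item.get("href") for item in opf.iter(f"{OPF_NS}item")}
            chapters = []
            for ref in opf.iter(f"{OPF_NS}itemref"):
                href = unquote(hrefs[ref.get("idref")])
                name = posixpath.normpath(posixpath.join(base, href))
                parser = _TextExtractor()
                parser.feed(zf.read(name).decode("utf-8", errors="replace"))
                parser.close()
                chapters.append(parser.text())
            return "\n\n".join(chapters)
    except (zipfile.BadZipFile, KeyError, ET.ParseError, AttributeError):
        # Missing files, broken XML or a corrupt archive: skip this book.
        return None
```

Returning None rather than raising lets a batch loop log the failed book and continue with the rest.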
I updated the repository to handle 503 errors.
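The thread does not show what the 503 fix looks like. One common approach is to retry a request with exponential backoff when the server answers 503 Service Unavailable; the sketch below assumes that approach and is not the repository's actual change. `fetch_with_retry` and `urlopen_bytes` are hypothetical helper names, and the `sleep` parameter is injectable so the backoff can be tested without waiting.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def fetch_with_retry(
    url: str,
    fetch: Callable[[str], bytes],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Return fetch(url), retrying with exponential backoff on HTTP 503.

    Non-503 HTTP errors are re-raised at once; if the server still
    answers 503 after max_retries retries, the last error is re-raised.
    """
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    attempt = 0
    while True:
        try:
            return fetch(url)
        except urllib.error.HTTPError as err:
            if err.code != 503 or attempt == max_retries:
                raise
            # Wait base_delay * 1, 2, 4, ... seconds before trying again.
            sleep(base_delay * 2 ** attempt)
            attempt += 1


def urlopen_bytes(url: str) -> bytes:
    """Hypothetical default fetcher: a plain urllib GET with a timeout."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()
```

With the defaults, `fetch_with_retry(url, urlopen_bytes)` waits 1 s, 2 s, 4 s, and so on between attempts, and gives up after five retries.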
Hi, thanks for your code. It's really useful for most NLP researchers, so thank you again.
When I run this code, it's often interrupted by a network error after downloading only a few files. I thought this might be caused by my network, so could you please send me an email with the crawled BookCorpus dataset attached, if you have it?
My email is xiaoxueSummer@gmail.com. Thank you very much.
Best,
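If a crawl keeps dying after a few files, one workaround (an assumption, not something the repository is confirmed to do) is to make the download loop resumable: skip files that already exist, write each file under a temporary `.part` name and rename it only once it is complete, and collect failed URLs instead of aborting. `download_missing` is a hypothetical helper; `fetch` can be any callable that returns a file's bytes, for example a small wrapper around `urllib.request.urlopen`.

```python
import os
from typing import Callable, Dict, List


def download_missing(
    targets: Dict[str, str],
    fetch: Callable[[str], bytes],
) -> List[str]:
    """Download each url -> path in targets, skipping files already on disk.

    Returns the URLs that failed so they can be retried on a later run
    instead of aborting the whole crawl on the first network error.
    """
    failed = []
    for url, path in targets.items():
        if os.path.exists(path) and os.path.getsize(path) > 0:
            continue  # finished on a previous run
        tmp = path + ".part"
        try:
            data = fetch(url)
            with open(tmp, "wb") as f:
                f.write(data)
            os.replace(tmp, path)  # only a complete file gets the final name
        except OSError:
            # urllib's URLError, socket timeouts and connection resets
            # are all OSError subclasses.
            failed.append(url)
    return failed
```

Running the script again with the returned list retries only the books that failed.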