soskek / bookcorpus

Crawl BookCorpus
MIT License
801 stars, 110 forks

Network Error #3

Closed: SummmerSnow closed this issue 5 years ago

SummmerSnow commented 5 years ago

Hi, thanks for your code. It's really useful for most NLP researchers, and thank you again.

When I run this code, it is often interrupted by a network error after downloading only a few files. I think this may be caused by my network. Could you please email me the crawled BookCorpus dataset, if you have it?

My email is: xiaoxueSummer@gmail.com. Thank you very much.

Best,

abhinavkashyap commented 5 years ago

I have tried using this repository to download books. The file I downloaded contains more than 70 million sentences. I had the same issue as you. I made a couple of lines of modifications to the script download_list.py, and I will make a pull request to this repo. If this is urgent, please contact me at abhinavkashyap92 at gmail dot com so I can send you the modified scripts.

SummmerSnow commented 5 years ago

Thanks~ It was urgent, so I got the data with the help of a friend. I also found that the error is caused by the format conversion from epub to txt, not by the network. I look forward to your modified scripts.

Thanks again for your reply and everything~
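
For readers who hit the same conversion failure, one way to keep a single bad epub from interrupting the whole run is to isolate each conversion and record the failures. This is only a minimal sketch, not the repository's code: `convert_all` and the `convert` callable are hypothetical names introduced here, and `convert` stands in for whatever epub-to-txt routine is actually used.

```python
def convert_all(paths, convert):
    """Convert each path with `convert`, collecting failures instead of stopping.

    `convert` is a hypothetical callable (path -> text); it is an assumption
    for illustration and not a function from the bookcorpus repository.
    """
    converted = {}
    failed = []
    for path in paths:
        try:
            converted[path] = convert(path)
        except Exception as err:  # a broken file should not abort the batch
            failed.append((path, repr(err)))
    return converted, failed
```

The `failed` list can then be inspected or retried separately once the run has finished.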

soskek commented 5 years ago

I updated the repository to handle 503 errors.
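
The thread does not show what the update changed. As a general illustration of how a downloader can cope with HTTP 503 responses, here is a minimal sketch of retrying with exponential backoff using only the standard library. The function name `fetch_with_retry` and its parameters are assumptions made for this example, not the repository's actual code.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, max_retries=3, backoff=1.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Return the response body for `url`, retrying when the server sends 503.

    Hypothetical helper: `opener` and `sleep` are injectable so the retry
    logic can be exercised without network access or real delays.
    """
    for attempt in range(max_retries + 1):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Treat only 503 as transient; re-raise anything else,
            # and re-raise 503 too once the retries are used up.
            if err.code != 503 or attempt == max_retries:
                raise
            # Wait backoff, 2*backoff, 4*backoff, ... before the next try.
            sleep(backoff * 2 ** attempt)
```

Waiting longer after each failure gives an overloaded or rate-limiting server time to recover, which is usually what a 503 indicates.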