soskek / bookcorpus

Crawl BookCorpus
MIT License

Update on the `url_list.jsonl` #29

Open thipokKub opened 1 year ago

thipokKub commented 1 year ago

Hello, on 2022-12-17 I ran the script `download_list.py` with the number of pages changed to 31430, which covers the last search page. Here is the updated url_list.jsonl.zip

Compared with the original file, 4544 entries were lost and 8475 entries were added.
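For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of how two JSONL lists could be diffed. It is not the script used for the numbers above. The function names `load_keys` and `diff_url_lists` are made up for illustration, and the `page` field used in the example is an assumption about the record layout, so check it against the actual file first. With `key=None`, records are compared as whole JSON objects instead.

```python
import json


def load_keys(lines, key=None):
    """Collect one identifier per JSONL record.

    If `key` is None, the whole record (serialized with sorted keys) is
    the identifier; otherwise the value of that field is used.
    """
    keys = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if key is None:
            keys.add(json.dumps(record, sort_keys=True))
        else:
            keys.add(record[key])
    return keys


def diff_url_lists(old_lines, new_lines, key=None):
    """Return (lost, added): identifiers only in the old / only in the new list."""
    old_keys = load_keys(old_lines, key)
    new_keys = load_keys(new_lines, key)
    return old_keys - new_keys, new_keys - old_keys


# Tiny in-memory example; "page" is a hypothetical field name.
old = ['{"page": "a"}', '{"page": "b"}']
new = ['{"page": "b"}', '{"page": "c"}']
lost, added = diff_url_lists(old, new, key="page")
print(len(lost), "lost,", len(added), "added")
```

To run it on real files, pass open file objects in place of the lists, for example `diff_url_lists(open("old.jsonl"), open("new.jsonl"))` (both paths are placeholders). Note that duplicate records collapse in a set, so the counts are of distinct entries.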

Hope this helps.