soskek/bookcorpus
Crawl BookCorpus
MIT License
813 stars
110 forks
issues
Books3 Links are Dead
#30
fruitymedley
opened
1 year ago
1
Update on the `url_list.jsonl`
#29
thipokKub
opened
1 year ago
0
How to create train, test, dev splits?
#28
kgarg8
closed
3 years ago
1
Here’s a download link for all of bookcorpus as of Sept 2020
#27
shawwn
opened
4 years ago
27
epub2txt.py produces incorrect results for many epubs
#26
shawwn
opened
4 years ago
1
Update README.md
#25
soskek
closed
4 years ago
0
Can anyone download all the files in the url list file?
#24
wxp16
opened
5 years ago
13
Could you share the processed all.txt?
#23
thudzj
closed
5 years ago
9
smashwords.com forbids this; readme should tell people to get permission first
#22
gthb
closed
5 years ago
1
How to resolve URLError SSL: CERTIFICATE_VERIFY_FAILED
#21
delzac
closed
5 years ago
1
add: utf8 encoding for all file opens
#20
YongWookHa
closed
5 years ago
1
File names incorrect when epub is missing
#19
1227505
closed
5 years ago
3
add strip for genre scraping
#18
soskek
closed
5 years ago
0
Fix titles of scraped targets
#17
soskek
closed
5 years ago
0
download_list.py not working due to title change.
#16
1227505
closed
5 years ago
1
Sort by author
#15
bakszero
closed
5 years ago
2
add fast download
#14
antihenchman
closed
5 years ago
0
fix readme
#13
soskek
closed
5 years ago
0
intermittent issues with connections and file names
#12
David-Levinthal
closed
5 years ago
3
Add "lxml" package to requirements.txt
#11
lifefeel
closed
5 years ago
1
Use BlingFire instead of NLTK as tokenizers
#10
soskek
closed
5 years ago
0
Fixed SSL CERTIFICATE_VERIFY_FAILED error in Python3.6 on Mac OS X
#9
lifefeel
closed
5 years ago
2
HTTPError: HTTP Error 401: Authorization Required
#8
NotToday
closed
5 years ago
2
Fix merging sentences in one paragraph
#7
yoquankara
closed
5 years ago
4
Add short sleep after successful download
#6
yoquankara
closed
5 years ago
5
Adds requirements.txt
#5
afrozas
closed
5 years ago
1
Update
#4
soskek
closed
5 years ago
0
Network Error
#3
SummmerSnow
closed
5 years ago
3
add language filter
#2
soskek
opened
6 years ago
0
Add html2text as requirement
#1
butsugiri
closed
6 years ago
1