google/corpuscrawler
Crawler for linguistic corpora
Other
193 stars, 55 forks
issues
#92 | Use available sentences corpora for Wikipedia (290+ languages) | hugolpz | opened 9 months ago | 0 comments
#91 | Fix robots.txt fallback to be a byte string | sffc | closed 11 months ago | 0 comments
#90 | Fix parsing for rfa.org | sffc | closed 2 years ago | 1 comment
#89 | Add __main__.py file so that corpuscrawler can be invoked as a module | sffc | closed 2 years ago | 0 comments
#88 | [ga] update crawler | jimregan | closed 3 years ago | 2 comments
#87 | Undefined names | cclauss | opened 3 years ago | 0 comments
#86 | No module named 'corpuscrawler' error | Aayush-hub | opened 3 years ago | 2 comments
#85 | Update README.md | 83-W | closed 3 years ago | 1 comment
#84 | Use corpora from Universal Dependencies | brawer | opened 3 years ago | 0 comments
#83 | Documentation > Clarify language codes system in uses | hugolpz | closed 3 years ago | 4 comments
#82 | Shorten project structure | hugolpz | opened 3 years ago | 3 comments
#81 | Define crawlers' output format | hugolpz | opened 3 years ago | 0 comments
#80 | Improve readme documentation on how to provide a new crawler | hugolpz | opened 3 years ago | 5 comments
#79 | Use available corpora for opensubtitles (63 languages) | hugolpz | opened 3 years ago | 3 comments
#78 | Add Wikipedia crawler ? (300+ languages) | hugolpz | opened 3 years ago | 5 comments
#77 | Adding Pali and Karen | sffc | closed 4 years ago | 0 comments
#76 | Add Pali, Mon, and Karen | sffc | closed 4 years ago | 1 comment
#75 | Update crawl_su.py | mahalisyarifuddin | closed 4 years ago | 1 comment
#74 | Adding New URLs | Mounika2405 | closed 4 years ago | 2 comments
#73 | Does not run in python3.7 or python 2.7 | ftyers | opened 5 years ago | 1 comment
#72 | [ga] new crawlers | jimregan | closed 5 years ago | 0 comments
#71 | [ga] new crawlers | jimregan | closed 5 years ago | 0 comments
#70 | Set context settable | jimregan | closed 5 years ago | 1 comment
#69 | Create crawl_sea.py | mahalisyarifuddin | closed 5 years ago | 1 comment
#68 | Update crawl_id.py | mahalisyarifuddin | closed 5 years ago | 0 comments
#67 | Create crawl_xmm.py | mahalisyarifuddin | closed 5 years ago | 0 comments
#66 | Create crawl_bug.py | mahalisyarifuddin | closed 5 years ago | 0 comments
#65 | Create crawl_tet.py | mahalisyarifuddin | closed 5 years ago | 0 comments
#64 | Create crawl_nn.py | mahalisyarifuddin | closed 5 years ago | 0 comments
#63 | Create crawl_nb.py | mahalisyarifuddin | closed 5 years ago | 0 comments
#62 | Create crawl_eip.py | mahalisyarifuddin | closed 5 years ago | 0 comments
#61 | Create crawl_saj.py | mahalisyarifuddin | closed 5 years ago | 0 comments
#60 | Create crawl_xte.py | mahalisyarifuddin | closed 5 years ago | 0 comments
#59 | Create crawl_bhz.py | mahalisyarifuddin | closed 5 years ago | 0 comments
#58 | Create crawl_frd.py | mahalisyarifuddin | closed 5 years ago | 0 comments
#57 | Create crawl_lbw.py | mahalisyarifuddin | closed 5 years ago | 0 comments
#56 | Update crawl_id.py | mahalisyarifuddin | closed 5 years ago | 0 comments
#55 | [ga] fix regex | jimregan | closed 5 years ago | 1 comment
#54 | [th] Add crawl bibleis | wannaphong | closed 5 years ago | 1 comment
#53 | [th] Thai crawler | wannaphong | closed 5 years ago | 1 comment
#52 | Fixed Python 3 compatibility | wannaphong | closed 5 years ago | 3 comments
#51 | Skip urls with non-200 http status | blackblitz | closed 5 years ago | 3 comments
#50 | 404 error with Myanmar Zawgyi | blackblitz | closed 5 years ago | 2 comments
#49 | Portuguese: doubt about the corpus result | ghost | opened 5 years ago | 1 comment
#48 | Add Norwegian language | Orekhov | opened 5 years ago | 1 comment
#47 | Adding title to CONTRIBUTING.md | kshithijiyer | closed 5 years ago | 0 comments
#46 | Fixed 3 crawlers | cash | closed 5 years ago | 2 comments
#45 | fixes bibleis crawler | cash | closed 5 years ago | 2 comments
#44 | crawler gets hung after downloading a few hits | thebucketmouse | closed 5 years ago | 2 comments
#43 | what sites are crawled? | thebucketmouse | closed 5 years ago | 2 comments