google/corpuscrawler
Crawler for linguistic corpora
Other
194 stars, 55 forks
issues
#42 | Error when crawling Kaqchikel | ftyers | closed | 5 years ago | 3 comments
#41 | Crawl Pali corpora | brawer | opened | 6 years ago | 0 comments
#40 | Update Zawgyi locale to Qaag | sffc | opened | 6 years ago | 0 comments
#39 | [iba] Crawl a larger corpus for the Iban language | brawer | closed | 6 years ago | 0 comments
#38 | US embassy crawler for Polish | jimregan | closed | 6 years ago | 0 comments
#37 | how to | MayuraVerma | closed | 6 years ago | 1 comment
#36 | [ga] 3 new crawlers | jimregan | closed | 6 years ago | 1 comment
#35 | [ga] CHG crawler | jimregan | closed | 6 years ago | 0 comments
#34 | Irish Times | jimregan | closed | 6 years ago | 0 comments
#33 | move crawl_bibleis to util; add for Ukrainian | jimregan | closed | 6 years ago | 0 comments
#32 | [ace] bible crawl | jimregan | closed | 6 years ago | 3 comments
#31 | basic crawler for Aceh | jimregan | closed | 6 years ago | 0 comments
#30 | Rename crawl_taq to crawl_kab | brawer | closed | 7 years ago | 0 comments
#29 | [be-tarask] Add corpus for Belarusian (Taraškievica) | brawer | closed | 6 years ago | 0 comments
#28 | [cy] add basic Welsh crawler | cwd24 | closed | 7 years ago | 1 comment
#27 | [mi] Filter out lines with English “the” from the Maori corpus | brawer | closed | 7 years ago | 0 comments
#26 | [mi] Filter out English text | brawer | closed | 7 years ago | 1 comment
#25 | Allow Zawgyi crawling separate from my | sffc | closed | 6 years ago | 0 comments
#24 | Thanlwintimes.com No Longer Available | sffc | closed | 6 years ago | 0 comments
#23 | [mi] (public domain) Bible scraper | jimregan | closed | 7 years ago | 0 comments
#22 | [ga] another sentence start to omit | jimregan | closed | 7 years ago | 0 comments
#21 | [ga] conditions were right, needed to cast to int | jimregan | closed | 7 years ago | 0 comments
#20 | need more ns/no ns handling here | jimregan | closed | 7 years ago | 0 comments
#19 | Python 3 compatibility | sffc | opened | 7 years ago | 1 comment
#18 | [ga] url conditions were backwards | jimregan | closed | 7 years ago | 0 comments
#17 | handle mixed broken/unbroken namespaces | jimregan | closed | 7 years ago | 0 comments
#16 | [gd] scraper for dasg corpus (#12) | jimregan | closed | 7 years ago | 1 comment
#15 | [mi] Maori scraper | jimregan | closed | 7 years ago | 1 comment
#14 | [util] Add filepath to FetchResult | behnam | closed | 7 years ago | 0 comments
#13 | [ga] Irish: fixed RTE news scraper | jimregan | closed | 7 years ago | 0 comments
#12 | [gd] Extend Scottish Gaelic corpus | brawer | closed | 7 years ago | 3 comments
#11 | [WIP] [ga] basic crawler for Irish | jimregan | closed | 7 years ago | 0 comments
#10 | basic crawler for Scots Gaelic (gd) | jimregan | closed | 7 years ago | 0 comments
#9 | [si] Add crawler for Sinhala | keshan | closed | 7 years ago | 0 comments
#8 | harfbuzz-testing-wikipedia | behdad | opened | 7 years ago | 1 comment
#7 | [util] Replace unichr() for narrow Python builds | behnam | closed | 7 years ago | 0 comments
#6 | [ar] Add bbc_news and sputnik_news | behnam | closed | 7 years ago | 0 comments
#5 | [ar] Add Modern Standard Arabic: UDHR and DW | behnam | closed | 7 years ago | 0 comments
#4 | [util/fetch] Add more prints for showing progress | behnam | closed | 7 years ago | 0 comments
#3 | Add (Modern Standard) Arabic language | behnam | opened | 7 years ago | 9 comments
#2 | [util/fetch_sitemap] Add subsitemap_filter option | behnam | closed | 7 years ago | 3 comments
#1 | [shn] Add crawler for the Shan language | brawer | closed | 7 years ago | 0 comments