thammegowda/mtdata
A tool that locates, downloads, and extracts machine translation corpora
https://pypi.org/project/mtdata/
Apache License 2.0
147 stars
22 forks
issues
[WIP] v0.3.6
#112
thammegowda
closed
2 years ago
0
ELRC updates: unban some fixed TMXes, add more Irish
#111
kpu
closed
2 years ago
0
[WIP] v0.3.5
#110
thammegowda
closed
2 years ago
0
CCMatrix?
#109
kpu
closed
2 years ago
1
Anuvaad-zee-30042021-eng-ben ERROR:: Unable to add Anuvaad-zee-30042021-eng-ben: en-bn/*.en matched []; expected one file
#108
XapaJIaMnu
opened
2 years ago
1
Parallel Corpora for 6 Indian Languages
#107
kpu
opened
2 years ago
2
Lanfrica
#106
kpu
opened
2 years ago
0
added histogram
#105
sgowdaks
closed
2 years ago
0
Add visualizations in search results
#104
thammegowda
opened
2 years ago
0
WIP 0.3.5
#103
thammegowda
closed
2 years ago
0
Add Fon-French 2 and Daily Dialogues
#102
kpu
closed
2 years ago
0
ELRC-portal_oficial_turismo_españa_www.spain.info-1-eng-por doesn't contain eng-por
#101
XapaJIaMnu
closed
2 years ago
2
v0.3.4 WIP
#100
thammegowda
closed
2 years ago
0
ELRC update
#99
kpu
closed
2 years ago
0
Policy on BCP-47 in TMX files?
#98
kpu
opened
2 years ago
2
ELRC-euipo_law-1-eng-fra hits 403 (forbidden)
#97
XapaJIaMnu
closed
2 years ago
2
Fix #95 by updating TMX names used on ELRC-SHARE for ELRC-swedish_wor…
#96
kpu
closed
2 years ago
1
ELRC-swedish_work_environment-1-eng-fra doesn't work
#95
XapaJIaMnu
closed
2 years ago
1
mtdata get ignores the language pair when the dataset has only one language pair
#94
XapaJIaMnu
closed
2 years ago
1
Fix buggy matching of languages.
#93
XapaJIaMnu
closed
2 years ago
0
Non-fuzzy match mtdata is broken due to comparing `y1==y1`
#92
XapaJIaMnu
closed
2 years ago
0
mtdata list : filters
#91
thammegowda
closed
2 years ago
0
Add wmt21 tests
#90
thammegowda
closed
2 years ago
0
Add wmt21 ccaligned datasets
#89
thammegowda
closed
2 years ago
0
Add ParIce dataset (en-is)
#88
thammegowda
closed
2 years ago
0
Add wmt21 ha-en corpus
#87
thammegowda
closed
2 years ago
0
Add wikititles v3
#86
thammegowda
closed
2 years ago
0
v0.3.3
#85
thammegowda
closed
2 years ago
0
Add datasets listed by Stanford NMT
#84
thammegowda
closed
2 years ago
4
0.3.2 -
#83
thammegowda
closed
2 years ago
0
recipes.yml is not packed in pip package
#82
thammegowda
closed
2 years ago
0
Anuvaad Parallel Corpus for Indian languages
#81
GokulNC
closed
2 years ago
1
Add parallel bible corpus
#80
thammegowda
opened
3 years ago
1
v0.3.1
#79
thammegowda
closed
3 years ago
0
Reading from tarfiles without extracting them is slow
#78
thammegowda
closed
3 years ago
0
JW300 taken down from OPUS
#77
kpu
opened
3 years ago
3
More datasets https://martinweisser.org/corpora_site/corpora2.html
#76
thammegowda
opened
3 years ago
0
Keep datasets compressed
#75
thammegowda
closed
3 years ago
0
BCP47 parsing : (language, script, country)
#74
thammegowda
closed
3 years ago
0
ParaCrawl 9 is out
#73
kpu
closed
3 years ago
0
[WIP] v0.3.0
#72
thammegowda
closed
3 years ago
0
Force utf-8 encoding (be explicit)
#71
thammegowda
closed
3 years ago
0
JW300 v1c
#70
kpu
closed
3 years ago
4
Licence info for datasets
#69
thammegowda
opened
3 years ago
1
Support dataset caching on a server
#68
thammegowda
opened
3 years ago
4
Add CAMeL Arabic resources
#67
kpu
opened
3 years ago
1
Casmacat is down
#66
ZJaume
closed
3 years ago
2
Language code inaccurate for Chinese languages
#65
kirianguiller
closed
3 years ago
3
The variable versions for one language is not available
#64
pluiez
closed
3 years ago
5
The wikititles is incomplete
#63
pluiez
closed
3 years ago
2