thammegowda/mtdata
A tool that locates, downloads, and extracts machine translation corpora
https://pypi.org/project/mtdata/
Apache License 2.0
147 stars
22 forks
issues
Better error messages for better UX
#162
SamuelLarkin
opened
3 months ago
0
v0.4.2 (wmt24)
#161
thammegowda
closed
6 months ago
0
Bump requests from 2.31.0 to 2.32.0
#160
dependabot[bot]
opened
6 months ago
0
Depend on external lib for language standardization
#159
AlexUmnov
opened
6 months ago
0
typo: possibly meant to say SGM and not TMX
#158
SamuelLarkin
closed
7 months ago
0
Allow strict langpair ordering
#157
erip
opened
7 months ago
1
test
#156
mmpython111
closed
7 months ago
0
branch_test
#155
sifan0067
closed
8 months ago
0
Adding missing kab_DZ
#154
BoFFire
opened
9 months ago
0
Update Tatoeba corpus
#153
jeanm
opened
1 year ago
0
Add TALPCo
#152
kpu
opened
1 year ago
0
Add Thai-English parallel corpus "scb-mt-en-th-2020"
#151
kpu
opened
1 year ago
0
Bump requests from 2.26.0 to 2.31.0
#150
dependabot[bot]
closed
7 months ago
0
v0.4.1
#149
thammegowda
closed
7 months ago
0
How to add in missing parts of datasets
#148
arvieFrydenlund
closed
1 year ago
4
Index store bibkey and not the bibtext content
#147
thammegowda
closed
1 year ago
2
v0.4.0
#146
thammegowda
closed
1 year ago
0
Add Flores 200
#145
ZenBel
closed
1 year ago
1
Add NTREX-128
#144
thammegowda
closed
1 year ago
1
Dataset Add: JParaCrawl Chinese-Japanese
#143
BrightXiaoHan
closed
1 year ago
1
Add Samanantar datasets.
#142
BrightXiaoHan
closed
1 year ago
3
Faster downloads with multiple streams
#141
thammegowda
opened
1 year ago
0
Add support for monolingual data
#140
thammegowda
closed
1 year ago
1
Add `echo` task
#139
thammegowda
closed
1 year ago
1
Opus update + elrc datasets
#138
AlexUmnov
closed
1 year ago
1
No such file or directory: '..../mtdata/index/allenai_nllb.json'
#137
thammegowda
closed
1 year ago
2
Travis build is broken
#136
thammegowda
closed
1 year ago
1
Update opus index and add new datasets to ELRC
#135
ZenBel
closed
2 years ago
1
AllenAi nllb dataset (excluding ccmatrix)
#134
AlexUmnov
closed
2 years ago
1
Add `allenai/nllb` dataset
#133
ZenBel
closed
2 years ago
2
Not all available `ELRC` datasets are downloaded from OPUS
#132
ZenBel
closed
2 years ago
2
Add ebible corpus
#131
joelthe1
opened
2 years ago
0
CVE-2007-4559 Patch
#130
TrellixVulnTeam
closed
2 years ago
0
Is there a way to see the dataset size before starting the download
#129
XapaJIaMnu
opened
2 years ago
5
Add MaCoCu corpora
#128
ZJaume
opened
2 years ago
0
Add mni-eng parallel data
#127
kpu
opened
2 years ago
0
Add gn-es parallel data
#126
kpu
opened
2 years ago
0
0.3.8
#125
thammegowda
closed
2 years ago
0
Update ELRC-SHARE data
#124
thammegowda
closed
2 years ago
0
Update ELRC-SHARE data
#123
kpu
closed
2 years ago
1
Return non-zero on error
#122
kpu
closed
2 years ago
1
Add EU acts in Ukrainian
#121
thammegowda
closed
2 years ago
1
[WIP] 0.3.7 development
#120
thammegowda
closed
2 years ago
0
AI4Bharath link is down
#119
thammegowda
opened
2 years ago
1
Fixed a bug in KECL JParaCrawl v3 extraction used in WMT22 en-ja translation task
#118
thammegowda
closed
2 years ago
0
Fixed a bug in KECL JParaCrawl v3 extraction used in WMT22 en-ja translation task
#117
de9uch1
closed
2 years ago
2
Cannot Download wmt21 en2zh test data
#116
Pzzzzz5142
opened
2 years ago
5
Update cache.py
#115
jgwinnup
closed
2 years ago
2
Trying to use mtdata with python
#114
MathieuGrosso
closed
2 years ago
5
Add ParaCrawl Ukrainian bonus
#113
kpu
closed
2 years ago
0