Open j0ma opened 1 year ago
Notes on corpora
Corpus counts
Number of lines/sentences
PanLex
11450 train 2454 dev 2454 test
Tatoeba
2387 train 512 dev 512 test
Number of space-separated tokens
PanLex
11450 train 2454 dev 2454 test
Tatoeba
12562 train 2618 dev 2792 test
Number of unique tokens
PanLex
6673 train 2050 dev 2048 test
Tatoeba
1575 train 893 dev 933 test