Open ronaldtse opened 4 years ago
https://github.com/interscript/geotest/issues/1
For this database, GeoTest outputs the following result:
```
# bundle exec ruby test.rb files/kn/kn.txt
.....
0 records have a non-unique UNI (should be 0)
Out of 34883 related clusters we get 17425 unique related clusters
Unique clusters have 34883 members in total (this should match a number of related clusters)
Hash of cluster length to a number of clusters of that kind: {2=>17406, 3=>7, 4=>11, 6=>1}
Transliteration systems used:
- "" * 74776 (17773 with a pair)
- "kor_Hang2Latn_MR_1939" * 18217 (16229 with a pair) implemented in Interscript as bgn-kor-Hang-Latn-1943
- "kor_Hang2Latn_MOCT_2000" * 7907 (84 with a pair) implemented in Interscript as moct-kor-Hang-Latn-2000
- "zho_Hani2Latn_GCH_1979" * 4 (3 with a pair) implemented in Interscript as sac-zho-Hans-Latn-1979
- "zho_Hani2Latn_WDG_1979" * 2 (0 with a pair) implemented in Interscript as var-zho-Hani-Latn-wd-1979
- "rus_Cyrl2Latn_BGN_1947" * 1 (1 with a pair) implemented in Interscript as bgnpcgn-rus-Cyrl-Latn-1947
Among the unique clusters:
- 0 clusters are too short
- 1 clusters contain no non-ASCII entries
- 1128 clusters contain no transliteration info
- 15 clusters contain more than 1 non-ASCII entries
- 0 clusters are transliterated with a map not present in Interscript
Remaining 16281 clusters seem to be usable
kor_Hang2Latn_MR_1939: 3859/16187 (23.84%) (Errors: Incorrect punctuation * 11269, Incorrect transliteration * 967, Incorrect casing and (spacing or punctuation) * 67, Incorrect casing and punctuation * 5, Incorrect spacing or punctuation * 20)
kor_Hang2Latn_MOCT_2000: 31/84 (36.9%) (Errors: Incorrect transliteration * 23, Incorrect punctuation * 29, Incorrect spacing or punctuation * 1)
zho_Hani2Latn_GCH_1979: 0/3 (0.0%) (Errors: Incorrect casing and (spacing or punctuation) * 3)
rus_Cyrl2Latn_BGN_1947: 1/1 (100.0%)
: 0/10 (0.0%) (Errors: No support in Interscript * 10)
```
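
The totals in this report can be cross-checked with a few lines of Ruby. This is only a sketch that reuses the numbers printed above; the variable names are illustrative and are not taken from GeoTest's `test.rb`.

```ruby
# Figures copied from the GeoTest report above.
unique_clusters  = 17_425
clusters_by_size = { 2 => 17_406, 3 => 7, 4 => 11, 6 => 1 }

# Clusters excluded before testing, in the order the report lists them.
excluded = {
  too_short: 0,
  no_non_ascii: 1,
  no_transliteration_info: 1_128,
  multiple_non_ascii: 15,
  map_missing_in_interscript: 0
}

# The histogram should add up to the unique cluster count and member count.
cluster_count = clusters_by_size.values.sum
member_count  = clusters_by_size.sum { |size, count| size * count }
usable        = unique_clusters - excluded.values.sum

puts "clusters: #{cluster_count}, members: #{member_count}, usable: #{usable}"

# Per-system results: [passed, total, individual error counts from the report].
results = {
  "kor_Hang2Latn_MR_1939"   => [3_859, 16_187, [11_269, 967, 67, 5, 20]],
  "kor_Hang2Latn_MOCT_2000" => [31, 84, [23, 29, 1]]
}

# Pass rate as a percentage, plus a check that errors account for every failure.
summary = results.map do |system, (passed, total, errors)|
  rate = (100.0 * passed / total).round(2)
  [system, { rate: rate, errors_match: errors.sum == total - passed }]
end.to_h

summary.each do |system, info|
  puts "#{system}: #{info[:rate]}% (errors account for all failures: #{info[:errors_match]})"
end
```

With the numbers above, the histogram reproduces the 17425 unique clusters and 34883 members, the exclusions leave 16281 usable clusters, and the listed error counts for both Korean systems account for every failed cluster.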
kn.zip
Only uses these systems:
(Same as #48)