keymanapp / lexical-models

Lexical language models for predictive text
MIT License

[sil.xmf-latn.mingrelian]: a new lexical model of Mingrelian #254

Closed: Meng-Heng closed this 4 months ago

Meng-Heng commented 6 months ago

Please approve if everything looks good. Thank you!

keyman-server commented 6 months ago

This pull request is from an external repo and will not automatically be built. The build must still be passed before it can be merged. Ask one of the team members to make a manual build of this PR.

darcywong00 commented 5 months ago

This lexical model is for Latin (Latn) characters. You may need to consult with the community about the following non-Latin characters. I think they're Cyrillic (Cyrl) and Georgian (Geor) characters, and should be removed from this wordlist.

Count  Unicode Value  Character
    6  0x000430       а
    1  0x000431       б
    2  0x000432       в
    1  0x000434       д
    2  0x000435       е
    1  0x000437       з
    4  0x000438       и
    1  0x000439       й
    4  0x00043A       к
    3  0x00043C       м
    2  0x00043D       н
    3  0x00043E       о
    5  0x000440       р
    1  0x000441       с
    3  0x000442       т
    2  0x000443       у
    1  0x000444       ф
    1  0x000447       ч
    1  0x000448       ш
    2  0x00044B       ы
  169  0x0010F1       ჱ
    1  0x0010F9       ჹ
   10  0x0010FA       ჺ

DavidLRowe commented 5 months ago

@darcywong00 is correct.

0400-04FF = Cyrillic block
10A0-10FF = Georgian block

Entries with these characters should be corrected (or dropped from the .tsv file).
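
For anyone wanting to reproduce a count like the one above, here is a minimal sketch in Python. It is not the tool that produced the table in this thread. It assumes the wordlist is a tab-separated file with the word in the first column, and the names `BLOCKS`, `block_of`, `count_flagged`, and `format_report` are made up for illustration. The two ranges are the ones quoted in this comment.

```python
from collections import Counter

# Block ranges quoted in the comment above (inclusive code points).
BLOCKS = {
    "Cyrillic": (0x0400, 0x04FF),
    "Georgian": (0x10A0, 0x10FF),
}


def block_of(ch):
    """Return the name of the flagged block containing ch, or None."""
    cp = ord(ch)
    for name, (lo, hi) in BLOCKS.items():
        if lo <= cp <= hi:
            return name
    return None


def count_flagged(lines):
    """Count characters from flagged blocks in the first TSV column.

    Assumption: the word is the first tab-separated field on each line.
    """
    counts = Counter()
    for line in lines:
        word = line.rstrip("\n").split("\t")[0]
        for ch in word:
            if block_of(ch) is not None:
                counts[ch] += 1
    return counts


def format_report(counts):
    """Return rows of 'count<TAB>0xNNNNNN<TAB>character', sorted by code point."""
    return [f"{counts[ch]}\t0x{ord(ch):06X}\t{ch}" for ch in sorted(counts)]
```

To run it on a real file, something like `count_flagged(open(path, encoding="utf-8"))` should work, where `path` points at the model's .tsv wordlist.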

Meng-Heng commented 4 months ago

I have removed the specified characters and parentheses from the wordlist. Thanks, @darcywong00 and @DavidLRowe!
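
The thread does not say how the cleanup was done. As one possible approach, the sketch below drops every entry whose word contains a Cyrillic or Georgian character and keeps the dropped lines for review. It assumes a tab-separated wordlist with the word in the first column. `NON_LATIN` and `clean_wordlist` are hypothetical names, and parentheses are not handled here.

```python
import re

# Any character from the Cyrillic (0400-04FF) or Georgian (10A0-10FF) blocks.
NON_LATIN = re.compile(r"[\u0400-\u04FF\u10A0-\u10FF]")


def clean_wordlist(lines):
    """Split wordlist lines into (kept, dropped).

    Assumption: the word is the first tab-separated field on each line.
    A line is dropped if its word contains any flagged character.
    """
    kept, dropped = [], []
    for line in lines:
        word = line.split("\t", 1)[0]
        if NON_LATIN.search(word):
            dropped.append(line)
        else:
            kept.append(line)
    return kept, dropped
```

Keeping the dropped lines separate makes it easier to check with the community whether an entry should be corrected to Latin script instead of being removed.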