keymanapp / lexical-models

Lexical language models for predictive text
MIT License

[benny_lin.id.kamus_indonesia] add casing #115

Open DavidLRowe opened 3 years ago

DavidLRowe commented 3 years ago

@bennylin release/benny_lin/benny_lin.id.kamus_indonesia

Keyman version 14 adds automatic case selection to predictive text models. This applies only to languages with an upper/lower case distinction (those written in Latin or Cyrillic script, for example). To use it, you need Keyman Developer 14, and the lexical model source file must set a new property:

    languageUsesCasing: true

It's set in the model's .ts file, in the same place as the format, wordBreaker and sources properties. For example, the existing file might look like:

    const source: LexicalModelSource = {
      format: 'trie-1.0',
      wordBreaker: 'default',
      sources: ['wordlist.tsv'],
    };
    export default source;

And, with the addition of the new property, like:

    const source: LexicalModelSource = {
      format: 'trie-1.0',
      wordBreaker: 'default',
      sources: ['wordlist.tsv'],
      languageUsesCasing: true,
    };
    export default source;

This turns on case differentiation using the default configuration. Most likely the default behavior will be all you need, in which case no customization is required. If you do need to control how capitalization works, please consult the discussion in keymanapp/keyman#3720 "Example for Turkish".
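To illustrate why a language like Turkish may need custom capitalization, here is a minimal sketch of Turkish-aware casing in plain TypeScript. Everything in it is an assumption made for illustration: the function names are hypothetical, and the three casing forms ('lower', 'initial', 'upper') are assumed rather than taken from this issue. How such a function is attached to the model is covered in the linked discussion, not here.

```typescript
// Hypothetical sketch: Turkish distinguishes dotted i/İ from dotless ı/I,
// which the default JavaScript case conversions do not handle.
type CasingForm = 'lower' | 'initial' | 'upper'; // assumed set of forms

function turkishLower(text: string): string {
  // Map the Turkish-specific capitals first, then lowercase the rest.
  return text.replace(/I/g, 'ı').replace(/İ/g, 'i').toLowerCase();
}

function turkishUpper(text: string): string {
  // Map the Turkish-specific lowercase letters first, then uppercase the rest.
  return text.replace(/i/g, 'İ').replace(/ı/g, 'I').toUpperCase();
}

function applyTurkishCasing(form: CasingForm, text: string): string {
  if (form === 'lower') {
    return turkishLower(text);
  }
  if (form === 'upper') {
    return turkishUpper(text);
  }
  // 'initial': capitalize only the first character, leave the rest untouched.
  const chars = Array.from(text);
  if (chars.length === 0) {
    return text;
  }
  return turkishUpper(chars[0]) + chars.slice(1).join('');
}
```

With the default conversions, "istanbul" would capitalize to "Istanbul"; the sketch above produces "İstanbul" instead.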

In addition, you'll need to change the version number and (probably) the copyright date, which will require you to update some other files. The Keyman team is looking at how to reduce the number of changes needed, but for now here's what's needed:

(1) HISTORY.md will need a new entry with the new version number and the date of the change, something like:

    1.1 (2021-01-31)
    ----------------
    * Enable use of Keyman 14's case-detection & capitalization modeling features

Normally entries in this file are ordered with the latest date at the top of the list.

(2) README.md will need the version number changed. Probably the copyright date (or date range) will need to change as well, for example from "(c) 2020 Acme, Inc." to "(c) 2020-2021 Acme, Inc."

(3) LICENSE.md will need the same copyright change as used in README.md.

(4) The version number needs to be changed in the .kps file. In Keyman Developer, use "Packaging" to get to the .kps file, then on the "Details" tab update the version number and (if needed) the copyright statement.

(5) If you have a copyright statement in a "readme.htm" or a "welcome.htm" file, this will need to be updated with the same copyright change used in README.md. (Since these files are covered by the copyright statement in LICENSE.md, you are free to omit the copyright statement from the individual files, which can make for less work when updating the model.)
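The text edits in steps (1) to (3) are mechanical, so they can be sketched as two small string helpers. These are hypothetical functions written for illustration only; they are not part of Keyman Developer, and they assume the "(c) YEAR" copyright format and the HISTORY.md layout shown in the examples above.

```typescript
// Hypothetical helpers; names and behavior are assumptions for illustration.

// Build a HISTORY.md entry: heading, an underline of the same length, then bullets.
function historyEntry(version: string, date: string, notes: string[]): string {
  const heading = `${version} (${date})`;
  const underline = '-'.repeat(heading.length);
  const bullets = notes.map(note => `* ${note}`);
  return [heading, underline, ...bullets].join('\n');
}

// Extend a "(c) 2020" or "(c) 2020-2021" statement so the range ends at `year`.
// Statements that already reach `year` are returned unchanged.
function extendCopyright(statement: string, year: number): string {
  return statement.replace(
    /\(c\) (\d{4})(?:-(\d{4}))?/,
    (match: string, start: string, end?: string) => {
      const last = Number(end ?? start);
      return last >= year ? match : `(c) ${start}-${year}`;
    }
  );
}

// Example usage with the values from this issue.
const entry = historyEntry('1.1', '2021-01-31', [
  "Enable use of Keyman 14's case-detection & capitalization modeling features",
]);
const notice = extendCopyright('(c) 2020 Acme, Inc.', 2021);
```

The new entry would still need to be placed at the top of HISTORY.md by hand, and the version number in README.md and the .kps file updated separately.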

bennylin commented 3 years ago

Great news! I will try to update soon.