keymanapp / keyman

Keyman cross platform input methods system running on Android, iOS, Linux, macOS, Windows and mobile and desktop web
https://keyman.com/

feat(common/models): optionally hide 1,2,3 letter words #2409

Open mcdurdin opened 4 years ago

mcdurdin commented 4 years ago

Is your feature request related to a problem? Please describe.

One-, two-, and three-letter words (perhaps other lengths?) should not be excluded from a wordlist, but the wordlist author should be able to configure the model to not present them as options for completion (although presentation for correction is still helpful).

Suggested by Doug Higby.

jahorton commented 5 months ago

Potential resolutions to #5025 lend credence to this position. Things get a bit funny for dictionary-based word-breaking at times when the extra-short words aren't included:

[Image: dict-breaker sample 2]

Neither "I" nor "am" is currently in the MTNT model, so merging consecutive unmatched chars ends up treating "iam" as a word.
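
To make the effect concrete, here is a minimal sketch of greedy longest-match word-breaking that merges consecutive unmatched characters. It is not Keyman's actual dictionary breaker: the function name `breakWords`, the sample text "iamhere", and both toy lexicons are assumptions made up for illustration.

```typescript
// Hypothetical sketch: greedy longest-match word breaking over unspaced text.
// Not Keyman's implementation; names and lexicons are illustrative only.
function breakWords(text: string, lexicon: Set<string>): string[] {
  // Longest lexicon entry bounds how far ahead we need to look.
  const maxLen = Math.max(0, ...Array.from(lexicon, (w) => w.length));
  const out: string[] = [];
  let unmatched = ""; // run of characters that matched no lexicon entry
  let i = 0;

  while (i < text.length) {
    let match = "";
    // Try the longest candidate first, then shrink.
    for (let len = Math.min(maxLen, text.length - i); len > 0; len--) {
      const candidate = text.slice(i, i + len);
      if (lexicon.has(candidate)) {
        match = candidate;
        break;
      }
    }
    if (match) {
      // Flush any pending unmatched run as a single "word".
      if (unmatched) {
        out.push(unmatched);
        unmatched = "";
      }
      out.push(match);
      i += match.length;
    } else {
      // No entry starts here: merge this character into the unmatched run.
      unmatched += text.charAt(i);
      i += 1;
    }
  }
  if (unmatched) out.push(unmatched);
  return out;
}

// Lexicon without the short words: "i", "a", "m" are merged into one token.
const withoutShort = breakWords("iamhere", new Set(["here"]));
// Lexicon with the short words: each one is recognized on its own.
const withShort = breakWords("iamhere", new Set(["i", "am", "here"]));
```

With the short words missing, the unmatched run is emitted as the single token "iam"; once "i" and "am" are in the lexicon, the same input breaks into three words. This is why the short words may need to stay in the wordlist even if they are hidden from suggestions.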

jahorton commented 1 month ago

I'm not completely clear about what, specifically, this issue is requesting.

What is clear:

Parts I'm less clear on:

"but the wordlist author should be able to configure the model to not present them as options for completion"

"(although presentation for correction is still helpful)"


Suppose that we only allowed words of length four or greater as predictions ("for completion") for the following cases.

Example case 1: Suppose the text "tue" has been typed. Would "the" be considered a valid predictive-text suggestion here? "u" neighbors "h" on the QWERTY layout, so it would be a reasonable correction.

Example case 2: "ti" has been typed, which naturally leads to "time" and similar suggestions. "i" is correctable to "o", though. Do we offer "to" as a correction if it's sufficiently likely?
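
One possible reading of the request, sketched under stated assumptions: hide short words only when they are offered as completions, and always keep corrections. The `Suggestion` shape, the `kind` field, and `filterShortCompletions` are hypothetical and do not correspond to any existing Keyman API; whether a suggestion counts as a completion or a correction is exactly the open question in the two cases above.

```typescript
// Hypothetical suggestion shape; not part of any real Keyman interface.
type SuggestionKind = "completion" | "correction";

interface Suggestion {
  text: string;
  kind: SuggestionKind;
}

// Keep every correction; keep a completion only if it is long enough.
// Length is counted in code points rather than UTF-16 code units.
function filterShortCompletions(
  suggestions: Suggestion[],
  minCompletionLength: number
): Suggestion[] {
  return suggestions.filter(
    (s) =>
      s.kind === "correction" ||
      Array.from(s.text).length >= minCompletionLength
  );
}

// Case 1: "tue" typed. "the" is assumed to be classified as a correction.
const case1 = filterShortCompletions(
  [
    { text: "the", kind: "correction" },
    { text: "tuesday", kind: "completion" },
  ],
  4
).map((s) => s.text);

// Case 2: "ti" typed. "tie" is a short completion; "to" is assumed to be a correction.
const case2 = filterShortCompletions(
  [
    { text: "time", kind: "completion" },
    { text: "tie", kind: "completion" },
    { text: "to", kind: "correction" },
  ],
  4
).map((s) => s.text);
```

Under this interpretation, case 1 keeps both "the" and "tuesday", while case 2 keeps "time" and "to" but drops "tie". If short corrections should be hidden as well, the `kind` check would simply be removed.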

mcdurdin commented 1 month ago

I wonder if the primary desire is to avoid listing the really short words on empty context? I am not sure. @dhigby your thoughts?