feat(common/models): optionally hide 1,2,3 letter words

mcdurdin commented 4 years ago

Is your feature request related to a problem? Please describe.

One-, two-, and three-letter words (perhaps other lengths?) should not be excluded from a wordlist, but the wordlist author should be able to configure the model to not present them as options for completion (although presentation for correction is still helpful).

Suggested by Doug Higby.

jahorton commented 5 months ago

Potential resolutions to #5025 lend credence to this position. Things get a bit funny for dictionary-based word-breaking at times when the extra-short words aren't included:

[Image: dict-breaker sample 2]

Neither "I" nor "am" is currently in the MTNT model, so merging consecutive unmatched characters ends up treating "iam" as a word.
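
To make the effect concrete, here is a minimal sketch of a greedy longest-match dictionary breaker that merges consecutive unmatched characters into a single token. It is not the actual dict-breaker code; the function name `breakWords`, the `Set`-based lexicon, and the input "iamhere" are all assumptions made for illustration.

```typescript
// Hypothetical sketch only: NOT the Keyman dict-breaker implementation.
// Greedy longest-match segmentation; characters that match no lexicon
// entry are accumulated and emitted together as one "word".
function breakWords(text: string, lexicon: Set<string>): string[] {
  // The longest lexicon entry bounds how far ahead we need to look.
  let maxLen = 0;
  lexicon.forEach((w) => {
    maxLen = Math.max(maxLen, w.length);
  });

  const words: string[] = [];
  let unmatched = ""; // current run of characters with no lexicon match
  let i = 0;

  while (i < text.length) {
    // Try the longest candidate first, then shrink.
    let match = "";
    for (let len = Math.min(maxLen, text.length - i); len > 0; len--) {
      const candidate = text.slice(i, i + len);
      if (lexicon.has(candidate)) {
        match = candidate;
        break;
      }
    }

    if (match) {
      // Flush any pending unmatched run before the matched word.
      if (unmatched) {
        words.push(unmatched);
        unmatched = "";
      }
      words.push(match);
      i += match.length;
    } else {
      // No entry starts here: merge this character into the unmatched run.
      unmatched += text.charAt(i);
      i++;
    }
  }

  if (unmatched) {
    words.push(unmatched);
  }
  return words;
}

// Assumed sample input; the lexicon contents are illustrative.
const withoutShortWords = breakWords("iamhere", new Set(["here"]));
const withShortWords = breakWords("iamhere", new Set(["i", "am", "here"]));
```

Under these assumptions, the lexicon without the short words yields "iam" as a single token, while the lexicon that includes "i" and "am" splits it as expected. That is why removing short words from the wordlist itself is a poor way to keep them out of predictions.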

jahorton commented 1 month ago

I'm not completely clear about what, specifically, this issue is requesting.

What is clear:

Some of our models have excluded all one-letter words, and possibly two- or three-letter words, so that they don't show up early within predictions.

Parts I'm less clear on:

> but the wordlist author should be able to configure the model to not present them as options for completion

So, a setting in the model (perhaps, minCompletionLength) that would cause shorter words to be "skipped over" when making predictions?

> (although presentation for correction is still helpful)

... So that setting would not be used to threshold any sufficiently-likely corrections to what has already been typed?

Suppose that we only allowed words of length four or greater as predictions ("for completion") for the following cases.

Example case 1: Suppose the text "tue" has been typed. Would "the" be considered a valid predictive-text suggestion here? "u" neighbors "h" on the QWERTY layout, so it would be a reasonable correction.

Example case 2: "ti" has been typed, which naturally leads to "time" and similar suggestions. "i" is correctable to "o", though. Do we offer "to" as a correction if it's sufficiently likely?
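
One possible reading of the request, sketched below, is that the threshold applies only to completions and never to corrections. Everything here is hypothetical: `minCompletionLength` is only the name floated above, and the `Suggestion` shape, the `kind` tag, and `filterSuggestions` are invented for illustration rather than taken from the existing model API.

```typescript
// Hypothetical sketch only: none of these names exist in the current model API.
interface Suggestion {
  text: string;
  // "completion": extends what was typed without changing it.
  // "correction": alters at least one already-typed character.
  kind: "completion" | "correction";
  p: number; // assumed probability-like score
}

interface ModelOptions {
  // Completions shorter than this are skipped; corrections are unaffected.
  minCompletionLength?: number;
}

function filterSuggestions(
  suggestions: Suggestion[],
  options: ModelOptions = {}
): Suggestion[] {
  const minLen = options.minCompletionLength ?? 0;
  // Note: .length counts UTF-16 code units, which is only an approximation
  // of "letters" for scripts that use combining marks or astral characters.
  return suggestions.filter(
    (s) => s.kind === "correction" || s.text.length >= minLen
  );
}

// Example case 2 above: "ti" has been typed. Scores are made up.
const candidates: Suggestion[] = [
  { text: "time", kind: "completion", p: 0.5 },
  { text: "to", kind: "correction", p: 0.3 },
  { text: "tin", kind: "completion", p: 0.1 },
];
const shown = filterSuggestions(candidates, { minCompletionLength: 4 });
```

With a threshold of four, this reading keeps "time" and the correction "to" but drops the three-letter completion "tin". If the intent is instead to threshold corrections as well, the `kind === "correction"` exemption would simply be removed.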

mcdurdin commented 1 month ago

I wonder if the primary desire is to avoid listing the really short words on empty context? I am not sure. @dhigby your thoughts?

keymanapp/keyman, issue #2409