keymanapp / keyman

Keyman cross platform input methods system running on Android, iOS, Linux, macOS, Windows and mobile and desktop web
https://keyman.com/

feat(web): better predictive-text punctuation handling #12013

Open jahorton opened 4 months ago

jahorton commented 4 months ago

Is your feature request related to a problem? Please describe.

Related to #11963.

One minor pain point in our existing predictive-text engine is that if a user types a punctuation mark after an applied suggestion, any whitespace appended by the suggestion remains, rather than being replaced by the punctuation mark.

For example, with an English keyboard and our MTNT model, applying a suggestion for "this" and then typing "." will result in "this ." rather than "this.", because the trailing space was inserted as part of the "this" suggestion.

Describe the solution you'd like

As different languages and scripts use different punctuation marks, I believe we should add a new field to lexical models (likely within the punctuation config object) that enumerates the language's punctuation marks. We may eventually want to associate properties with each mark rather than assume they should all be handled the same way, though uniform handling would make a decent starting point toward a solution for the noted issue.
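As a rough illustration of what such a field might look like, here is a minimal sketch. Every name in it (PunctuationMarkSpec, PunctuationConfigSketch, marks, replacesSuggestionSpace, normalizeMarks) is hypothetical and not part of any existing Keyman API. It allows a bare string per mark for the uniform-handling case, while leaving room for per-mark properties later.

```typescript
// Hypothetical sketch only: none of these names exist in the Keyman codebase.
interface PunctuationMarkSpec {
  /** The punctuation mark itself, e.g. "." */
  mark: string;
  /** Whether typing this mark should replace whitespace appended by a suggestion. */
  replacesSuggestionSpace?: boolean;
}

interface PunctuationConfigSketch {
  /** Proposed new field: marks given as bare strings or with per-mark properties. */
  marks?: (string | PunctuationMarkSpec)[];
}

// Expand the shorthand string form so later code can treat every entry uniformly.
// Marks default to replacing the suggestion's whitespace unless they opt out.
function normalizeMarks(config: PunctuationConfigSketch): Required<PunctuationMarkSpec>[] {
  return (config.marks ?? []).map((entry) =>
    typeof entry === "string"
      ? { mark: entry, replacesSuggestionSpace: true }
      : { mark: entry.mark, replacesSuggestionSpace: entry.replacesSuggestionSpace ?? true }
  );
}
```

With this shape, a model could list most marks as plain strings and only spell out the exceptions, for example { mark: "-", replacesSuggestionSpace: false }.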

Default behavior for any specified punctuation mark:

Describe alternatives you've considered

No response

Related issues

#11963

Keyman apps

Keyman version

current (as of 18.0.74-alpha)

mcdurdin commented 4 months ago
jahorton commented 4 months ago

See also #7163, which goes a bit more into the demand for auto-replacing spaces upon receiving a punctuation mark.

One particularly relevant comment: https://github.com/keymanapp/keyman/issues/7163#issuecomment-1963545675

... I just had an idea for it that shouldn't be terribly hard to implement for reversing the space. We could add a spot on returned suggestions for specifying whitespace deletion.

The issue, though: which keyboard characters are punctuation, again? We'd need a way to say which characters trigger the auto-delete and which don't... which makes more sense on the model, rather than the keyboard. That opens up issues we'd definitely need to design for. I'm thinking a whitelist approach would be better than a blacklist approach, but that's only a small part of the overall picture.

jahorton commented 4 months ago

Proposed default set: "." "," "?" "!" ":" ";"

Not including ' or " because it's difficult to tell if it should be leading or trailing, especially given the limited context window we pass through to the worker.

Not including - because, when used as a dash, it is often preceded by a space.

We'll do a first-pass attempt that says "if punctuation mark is listed AND the space beforehand was due to an applied suggestion, delete the space."
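A minimal sketch of that first-pass rule, assuming the engine can track whether the trailing space came from an applied suggestion. The names ContextSketch, trailingSpaceFromSuggestion and applyKeystroke are hypothetical and only illustrate the check; they are not the actual implementation.

```typescript
// Proposed default set from this thread; quotes and "-" are deliberately excluded.
const DEFAULT_MARKS: ReadonlySet<string> = new Set([".", ",", "?", "!", ":", ";"]);

// Hypothetical context shape: the text left of the caret, plus a flag recording
// whether its trailing space was appended by an applied suggestion.
interface ContextSketch {
  text: string;
  trailingSpaceFromSuggestion: boolean;
}

function applyKeystroke(
  ctx: ContextSketch,
  typed: string,
  marks: ReadonlySet<string> = DEFAULT_MARKS
): ContextSketch {
  // Delete the space only if the mark is listed AND the space came from a suggestion.
  const dropSpace =
    marks.has(typed) && ctx.trailingSpaceFromSuggestion && ctx.text.endsWith(" ");
  const base = dropSpace ? ctx.text.slice(0, -1) : ctx.text;
  // Any keystroke clears the flag, so a manually typed space is never removed later.
  return { text: base + typed, trailingSpaceFromSuggestion: false };
}

// Applying the "this" suggestion and then typing "." gives "this." instead of "this .".
const afterSuggestion: ContextSketch = { text: "this ", trailingSpaceFromSuggestion: true };
console.log(applyKeystroke(afterSuggestion, ".").text);
```

A space the user typed by hand leaves the flag false, so "this " followed by "." still yields "this ." in that case, and an unlisted character such as - never triggers the deletion.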