jahorton commented 4 months ago

Is your feature request related to a problem? Please describe.

Related to #11963.

One minor pain point in our existing predictive-text engine is that if a user types a punctuation mark after an applied suggestion, any whitespace appended by the suggestion remains, rather than being replaced by the punctuation mark.

For example, with an English keyboard and our MTNT model, applying a suggestion for "this" and then typing "." results in "this .", because the whitespace was already applied as part of the "this" suggestion.

Describe the solution you'd like

As different languages and scripts use different punctuation marks, I believe we should add a new field of some sort to lexical models (likely within the punctuation config object) that enumerates the language's punctuation marks. We may want to associate properties with each mark rather than assume they are all handled the same way, though treating them uniformly would make a decent starting point toward solving the issue noted above.

For example, a hyphen (or dash?) shouldn't replace whitespace; it is perfectly happy to leave the space in place and follow it. Though... perhaps this could be modeled by just leaving it out of the new field.
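A minimal sketch of what such a field might look like, using the default set proposed later in this thread. The interface `PunctuationConfigSketch`, the field `punctuationMarks`, and the helper `isListedPunctuation` are hypothetical names for illustration, not part of Keyman's actual lexical-model API. The hyphen is deliberately omitted, so it keeps today's behavior simply by not being listed.

```typescript
// Hypothetical shape for the proposed field; this is not Keyman's real API.
interface PunctuationConfigSketch {
  // Marks that replace whitespace appended by an applied suggestion.
  // Marks left out (such as "-") keep the current behavior.
  punctuationMarks?: string[];
}

// Example configuration for English; the hyphen is deliberately omitted.
const englishPunctuation: PunctuationConfigSketch = {
  punctuationMarks: [".", ",", "?", "!", ":", ";"],
};

// True only when the token exactly matches one of the listed marks.
function isListedPunctuation(token: string, config: PunctuationConfigSketch): boolean {
  const marks = config.punctuationMarks ?? [];
  return marks.indexOf(token) !== -1;
}

console.log(isListedPunctuation(".", englishPunctuation)); // true
console.log(isListedPunctuation("-", englishPunctuation)); // false
```

Making the field optional means a model that omits it lists nothing, so existing models would behave exactly as they do now.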

Default behavior for any specified punctuation mark:

- Any token that exactly matches such a punctuation mark will be ignored by predictive text when making new suggestions.
- Any token that exactly matches such a punctuation mark will trigger whitespace replacement if it follows whitespace.
  - Basically, #7163 (as noted below).
- Additional idea: we could also treat it just like whitespace and start producing new suggestions that preserve the punctuation mark...
  - Though if so, we may want an insertBeforeWord entry in the existing punctuation config object to automatically insert whitespace after the punctuation mark but before the new suggestion.
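The two default behaviors, plus the optional insertBeforeWord idea, might be sketched roughly as follows. The function names are hypothetical, and whitespace is simplified to a single ASCII space to match the English example above; a real engine would work on its own context and transform types.

```typescript
// Hypothetical helpers for the proposed default behaviors; names are illustrative only.

const defaultMarks: string[] = [".", ",", "?", "!", ":", ";"];

// Behavior 1: a token that exactly matches a listed mark gets no new suggestions.
function shouldPredictFor(token: string, marks: string[]): boolean {
  return marks.indexOf(token) === -1;
}

// Behavior 2: a listed mark typed directly after whitespace replaces that whitespace.
// This sketch only checks for a single trailing ASCII space.
function applyPunctuation(context: string, typed: string, marks: string[]): string {
  const endsWithSpace = context.charAt(context.length - 1) === " ";
  if (marks.indexOf(typed) !== -1 && endsWithSpace) {
    return context.slice(0, -1) + typed;
  }
  return context + typed;
}

// Additional idea: suggestions offered after a mark are prefixed with an
// insertBeforeWord string, so the space lands between the mark and the new word.
function suggestionAfterPunctuation(word: string, insertBeforeWord: string): string {
  return insertBeforeWord + word;
}

console.log(shouldPredictFor(".", defaultMarks)); // false
console.log(JSON.stringify(applyPunctuation("this ", ".", defaultMarks))); // "this."
console.log(JSON.stringify(applyPunctuation("well ", "-", defaultMarks))); // "well -"
console.log(JSON.stringify(suggestionAfterPunctuation("That", " "))); // " That"
```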

Describe alternatives you've considered

No response

Related issues

11963

Keyman apps

[X] Keyman for Android
[X] Keyman for iPhone and iPad
[ ] Keyman for Linux
[ ] Keyman for macOS
[ ] Keyman for Windows
[ ] Keyman Developer
[X] KeymanWeb
[ ] Other - give details at bottom of form

Keyman version

current (as of 18.0.74-alpha)

mcdurdin commented 4 months ago

7163

jahorton commented 4 months ago

See also #7163, which goes a bit more into the demand for auto-replacing spaces upon receiving a punctuation mark.

One particularly relevant comment: https://github.com/keymanapp/keyman/issues/7163#issuecomment-1963545675

... I just had an idea for reversing the space that shouldn't be terribly hard to implement: we could add a field to returned suggestions that specifies whitespace deletion.

The issue, though: which keyboard characters count as punctuation, again? We'd need a way to specify which characters trigger the auto-delete and which don't, and that makes more sense to define on the model than on the keyboard. That raises issues we'd definitely need to design for. I'm thinking a whitelist approach would be better than a blacklist approach, but that's only a small part of the overall picture.
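One way to picture the idea of letting returned suggestions specify whitespace deletion: each applied suggestion records how much whitespace it appended, and a whitelist decides whether the next keystroke removes it. `EditSketch`, `SuggestionSketch`, and the `appendedWhitespace` field are hypothetical stand-ins, not Keyman's actual suggestion or transform types. With a whitelist, any character nobody listed (for example, a mark from a script the model author didn't consider) leaves the space alone, which seems the safer failure mode.

```typescript
// Hypothetical edit: delete `deleteLeft` characters before the caret, then insert `insert`.
interface EditSketch {
  insert: string;
  deleteLeft: number;
}

// Hypothetical suggestion that also records how much whitespace it appended.
interface SuggestionSketch {
  displayAs: string;
  edit: EditSketch;
  appendedWhitespace: number;
}

// Whitelist check: only explicitly listed characters undo the appended whitespace.
function editForKeystroke(key: string, lastApplied: SuggestionSketch | null, whitelist: string[]): EditSketch {
  if (lastApplied !== null && whitelist.indexOf(key) !== -1) {
    return { insert: key, deleteLeft: lastApplied.appendedWhitespace };
  }
  return { insert: key, deleteLeft: 0 };
}

// Applies an edit to plain text, assuming the caret sits at the end of `text`.
function applyEdit(text: string, edit: EditSketch): string {
  return text.slice(0, Math.max(0, text.length - edit.deleteLeft)) + edit.insert;
}

// Usage: the user types "thi" and applies a suggestion that appends one space.
const whitelist = [".", ",", "?", "!", ":", ";"];
const applied: SuggestionSketch = {
  displayAs: "this",
  edit: { insert: "this ", deleteLeft: 3 },
  appendedWhitespace: 1,
};
const afterSuggestion = applyEdit("thi", applied.edit);
console.log(JSON.stringify(afterSuggestion)); // "this "
console.log(JSON.stringify(applyEdit(afterSuggestion, editForKeystroke(".", applied, whitelist)))); // "this."
console.log(JSON.stringify(applyEdit(afterSuggestion, editForKeystroke("-", applied, whitelist)))); // "this -"
```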

jahorton commented 4 months ago

Proposed default set: ".", ",", "?", "!", ":", ";"

Not including the apostrophe (') or the double quote (") because it's difficult to tell whether each should be leading or trailing, especially given the limited context window we pass through to the worker.

Not including the hyphen (-) because, when used as a dash, it is often preceded by a space.

We'll do a first-pass attempt that says: "if the punctuation mark is listed AND the space before it came from an applied suggestion, delete the space."
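This first-pass rule narrows the general whitespace-replacement behavior proposed in the issue: it needs one extra bit of state recording whether the space before the caret came from an applied suggestion rather than from the user. The sketch below assumes a hypothetical `PunctuationTracker` class over plain text with ASCII spaces; a real engine would presumably also need to clear that flag on caret movement or deletion, which this sketch ignores.

```typescript
// Proposed default set from this discussion; quotes and "-" are deliberately excluded.
const PROPOSED_DEFAULT_MARKS: string[] = [".", ",", "?", "!", ":", ";"];

// Hypothetical tracker for the first-pass rule: delete the space only if the typed
// mark is listed AND that space was appended by an applied suggestion.
class PunctuationTracker {
  private text = "";
  private spaceFromSuggestion = false;

  constructor(private readonly marks: string[]) {}

  // Replaces the partial word before the caret with `word` plus one trailing space.
  applySuggestion(partialLength: number, word: string): void {
    const keep = Math.max(0, this.text.length - partialLength);
    this.text = this.text.slice(0, keep) + word + " ";
    this.spaceFromSuggestion = true;
  }

  // Handles one typed character.
  type(key: string): void {
    const listed = this.marks.indexOf(key) !== -1;
    const endsWithSpace = this.text.charAt(this.text.length - 1) === " ";
    if (listed && this.spaceFromSuggestion && endsWithSpace) {
      this.text = this.text.slice(0, -1);
    }
    this.text += key;
    // After any keystroke, the suggestion's space is no longer directly before the caret.
    this.spaceFromSuggestion = false;
  }

  getText(): string {
    return this.text;
  }
}

// Space appended by a suggestion: the period replaces it.
const viaSuggestion = new PunctuationTracker(PROPOSED_DEFAULT_MARKS);
"thi".split("").forEach((ch) => viaSuggestion.type(ch));
viaSuggestion.applySuggestion(3, "this");
viaSuggestion.type(".");
console.log(JSON.stringify(viaSuggestion.getText())); // "this."

// Space typed by the user: the period follows it unchanged.
const typedByUser = new PunctuationTracker(PROPOSED_DEFAULT_MARKS);
"this ".split("").forEach((ch) => typedByUser.type(ch));
typedByUser.type(".");
console.log(JSON.stringify(typedByUser.getText())); // "this ."
```

Keeping the origin of the space as explicit state, rather than inspecting the text alone, is what lets a user-typed space survive while a suggestion-appended one is replaced.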

keymanapp / keyman

feat(web): better predictive-text punctuation handling #12013