Lexical models: respect for originally-typed capitalization

DavidLRowe commented 4 years ago

Is your feature request related to a problem? Please describe.

When I'm trying to enter a word beginning with an uppercase letter ("Sunahan") and have typed only the first letter ("S" in the screenshot), the lowercase version ("sunahan") is suggested. Clicking on the suggested word enters it with an initial lowercase letter instead of the uppercase letter I typed.

Describe the solution you'd like

Since the lowercase word has been proposed, the model has already made the correspondence between the two forms, one beginning with an uppercase letter and the other with a lowercase letter. It seems reasonable that the model could use the beginning letter(s) that were actually typed.
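To make the request concrete, here is a minimal sketch of "use the letters that were actually typed". The function name `keepTypedPrefix` is hypothetical and is not part of any Keyman API; it also assumes that lowercasing does not change the length of the strings involved and that both strings use the same Unicode normalization form.

```typescript
// Hypothetical helper for illustration only; not an existing Keyman function.
// If the typed text matches the start of the suggestion case-insensitively,
// keep the typed characters and append the remainder of the suggestion.
function keepTypedPrefix(typed: string, suggestion: string): string {
  const head = suggestion.substring(0, typed.length);
  // Assumption: toLowerCase() does not change string length for these inputs.
  if (typed.length > 0 && head.toLowerCase() === typed.toLowerCase()) {
    return typed + suggestion.substring(typed.length);
  }
  // No case-insensitive match (or nothing typed): leave the suggestion alone.
  return suggestion;
}

console.log(keepTypedPrefix("S", "sunahan")); // "Sunahan"
console.log(keepTypedPrefix("s", "sunahan")); // "sunahan"
```

A literal overlay like this never has to know the language's casing rules, because it only reuses characters the user produced. It would not help with suggestions that correct the typed prefix, since then the typed letters no longer match.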

Describe alternatives you've considered

Currently one needs to type the entire word to get the uppercase version.

Product context

Keyman Developer 13, predictive text model testing in web browser

Additional context

The next word to be typed is "Sunahan", which is the first suggested word, but it appears with a lowercase "s" at the beginning.

If I click on the "sunahan", I get "sunahan" -- that is, the capital "S" is replaced by lowercase "s".

This is probably no surprise, but it did surprise me and made me wonder why I had to type the whole word just to keep the uppercase "S".

Other considerations: some languages (for example, Ife (Togo)) have prefixes that remain lowercase even when attached to a word beginning with an uppercase letter. (For example, ńǸwé-Ɔ̀ɖáyé in Luke 4:4, https://www.bible.com/bible/1759/LUK.4.IFE.)

mhosken commented 4 years ago

:+1:

mcdurdin commented 4 years ago

Brain dump from predictive text planning meeting.

Capitalisation of suggestions
- Start of sentence, suggesting capitalized word
- Capitalisation controls in model
  - Does the language have capitalization?
- Capitalisation rules
- dotted i vs dotless i (Turkic)
- German 'SS'
- Defining the capital pairs
- Respect originally typed capitalization (#2638)
- Punctuation around 'start of sentence'

Double-tap space -> ". "
- Defined in the model?
- Queried by web-core, etc for actual implementation
- Concern: Timing based
- Control of option by user
- Specify the output options -> e.g. ". " vs Khmer full stop.

Caps Lock
- Double-tap shift
- Defined in the model? Or is it about the keyboard layers.
- Queried by web-core, etc for actual implementation
- Concern: Timing based
- Control of option by user

Capital first letter at start of text / new sentence
- After first letter, revert to default layer automatically.

Keyboard provides "capitalization layer hints" to model.
- If present, then model is responsible for driving current layer at certain times.
- May need paired mappings
- Example case: sil_cameroon_qwerty’s secondary layer set (multi-colored key)
- Alternative way of specifying initial layer
```
initial > use(mygroup)

group(mygroup)

nul > layer(shift)
". " > layer(shift)
nomatch > layer(default)
any(digit) > layer(numeric)
intent(digit) > layer(numeric)
```
- Overlaps with intents
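The "Capitalisation rules" items above (dotted vs dotless i, German 'SS', whether the language has capitalization at all) can be illustrated with a small sketch. The `CasingOptions` interface and its field names are assumptions made up for this example, not an existing model setting; the string methods are the standard JavaScript ones, and the Turkish results assume a runtime with locale data for `tr`.

```typescript
// Hypothetical per-model casing settings; the field names are illustrative only.
interface CasingOptions {
  usesCasing: boolean; // "Does the language have capitalization?"
  locale?: string;     // BCP 47 tag for locale-sensitive mappings, e.g. "tr"
}

function toUpper(text: string, opts: CasingOptions): string {
  if (!opts.usesCasing) return text; // caseless script: nothing to do
  return opts.locale ? text.toLocaleUpperCase(opts.locale) : text.toUpperCase();
}

function toLower(text: string, opts: CasingOptions): string {
  if (!opts.usesCasing) return text;
  return opts.locale ? text.toLocaleLowerCase(opts.locale) : text.toLowerCase();
}

const generic: CasingOptions = { usesCasing: true };
const turkish: CasingOptions = { usesCasing: true, locale: "tr" };

console.log(toUpper("i", generic));      // "I"
console.log(toUpper("i", turkish));      // "İ" (dotted capital I)
console.log(toLower("I", turkish));      // "ı" (dotless small i)
console.log(toUpper("straße", generic)); // "STRASSE"
```

The last line shows why "defining the capital pairs" is on the list: uppercasing "ß" yields two letters, and lowercasing "STRASSE" gives "strasse" rather than the original word, so default mappings do not always round-trip.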

jeffheath-sil commented 4 years ago

I realize that there are some special case languages, as mentioned above and in the discussion here: https://community.software.sil.org/t/predictive-model-turns-majuscules-into-minuscules/3443/10?u=jheath.

But it seems to me like you should implement a basic capitalization functionality that works for 99+% of languages (as I described in the discussion thread). It would be fairly easy to implement a simplistic algorithm based on the capitalization information in the UCD. It would be a big win for a lot of languages, and it really wouldn't be any worse for the special case languages, while they wait for a more complex model to be developed to solve their problems. IMHO...
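As a rough sketch of what such a simplistic algorithm might look like: classify what the user typed as lowercase, initial-capital, or all-caps, then apply that form to each suggestion. All names here (`CasingForm`, `detectCasing`, `applyCasingForm`) are hypothetical. The sketch relies on JavaScript's built-in `toUpperCase`/`toLowerCase`, which use the default Unicode case mappings, and it ignores titlecase letters and language-specific rules.

```typescript
type CasingForm = "lower" | "initial" | "upper";

// Returns the first character, keeping a UTF-16 surrogate pair together.
function firstCodePoint(text: string): string {
  const c = text.charCodeAt(0);
  const isHighSurrogate = c >= 0xd800 && c <= 0xdbff;
  return text.substring(0, isHighSurrogate ? 2 : 1);
}

// Classifies the typed text. A single capital letter counts as "initial".
function detectCasing(text: string): CasingForm {
  const hasCase = (s: string) => s.toLowerCase() !== s.toUpperCase();
  const first = firstCodePoint(text);
  const rest = text.substring(first.length);
  if (!hasCase(first) || first !== first.toUpperCase()) return "lower";
  if (rest.length > 0 && hasCase(rest) && rest === rest.toUpperCase()) return "upper";
  return "initial";
}

// Applies the detected form to a suggestion. "lower" leaves the suggestion
// untouched so that capitalized entries in the word list are not lowercased.
function applyCasingForm(form: CasingForm, suggestion: string): string {
  if (form === "upper") return suggestion.toUpperCase();
  if (form === "initial") {
    const first = firstCodePoint(suggestion);
    return first.toUpperCase() + suggestion.substring(first.length);
  }
  return suggestion;
}

console.log(applyCasingForm(detectCasing("S"), "sunahan"));  // "Sunahan"
console.log(applyCasingForm(detectCasing("SU"), "sunahan")); // "SUNAHAN"
```

For the special-case languages mentioned earlier, this behaves no worse than leaving suggestions as-is only when the detected form is "lower"; the other two forms would need an override.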

mcdurdin commented 4 years ago

It will probably be a staged implementation, but I want to make sure we don't build ourselves into a corner by bypassing the design phase. This has happened over and over again in internationalization in software and it then takes years and much pain to correct (if it ever is). We are planning to do the design in a few weeks and hopefully the simple cases will fall out relatively easily.

jeffheath-sil commented 4 years ago

Understood. We do appreciate all of your good work, and the care with which you are making these decisions!

mcdurdin commented 4 years ago

And I appreciate your support :smile:!

eddieantonio commented 4 years ago

Quickly jotting down @jahorton's thoughts before we forget:

The LMLayer should be able to send the keyboard a hint that it should switch into the shift layer, e.g., after seeing sentence-ending punctuation + space.

However, the keyboard layer can provide a simple approach that works for many languages.

My thoughts: why not both? If a model exists for the language, why not let the model request that the keyboard go into the shift layer? The keyboard layer can also handle this, and, as far as I understand, some keyboards' rules already do this!
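A minimal sketch of the kind of hint discussed here, assuming the model can see the text before the caret. The names `LayerHint` and `layerHintFor` and the punctuation set are hypothetical; no real LMLayer message format is implied.

```typescript
type LayerHint = "shift" | "default";

// Assumption: a small, language-specific set of sentence-ending marks.
const SENTENCE_END: string[] = [".", "!", "?"];

// Suggests "shift" at the start of text and after sentence-ending
// punctuation followed by at least one space; otherwise "default".
function layerHintFor(contextBeforeCaret: string): LayerHint {
  if (contextBeforeCaret.length === 0) return "shift"; // start of text
  if (contextBeforeCaret.charAt(contextBeforeCaret.length - 1) !== " ") {
    return "default"; // mid-word or directly after punctuation
  }
  // Skip trailing spaces, then inspect the last non-space character.
  let end = contextBeforeCaret.length;
  while (end > 0 && contextBeforeCaret.charAt(end - 1) === " ") end--;
  if (end === 0) return "shift"; // only spaces so far
  const last = contextBeforeCaret.charAt(end - 1);
  return SENTENCE_END.indexOf(last) >= 0 ? "shift" : "default";
}

console.log(layerHintFor("Hello. "));  // "shift"
console.log(layerHintFor("Hello. W")); // "default"
```

Because the hint depends only on the current context, it falls back to the default layer as soon as the first letter of the new sentence has been typed. A keyboard with its own rules for this could ignore the hint.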

mcdurdin commented 4 years ago

#3620: Specification for the Caps Lock layer (double-tap Shift). (15.0)
#3621: Specification for automatic selection of the Shift layer at start of sentence. (15.0)
#3720: Specification for model respecting capitalization. (14.0)

mcdurdin commented 3 years ago

Closing this issue; follow progress against specifications in previous comment.

keymanapp / keyman

Lexical models: respect for originally-typed capitalization #2638
