Closed jahorton closed 3 years ago
Android will need a few tweaks to support right-deletions in order to better support mid-token suggestion acceptance - but only in-app. Oddly, it's already supported for the system keyboard... just not the in-app one. Huh.
@darcywong00 Might need your help on this one. I can identify the system-keyboard block that handles this pretty easily:
There simply is no equivalent for the in-app, and the surrounding code is remarkably different between the two cases. All the in-app one does:
I'd prefer not to make things worse there than they already are, hence the call for help here.
I hope you've got a time machine cause you wrote both right-deletion blocks in #1732 😄
And I can see why I didn't - take a look at how different the rest of the `insertText` code is for in-app vs system! Were it up to me, I'd refactor one of the two to use the other's approach if possible.
Right now, iOS seems covered, even for SMP cases... except for reverting suggestions that were accepted mid-word. (The reversions aren't showing up at the moment; it's related to async [sadly] context-reset ops within the iOS Keyman engine.)
Android in-app also still needs work.
And that fixes the Android in-app issue. Now it's a matter of iOS's context-handling and how it breaks reversions after accepting a mid-context suggestion. Which could be spun off into a separate issue for later.
Admittedly, there's quite a bit of stuff (especially, SMP stuff) that's gotten thrown into this PR, so maybe it'd be wise to stop this one here for that reason as well.
I don't know how some of those SMP issues only showed up when I was testing the Android implementation, but at least I found the issues & fixed 'em. (The ol' classic "How did that ever work?" scenario)
@MakaraSok
For testing conditions with scripts that require SMP handling, I used resources for the Shavian script for English:
- Use keyboard search, entering "shav" as search text.
- Install the `english_shavian_igc` keyboard.
- Go to darcywong00.github.io/examples and download the Shavian lexical model; install it.

(I tested with Shavian b/c it was used during a lot of prior lexical-model SMP work / PRs.)

Alternatively, BCP-47 code `new-newa` also uses an SMP-based script, which has an associated lexical model that should automatically download.
- Keyboard name: "Newa Romanized"
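As a rough illustration of why SMP scripts deserve their own test pass: each Shavian letter lies outside the Basic Multilingual Plane, so it is one code point but two UTF-16 code units. A deletion count computed in one unit and applied in the other removes the wrong amount of text. The sketch below is Python, not Keyman code, and the helper name `utf16_units` is made up for the example.

```python
def utf16_units(text: str) -> int:
    """Count UTF-16 code units, the unit used by JavaScript and Java strings."""
    return len(text.encode("utf-16-le")) // 2


# Two Shavian letters (U+10450 and U+10451), both in the SMP.
shavian = "\U00010450\U00010451"
latin = "ab"

code_points = len(shavian)         # Python's len() counts code points
code_units = utf16_units(shavian)  # each SMP letter needs a surrogate pair
```

Here `code_points` and `code_units` disagree for the Shavian string but agree for `latin`, which is why a Latin-only baseline can pass while an SMP keyboard fails.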
[x] Test against at least one SMP keyboard/dictionary combo.
Tests to run for EACH set of keyboard/lexical-model:
Outside of the details above, there should be no unexpected effects.
Things that would be unexpected and worthy of note
Similar tests would be ideal on Keyman for iPhone/iPad, but Makara currently lacks an appropriate test device for that. At the immediate moment, the reversions requested above will not function there, but the rest should be reasonably testable.
Tested on Android 10 (Pixel 2, API 29 emulator) using the Android artifact from this PR.
> - Tests with English (sil_euro_latin) and the default MTNT model should be used as a baseline.

The suggestions differ depending on which part of the word the cursor is placed in. For instance, "international" was mistyped as "internatonal".
* when the cursor is placed in the first part (between "n" and "t") ![image](https://user-images.githubusercontent.com/28331388/107476198-f2136580-6ba7-11eb-8977-70974d9e0f9b.png)
* when the cursor is placed in the middle part (between "a" and "t") ![image](https://user-images.githubusercontent.com/28331388/107476087-c42e2100-6ba7-11eb-83a7-2cfd0c2a3cbc.png)
* when the cursor is placed in the last part (between "n" and "a") - no suggestion at all here. ![image](https://user-images.githubusercontent.com/28331388/107476261-0fe0ca80-6ba8-11eb-9fdd-37b6add16089.png)

> For testing conditions with scripts that require SMP handling, I used resources for the Shavian script for English:
> - Use keyboard search, entering "shav" as search text.
> - Install the `english_shavian_igc` keyboard.
> - Go to darcywong00.github.io/examples and download the Shavian lexical model; install it.
>
> (I tested with Shavian b/c it was used during a lot of prior lexical-model SMP work / PRs.)

Even though I don't know this language, I can see that the suggestions vary with the placement of the cursor (see the screenshots below).
* the cursor is placed between the first two characters: ![image](https://user-images.githubusercontent.com/28331388/107478156-977c0880-6bab-11eb-88e1-180fbe39fa6d.png)
* the cursor is placed between the last two characters: ![image](https://user-images.githubusercontent.com/28331388/107478252-bda1a880-6bab-11eb-9aa0-2102e61674f5.png)
Right... guess I forgot to mention that. That's the sort of behavior I'd expect. Predictive text doesn't currently consider anything on the right when making predictions, but will consider what's on the left.
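To make the left-only behavior concrete, here is a minimal Python sketch of how the text fed to prediction changes with caret position for the "internatonal" example. The function name `prediction_prefix` and the space-only word splitting are assumptions for illustration; the real engine relies on the model's wordbreaker.

```python
def prediction_prefix(context: str, caret: int) -> str:
    """Return the partial word immediately left of the caret.

    Text at or after `caret` is ignored, mirroring the left-only behavior
    described above. Words are assumed to be separated by spaces.
    """
    left = context[:caret]
    # Whatever follows the last space is the word fragment being predicted on.
    return left.rsplit(" ", 1)[-1]


word = "internatonal"
first = prediction_prefix(word, 2)    # caret between "n" and "t"
middle = prediction_prefix(word, 7)   # caret between "a" and "t"
last = prediction_prefix(word, 10)    # caret between "n" and "a"
```

The three prefixes are "in", "interna", and "internaton". Each is a different lookup key, so differing suggestions (or none at all for the last one) are consistent with the description.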
Can I suggest some similar tests on Khmer, where clusters may catch us out?
If you've got a wordlist and a wordbreaker for Khmer, sure! Otherwise... I mean, you can suggest it, but I just won't do it b/c of the ridiculous time investment required at this stage.
@MakaraSok could put together a wordlist of 20 words or so. We can use spaces between words for the test. Just some words that have clusters e.g. បង្រៀន, etc.
... thinking about it, I might could jerry-rig a local build to turn BKSP into a FWDSP temporarily and see what happens that way, without the need for a toy testing model.
The biggest concern is that the `adjustTextPosition(byCharacterOffset:)` function is ambiguous (at best) about how it'll handle caret manipulations with grapheme clusters. If there's a point of failure, it's probably there.
> To control the insertion point position, such as to support text deletion in a forward direction, call the `adjustTextPositionByCharacterOffset:` method of the `UITextDocumentProxy` protocol. For example, to delete forward by one character, use code similar to [...]
While I definitely can't seem to manually position the caret between the graphemes of a cluster via user interaction, the phrasing seems to imply that this won't be an issue for code. It's only an implication, though, and is definitely worth testing.
accepting suggestions at the end of a word in the middle of the context
- For an English example, enter "the quick brown fox jumped over the lazy dog", then position the caret immediately after the third word, like "brown| ". (Before the space.)
- Multiple suggestions should show if the word is relatively common as a prefix.
- For this example, "brownies" was a suggestion that usually appeared.
reverting such a suggestion
- Immediately after accepting the suggestion, tap the backspace key once. The original text should appear as an option.
- When selected, the original text should be restored.
- The original suggestions (including the accepted one) should then be displayed as suggestions.
accepting suggestions in the middle of a word/token
- For an English example, enter "the quick brown fox jumped over the lazy dog", then position the caret within the third word, like "bro|wn"
- Multiple suggestions should show if the left-hand side is relatively common.
- For this example, "broken" was a suggestion that usually appeared. Accept it.
- For this example, the result should be "the quick broken fox jumped over the lazy dog", with the caret at the end of the word "broken".
reverting suggestions that were accepted mid-word/token (Android only, for now)
- Due to architectural limitations, the caret should end up at the end of the reverted word, not at its original location.
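The accept/revert steps above can be sketched as plain string edits. This is a minimal Python model, not the engine's implementation: the `Transform` class and `apply_transform` function are hypothetical, with field names chosen to echo the `deleteLeft` / `deleteRight` values discussed in this thread, and all counts are in code points.

```python
from dataclasses import dataclass


@dataclass
class Transform:
    """Hypothetical shape of an edit; counts are in code points."""
    insert: str
    delete_left: int = 0
    delete_right: int = 0


def apply_transform(context: str, caret: int, t: Transform):
    """Apply `t` at `caret`; return the new text and the new caret position."""
    left = context[:caret]
    right = context[caret:]
    if t.delete_left:
        left = left[:-t.delete_left]
    right = right[t.delete_right:]
    new_left = left + t.insert
    return new_left + right, len(new_left)


text = "the quick brown fox jumped over the lazy dog"
caret = text.index("brown") + 3   # "bro|wn"

# Accepting "broken": remove "bro" on the left and "wn" on the right.
accepted, accepted_caret = apply_transform(text, caret, Transform("broken", 3, 2))

# Reverting: remove "broken" and restore "brown". The caret lands after the
# restored word, not at its original mid-word position.
reverted, reverted_caret = apply_transform(
    accepted, accepted_caret, Transform("brown", 6, 0)
)
```

In this model the reversion only knows the text it is replacing, so the caret can only come back at the end of "brown". That matches the limitation noted in the last test item.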
Tested with EuroLatin (SIL) keyboard and everything works as elaborated.
> Right... guess I forgot to mention that. That's the sort of behavior I'd expect. Predictive text doesn't currently consider anything on the right when making predictions, but will consider what's on the left.
Intuitively, I kind of expected the model to look at the whole word, since the word "internatonal" does have space delimiters around it.
> @MakaraSok could put together a wordlist of 20 words or so. We can use spaces between words for the test. Just some words that have clusters e.g. បង្រៀន, etc.
@mcdurdin @jahorton Should I not play with this toy?
I would like you to put it to the test, so please do play with this toy!
Tested using makara.km.test.model-new.zip with the Khmer Angkor keyboard. The expected clusters are not shown in the banner. The screenshots below show various positions where the cursor was placed, along with the resulting suggestions. The second screenshot does not show the words beginning with បង្ខ, but something else.
ML source: makara.km.test.zip
Looks like we need a custom wordbreaker even here, even if far simpler than a true Khmer wordbreaker. It's detecting wordbreaks between every character, which is obviously "less than ideal." Thankfully, that .zip includes source I can tweak...
Reasonably-tweaked test resource: makara.km.test.zip
Working reasonably for standard use:
Both are very reasonable suggestions.
All three are obviously reasonable.
Now for the real question: how well does this PR's changeset function here?
Before
I should probably state that getting the caret placed there was a real fight, in and of itself.
I accepted the middle suggestion, resulting in: After
So, apparently, I'd managed to wedge the caret between the ង and ្ប. It's like it didn't do any delete left OR delete right; it just dropped the suggestion in-place. Not sure where the new final word's characters came from, though...
... and a little investigation reveals:
So yeah, the returned suggestion isn't specifying the expected `deleteLeft` and `deleteRight` in this scenario. Huh.
But `deleteLeft` works fine at a word's end - it's just a mid-word issue. In the state below, I can freely flip the first word back and forth between its current form and the two displayed suggestions with no issue - everything works flawlessly and seamlessly there.
It's just the mid-word case that appears to be an issue here.
Of course that's why. I'd need to improve on that makeshift wordbreaker; it's a rough approximation, and it broke for my example text in a way that reported that the suggestion wasn't applied mid-word. (Literally: find first index of the token within the context. But that's not the first spot where ប appears.) So of course it didn't know to delete-right.
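A small Python sketch of that failure mode, using an English stand-in rather than the actual Khmer test text. All function names here are hypothetical, and the makeshift wordbreaker's real code is not shown in this thread. The point is only that a first-occurrence lookup can land on an earlier copy of the same text, which makes the caret look like it sits outside the token.

```python
def naive_token_start(context: str, token: str) -> int:
    """First occurrence of the token text anywhere in the context."""
    return context.find(token)


def caret_token_start(context: str, caret: int) -> int:
    """Start of the space-delimited token that contains the caret."""
    # rfind returns -1 when no space precedes the caret, so +1 gives index 0.
    return context.rfind(" ", 0, caret) + 1


def delete_right_count(token_start: int, token: str, caret: int) -> int:
    """Characters of the token that lie right of the caret (never negative)."""
    return max(0, token_start + len(token) - caret)


context = "brownies brown"
token = "brown"
caret = 12   # "bro|wn" in the second word

wrong_start = naive_token_start(context, token)   # matches inside "brownies"
right_start = caret_token_start(context, caret)   # start of the second word

wrong_count = delete_right_count(wrong_start, token, caret)
right_count = delete_right_count(right_start, token, caret)
```

With the naive lookup the token appears to end before the caret, so `wrong_count` is 0 and no right-deletion is requested. Anchoring on the caret gives `right_count` of 2, the "wn" that should be removed.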
For this test, I'd make a wordbreaker that splits only on space. Forget even punctuation.
Insufficient. Alas, we assume that the wordbreaker returns spans, not strings.
If `start` and/or `end` are missing, nothing shows up.
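For reference, a space-only wordbreaker that returns spans rather than bare strings might look like the Python sketch below. The `Span` class is a stand-in whose `start` / `end` field names follow the ones mentioned above. Treating `end` as exclusive is an assumption, and the actual test model's wordbreaker source is in the linked .zip, not reproduced here.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Span:
    """A token plus its position in the original text."""
    text: str
    start: int
    end: int   # exclusive (assumed convention)


def space_only_wordbreaker(text: str) -> List[Span]:
    """Split on ASCII spaces only; punctuation stays attached to words."""
    return [
        Span(m.group(), m.start(), m.end())
        for m in re.finditer(r"[^ ]+", text)
    ]


spans = space_only_wordbreaker("the quick  brown")
```

Because each span carries its own offsets, a caller never needs to search the context for the token text again, which sidesteps the first-index problem described earlier.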
Anyway, got that patched up with yet another, seriously-the-final-version-for-real-this-time version of Makara's mock-up:
makara.km.test.zip
Well, until you want me to make it more robust with the 'start' and 'end' bits. 😟
Proof that it did the trick well enough:
The returned suggestion has proper `deleteLeft` and `deleteRight` values. (That's the middle suggestion, the one I'm applying.)
And, the grand results?
😆 Well, not quite what we wanted, but at least it makes a lot of sense. The rest of the way will take work within iOS, but it should at least be sufficient for testing on Android. (Make sure to do both in-app and system-level, as the delete-right implementations differ between them!)
> but at least it makes a lot of sense
I guess you mean that delete-right isn't working? Can you elucidate?
Changes in this pull request will be available for download in Keyman version 14.0.242-beta
Well... thanks, Apple:
We were definitely right to be concerned about how clusters would be handled. For those who can't read Khmer script, that's a four-character jump. (Three of them visible.)
So, my assumption later in the loop for repositioning the caret is incorrect, causing issues on later loop iterations.
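One way to avoid baking in a "one request moves one character" assumption is to measure how far the caret actually moved after each request. The Python sketch below uses a toy buffer whose caret can only rest on cluster boundaries. It is a hypothetical model for illustration, not the iOS API and not the fix that was attempted here.

```python
class ClusterBuffer:
    """Toy text buffer whose caret can only rest on cluster boundaries.

    `clusters` is a list of strings, each treated as indivisible for the
    purposes of caret movement.
    """

    def __init__(self, clusters, caret_index):
        self.clusters = list(clusters)
        self.index = caret_index

    def before_caret(self) -> str:
        return "".join(self.clusters[:self.index])

    def move_forward_one(self) -> None:
        """Request a one-'character' move; the caret lands on the next boundary."""
        if self.index < len(self.clusters):
            self.index += 1


def measured_jump(buffer: ClusterBuffer) -> int:
    """Move forward once and report how many code points were actually crossed."""
    before = len(buffer.before_caret())
    buffer.move_forward_one()
    return len(buffer.before_caret()) - before


# Code points of បង្រៀន (the example word from this thread), split at
# cluster boundaries: one consonant, a four-code-point cluster, one consonant.
word_clusters = ["\u1794", "\u1784\u17d2\u179a\u17c0", "\u1793"]
buffer = ClusterBuffer(word_clusters, 1)   # caret after the first consonant
jump = measured_jump(buffer)
```

Here a single move request crosses four code points, so any later loop iteration that assumed a one-character move would be working from the wrong caret position.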
Okay, I've got a fair bit of the core worked out there, though there's something really weird going on now. There seems to be a desync between the text-manipulation method and what actually gets output - of course, only when right-deletions are happening.
So, let's take this as our starting point:
I've confirmed via temporary debug-log statements that this, according to the `textDocumentProxy` object used for text manipulation, has an expected final context of ម្រាយ បន្ស៊ី . Exactly what a user would expect. So, of course, what do we get?
Possibility 1
Uh... that's not what `textDocumentProxy` told us we'd get. The heck?
Possibility 2
(Note: these screenshots were taken from a clean context, rather than with the English text present at the start.)
Uh... what, mate? Didn't even do anything?
Turns out, it actually did. If you hit BKSP, it'll remove the hidden 'subconsonant' marker. Alternatively, if you reselect the same suggestion again...
And again...
So... the true result incrementally inches closer to the desired suggestion. Wha?
Again, note that in both cases, the actual text-manipulation handling itself computes the correct text immediately, and even the `textDocumentProxy` confirms this. Something is interfering with this process. The question is... is there some yet-undiscovered bug in our code that has only appeared now, and only for right-deletions at that... or is it an Apple-side bug?
There's also the fact that the result isn't even 100% predictable, as noted by the two variations seen above!
Since the iOS engine is having trouble with right-deletions and a resolution is proving tricky, I've gone ahead and turned off predictive text's right-deletions for now. Instead, any suggestions accepted mid-token will insert a standard word-break afterward. (Note: this is a perfect match for the behavior of iOS's default predictive text; it doesn't right-delete.)
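A minimal Python sketch of that fallback behavior, assuming a plain space as the "standard word-break". The function name and signature are hypothetical and this is not the engine's code.

```python
def accept_without_right_deletion(context: str, caret: int, suggestion: str,
                                  delete_left: int, wordbreak: str = " "):
    """Replace the text left of the caret, then insert a word break.

    Nothing to the right of the caret is deleted.
    """
    left = context[:caret - delete_left] + suggestion + wordbreak
    return left + context[caret:], len(left)


text = "the quick brown fox"
caret = text.index("brown") + 3   # "bro|wn"
result, new_caret = accept_without_right_deletion(text, caret, "broken", 3)
```

In this example the leftover "wn" stays in the text as its own fragment after the inserted space, which is the trade-off of skipping the right-deletion.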
I can simply add the right-deletion aspect as a 'feature request' for the future, allowing us to revisit it at another time.
I have tried a few other approaches, and one seemed to get remarkably close much of the time... the issue being that it also gave way worse results some of the other times. So... yeah, not changing it over until it's stable.
> I can simply add the right-deletion aspect as a 'feature request' for the future, allowing us to revisit it at another time.
Sounds good to me. Have you opened an issue for this yet?
It's now up as #4538.
Code related to the new issue (for the deferred right-deletion functionality) has been split off into #4541.
Note that the most recent commit here (which reverted them for this PR) was hand-written, with #4541's first commit a reversion of that.
Changes in this pull request will be available for download in Keyman version 14.0.248-beta
Retest on Android 10 (on both emulator and physical device) based on https://github.com/keymanapp/keyman/pull/4427#issuecomment-776460602:
"accepting suggestions in the middle of a word/token" does not delete the latter half of the word. Even though this is intended, it is not very helpful, because the leftover half is unintelligible and has to be deleted manually anyway.
I like the ability to switch back and forth to the suggested word automatically when tapped on. For Khmer language, it seems like there is no space after the word after a suggestion is chosen in a new line:
Khmer Angkor - in the first line, a space is seen after a suggestion is chosen; in the second line, it takes three taps on the spacebar to get a regular space. https://www.youtube.com/watch?v=VcaL1R7X-L8
EuroLatin (SIL) - no space after the chosen suggestion, it takes two taps on the spacebar to output a regular space https://youtu.be/lC7ZLZ94ALw
The globe key on the emulator does not respond as expected, making it impossible to switch between keyboards except from within the Keyman app: https://youtu.be/1p3qXUun3aA
For any more specific test, ping me again. :)
Changes in this pull request will be available for download in Keyman version 15.0.19-alpha
While working on #4411, I noticed that our predictive text has, to this point, often assumed that it will always be operating at the end of the current context. This PR seeks to round out that rough edge and provide support for mid-context scenarios:
Support won't be completely perfect, but it's a definite upgrade from how things were before. The main issue: the caret will always be placed at the end of text affected by a reversion, even if it was originally before some of the reverted characters. (Because a suggestion triggered right-deletions.)
Also note that no post-caret text will actually be used by the predictive text engine, same as before.