Zawgyi keyboard ya-yit error

GoogleCodeExporter commented 9 years ago

Reported by Lionslayer:
I found a minor mistake.
Zawgyi-one keyboard and Ra-yit got a sequence problem whether you type ra-yit in 
front or behind.
ဖစြ်ပသညေ်။ 
ဖစြ်ပသညေ်။ဖစြ်ပသညေ်။အပစြ် 
အဖစြ်
\u1016\u1005\u103C\u103A\u1015\u101E\u100A\u1031\u103A\u104B 
\u1016\u1005\u103C\u103A\u1015\u101E\u100A\u1031\u103A\u104B\u1016\u1005\u103C\u103A\u1015\u101E\u100A\u1031\u103A\u104B\u1021\u1015\u1005\u103C\u103A 
\u1021\u1016\u1005\u103C\u103A
-----

I will have to debug this; hope it's not another bug in KeyMagic4WaitZar...

Original issue reported on code.google.com by seth.h...@gmail.com on 28 Sep 2010 at 10:01

Blocking: #89

GoogleCodeExporter commented 9 years ago

Hmm... it's a bit of a problem.

I could always use the Shan fix for this; it works just dandy for re-ordering.

First, though, I want to check if the Zawgyi keyboard has any built-in code for 
fixing this.

Original comment by seth.h...@gmail.com on 10 Nov 2010 at 10:52

Changed state: InProgress

GoogleCodeExporter commented 9 years ago

Hmm.. the problem runs much deeper than just the Rules file. 
Trace:
User typed:  u
   $row2K[*] => $row2U[$1]
      ==>u1000
User typed:  u1000j
   < VK_KEY_J > => $ZWS + $yayit
      ==>u1000u200bu103c

So, U+1000 U+200B U+103C is correct. However, then the following happens:
  1) Filter (U+200B is removed)
  2) Convert to Zawgyi (U+103C is re-ordered, since it seems like valid Unicode).
  3) Display, and confuse the user.

Although this should be fixed, outputting in Unicode is even worse:
  1) Filter (U+200B is removed)
  2) Output (U+103C is placed visually before U+1000, since it's valid Unicode).

I think the second problem can't be fixed (for now), due to the weird display 
glitches that U+200B causes. 

However, if we can fix the first problem, then people will expect weird output 
(since they'll see it on-screen).

Is our converter stripping U+200B? That might be the problem.
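
To make step 1 concrete, here is a minimal C++ sketch of what such a filter does to the typed sequence. The name `stripZeroWidthSpace` is hypothetical and is not a WaitZar function. The point is only that once U+200B is gone, U+1000 U+103C looks like ordinary Unicode text, so any later stage is free to re-order it.

```cpp
#include <string>

constexpr char32_t ZWS = 0x200B;  // zero-width space

// Hypothetical stand-in for the display filter: drop every U+200B
// from the typed sequence and keep everything else in order.
std::u32string stripZeroWidthSpace(const std::u32string& text) {
    std::u32string result;
    for (char32_t c : text) {
        if (c != ZWS) {
            result.push_back(c);
        }
    }
    return result;
}
```

Calling it on U+1000 U+200B U+103C leaves U+1000 U+103C, which is exactly the sequence the converter then treats as valid Unicode.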

Original comment by seth.h...@gmail.com on 10 Nov 2010 at 11:38

GoogleCodeExporter commented 9 years ago

Our converter isn't stripping U+200B. 

At the line:
src = waitzar::renderAsZawgyi(src);

...then src goes from:
  U+1000 U+200B U+1031
...to:
  U+1031 U+1000 U+200B U+200B

The extra U+200B is not the problem; including it, the output should look like 
this:
  U+1000 U+200B(ORIG) U+1031 U+200B(EXTRA)

For some reason, though, U+1031 is being moved despite having U+200B before it.

Original comment by seth.h...@gmail.com on 10 Nov 2010 at 11:56

GoogleCodeExporter commented 9 years ago

Conversion output from "ua", now logged!

Unicode: {\u1000\u200B\u1031}
   norm: {\u1000\u200B\u1031\u0}
   dash: {\u1000\u200B\uE000\u1031\u0}
   stck: {\u1000\u200B\uE000\u1031\u0}
   Begin Match
      Rule{ORDER, at[\u1031], match[000000000000000000000000000000000000000111], replace[\u0]}
         (3,0)
      Rule{ORDER, at[\u1031], match[000000000000000000000000000000000000000111], replace[\u0]}
   End Match
   mtch: {\u1031\u1000\u200B\uE000}
   subs: {\u1031\u1000\u200B\u0}
   Begin Re-Ordering
   End Re-Ordering
Zawgyi1: {\u1031\u1000\u200B}

To put it simply, those ORDER rules shouldn't be there. The second one isn't 
applied, so that's fine. The first one puts \u1031 at position 0 (before 
\u1000). That's the problem.

Why isn't U+200B causing a "sequence" break?
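
For anyone reproducing a trace like the one above, a small formatter is enough to print each intermediate string in the same `{\u1000\u200B\u1031}` style. This is a sketch only: `toLogString` is a hypothetical helper, not the logging code actually used in WaitZar.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: print a code-point sequence as {\uXXXX...},
// using unpadded uppercase hex so that a NUL shows up as \u0.
std::string toLogString(const std::u32string& text) {
    std::ostringstream out;
    out << '{' << std::hex << std::uppercase;
    for (char32_t c : text) {
        out << "\\u" << static_cast<unsigned long>(c);
    }
    out << '}';
    return out.str();
}
```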

Original comment by seth.h...@gmail.com on 10 Nov 2010 at 6:32

GoogleCodeExporter commented 9 years ago

Ok, got it! Our "prevConsonant" variable tracks the last known consonant. It 
assumes strings will contain only Myanmar text, and will start on a consonant. 
We need a way to "reset" this behavior.

Original comment by seth.h...@gmail.com on 10 Nov 2010 at 6:35

GoogleCodeExporter commented 9 years ago

Fixed; just treat U+200B as a "consonant" character. This should be considered 
a temporary fix... that entire converter is held together by thread.
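
As an illustration of the prevConsonant problem and this workaround, here is a toy C++ model. It is not the real converter: `reorderVowelE`, `isConsonant`, and the `treatZwsAsConsonant` flag are all hypothetical, and the only rule modelled is "move U+1031 in front of the last consonant seen".

```cpp
#include <cstddef>
#include <string>

constexpr char32_t ZWS = 0x200B;      // zero-width space
constexpr char32_t VOWEL_E = 0x1031;  // Myanmar vowel sign E

// Myanmar consonants occupy U+1000..U+1021 (including the letter A).
// With the workaround enabled, U+200B also counts as a "consonant",
// which resets the tracking below.
bool isConsonant(char32_t c, bool treatZwsAsConsonant) {
    if (c >= 0x1000 && c <= 0x1021) return true;
    return treatZwsAsConsonant && c == ZWS;
}

// Toy model (assumption, not WaitZar code): move each U+1031 in front
// of the most recently seen "consonant".
std::u32string reorderVowelE(const std::u32string& src, bool treatZwsAsConsonant) {
    std::u32string out;
    std::size_t prevConsonant = std::u32string::npos;  // index into out
    for (char32_t c : src) {
        if (c == VOWEL_E && prevConsonant != std::u32string::npos) {
            out.insert(prevConsonant, 1, c);
            ++prevConsonant;  // the consonant shifted one slot right
        } else {
            if (isConsonant(c, treatZwsAsConsonant)) {
                prevConsonant = out.size();
            }
            out.push_back(c);
        }
    }
    return out;
}
```

With the flag off, U+1000 U+200B U+1031 becomes U+1031 U+1000 U+200B, matching the logged output. With the flag on, U+1031 stays behind U+1000 and the result is U+1000 U+1031 U+200B.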

Will release a nightly and get the original bug reporter to confirm this is 
fixed before I close this bug.

Original comment by seth.h...@gmail.com on 10 Nov 2010 at 7:27

GoogleCodeExporter commented 9 years ago

Update: a lot of this is fixed, but kinzi is not re-ordered properly. So, *F 
yields ဂင်္, which puts kinzi after. 

This is obviously a problem if you type *F* (ဂင်္ဂ): you can see 
that kinzi will stack after the second "ga".
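
A rough C++ sketch of the kind of re-ordering needed here follows. In Unicode, kinzi is encoded as U+1004 U+103A U+1039 and comes before the consonant it sits above, so a kinzi typed after a consonant has to be moved in front of it. The function name `moveKinziBeforeConsonant` is hypothetical, and the sketch assumes its input is always in typing order (kinzi after its consonant). It would wrongly move a kinzi in text that is already valid Unicode, so it is not a general normalizer.

```cpp
#include <cstddef>
#include <string>

// Kinzi: NGA + ASAT + VIRAMA.
const std::u32string KINZI = U"\u1004\u103A\u1039";

// Myanmar consonants occupy U+1000..U+1021 (including the letter A).
bool isConsonant(char32_t c) { return c >= 0x1000 && c <= 0x1021; }

// Hypothetical sketch: wherever a consonant is immediately followed by
// kinzi (typing order), emit kinzi first and the consonant after it.
std::u32string moveKinziBeforeConsonant(const std::u32string& src) {
    std::u32string out;
    std::size_t i = 0;
    while (i < src.size()) {
        const bool kinziFollows =
            isConsonant(src[i]) && src.compare(i + 1, KINZI.size(), KINZI) == 0;
        if (kinziFollows) {
            out += KINZI;
            out.push_back(src[i]);
            i += 1 + KINZI.size();
        } else {
            out.push_back(src[i]);
            ++i;
        }
    }
    return out;
}
```

For the *F* case, the input U+1002 U+1004 U+103A U+1039 U+1002 comes out as U+1004 U+103A U+1039 U+1002 U+1002, so kinzi attaches to the first "ga" rather than the second.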

Original comment by seth.h...@gmail.com on 13 Nov 2010 at 7:35

GoogleCodeExporter commented 9 years ago

Added a great deal of reordering code; most of these issues have been fixed. 

Need to test exact Unicode normalization issues for other letters.

Once normalized Unicode words work, we can consider a release.

Original comment by seth.h...@gmail.com on 13 Nov 2010 at 9:08

GoogleCodeExporter commented 9 years ago

Complex examples work. 
I think we should have a "normalize" feature in KeyMagic. :P

Now, to make a nightly release...

Original comment by seth.h...@gmail.com on 13 Nov 2010 at 12:39

GoogleCodeExporter commented 9 years ago

From Lionslayer:
1) "m key" "B key" "N key" for ra-yit are still not working in Zg kb with all 
encodings.
2) After space (for applying texts), we still have to hit another space to get 
the space. If space can create an extra space from the start, it will save time 
and our typing pattern.

The first one is a real issue (I'm not sure why 'B' and 'N' fail to reorder 
ya-yit.)
'M' must be capital (and there's still a glitch).

For the second one, it's more of a usability thing, not a bug.

Original comment by seth.h...@gmail.com on 16 Nov 2010 at 8:48

GoogleCodeExporter commented 9 years ago

Fixed. We require $ZWS before any medial to handle reordering properly.
Spun off 2 into its own bug.
Closing; I'll open new bugs for remaining Zawgyi errors as they pop up.

Original comment by seth.h...@gmail.com on 16 Nov 2010 at 9:28

Changed state: Fixed

sorlok / waitzar

Zawgyi keyboard ya-yit error #145