Closed GoogleCodeExporter closed 8 years ago
Also for this:
ဝဍ္ဎန
(Unicode encoding; type "<!e")
Original comment by seth.h...@gmail.com
on 16 Nov 2010 at 8:51
First of all, we left some rules "for a future release". These include ~!:
U+100B 100B_stack --> 1097_zg
U+100D 100D_stack --> 106E_zg
U+100D U+1039 U+100E --> 106F_zg
1010_stack 103D --> 1010_103D_stack
U+1039 U+1010 103D --> 1010_103D_stack
Confirming each rule manually:
႗
ၮ
ၯ
႖
(the last 2 rules are essentially the same)
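For reference, the rules above could be sketched as a small lookup table. This is only an illustration, not the project's actual rule format: it assumes that "X_stack" means U+1039 followed by X, and it takes the Zawgyi output code points (U+1097, U+106E, U+106F, U+1096) from the glyphs confirmed above. The names STACK_RULES and apply_stack_rules are made up for this example.

```python
# Hypothetical rule table: Unicode sequence -> single Zawgyi code point.
# Assumption: "X_stack" in the rules means U+1039 (virama) followed by X.
STACK_RULES = {
    "\u100B\u1039\u100B": "\u1097",  # U+100B + 100B_stack
    "\u100D\u1039\u100D": "\u106E",  # U+100D + 100D_stack
    "\u100D\u1039\u100E": "\u106F",  # U+100D U+1039 U+100E
    "\u1039\u1010\u103D": "\u1096",  # 1010_stack + 103D (last two rules)
}


def apply_stack_rules(text, rules=STACK_RULES):
    """Replace each matching sequence, trying longer sequences first."""
    keys = sorted(rules, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(rules[key])
                i += len(key)
                break
        else:
            # No rule matched at this position; copy the character as-is.
            out.append(text[i])
            i += 1
    return "".join(out)
```

Anything that matches no rule passes through unchanged, so the table can be extended one rule at a time.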
Done! Now on to the mundane bug fix....
Original comment by seth.h...@gmail.com
on 23 Nov 2010 at 1:27
Ah, found it.
Kinzi is being treated as part of BOTH syllables: the one before it and the one
after it. Why does it straddle this boundary? Probably has something to do with
the fact that it's the only letter in Unicode that comes before its consonant.
Time to fix.... whee!
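A rough sketch of the idea, in Python and on Unicode input (so not the code that was actually changed here): treat kinzi (U+1004 U+103A U+1039) as the start of the following syllable and never as the tail of the previous one. The syllable rules are deliberately simplified, and the names syllable_starts and segment are hypothetical.

```python
VIRAMA = "\u1039"
ASAT = "\u103A"
KINZI = "\u1004" + ASAT + VIRAMA  # encoded BEFORE the consonant it sits on


def is_consonant(ch):
    # Simplified: Myanmar consonants U+1000..U+1021 only.
    return "\u1000" <= ch <= "\u1021"


def syllable_starts(text):
    """Return the indices at which a new syllable begins (simplified)."""
    starts = []
    i = 0
    while i < len(text):
        if text.startswith(KINZI, i):
            # Kinzi opens a syllable and belongs only to the next consonant.
            starts.append(i)
            i += len(KINZI) + 1  # skip kinzi plus the consonant under it
            continue
        if is_consonant(text[i]):
            stacked = i > 0 and text[i - 1] == VIRAMA
            killed = i + 1 < len(text) and text[i + 1] == ASAT
            # Stacked consonants and asat-killed finals do not start a syllable.
            if not stacked and not killed:
                starts.append(i)
        i += 1
    return starts


def segment(text):
    """Split text into syllables at the computed start positions."""
    if not text:
        return []
    starts = syllable_starts(text)
    if not starts or starts[0] != 0:
        starts.insert(0, 0)
    ends = starts[1:] + [len(text)]
    return [text[a:b] for a, b in zip(starts, ends)]
```

With this, segment("\u1021\u1004\u103A\u1039\u1002\u101C\u102D\u1015\u103A") splits into three pieces, with the kinzi attached only to the consonant after it.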
Original comment by seth.h...@gmail.com
on 23 Nov 2010 at 2:07
Fixed. Segmentation is much better.
We'll still need extensive testing to make sure that words in general aren't
being broken, though. Yet another reason to switch to Unicode internally....
Original comment by seth.h...@gmail.com
on 23 Nov 2010 at 3:00
Original issue reported on code.google.com by
seth.h...@gmail.com
on 15 Nov 2010 at 6:19