Open brawer opened 8 years ago
Feedback from Google’s Burmese linguist: Most of the above are correct Unicode, but သွားလတံ္တနည်း should be သွားလတ္တံနည်း
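For readers who cannot tell the two spellings apart by eye, a small script that dumps code points makes the difference visible. This is only a sketch: the helper names `dump_code_points` and `first_difference` are made up for illustration, and the two literals are copied from the comment above.

```python
import unicodedata


def dump_code_points(text):
    """Return one 'U+XXXX NAME' entry per code point in text."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
    ]


def first_difference(a, b):
    """Return the index of the first differing code point, or None if equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        # One string is a prefix of the other.
        return min(len(a), len(b))
    return None


if __name__ == "__main__":
    reported = "သွားလတံ္တနည်း"   # spelling as posted in the test cases
    corrected = "သွားလတ္တံနည်း"  # spelling suggested by the linguist
    start = first_difference(reported, corrected)
    if start is not None:
        # Print both strings from the first mismatch onward.
        for label, s in (("reported", reported), ("corrected", corrected)):
            print(label)
            for line in dump_code_points(s[start:]):
                print("  " + line)
```

Comparing the two dumps side by side shows where the storage order of the stacked consonant and the following sign differs.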
There’s a neat test case for Myanmar OpenType shapers at the end of section Well-formed Clusters in the spec, just before Reordering Characters:
င်္က္ကျြွှေို့်ာှီ့ၤဲံ့းႍ
@davelab6, @behdad or @mjansche, are you aware of any font that can render it? If so, I’d ask the copyright owner if they’d be willing to allow us (Unicode) to incorporate the glyphs for just this one cluster into Unicode’s test suite for text rendering engines. They’d need to sign Unicode’s Contributor Licensing Agreement; I’ll handle the paperwork.
That's a rather contrived example. I have been using examples from UTN 11 as test cases for a similar purpose. Coincidentally I also prepared a list of frequent clusters that occur in a large corpus, which I've been meaning to push out. Stay tuned for that.
Oh cool. Please don't hesitate to send pull requests; much appreciated.
Now that I'm looking at the description of Well-formed Clusters in the OpenType spec, I notice that it doesn't seem to match the corresponding description in UTN 11. (Working code: https://github.com/googlei18n/language-resources/blob/master/third_party/unicode/utn11.py) According to the regex in UTN 11, that cluster is not recognized as valid and/or in canonical storage order. This could well be a problem in the regex, but I think it points to a deeper mismatch between what fonts/shapers/renderers have to worry about vs. what is needed for representing actual text.
Adding @mhosken who wrote UTN11 for clarification.
FWIW, rendering with Padauk in a Graphite context (Firefox or LibreOffice) will give you a pretty strong test of a string's conformity to UTN#11. UTN#11 is stricter than the OpenType spec, and that's OK. I don't think it's necessarily the shaper's responsibility to be the encoding police. The only thing that would be bad is if the shaper marked something bad that UTN#11 says is good.
BTW you are welcome to use Padauk for your test string. It is an OFL font, so Unicode needs no agreement to use it in the Unicode Standard book or anywhere else.
@mhosken, your text certainly looks like a nice test case; can you post the Unicode string for it? (Sorry to ask, but my Burmese is nonexistent.)
It's your string from comment 3 above. But the Graphite font hasn't been set up to render 4 medials in sequence like that because no language ever uses them all. Of course there are also other medials used by minorities, not in your string. So I suppose it could be madder. Hence not reordering the U+1031.
Myanmar Text from Microsoft renders the original test correctly.
https://github.com/googlei18n/noto-fonts/issues/769#issuecomment-254315022 has test cases for Myanmar shaping. Before adding them as test cases, we need to triple-check that these are actually Unicode strings and not in Zawgyi encoding.
အကျွန်ုပ်သည် သွားလတံ္တနည်း သဗ္ဗာသဝသုတ် အကျွန်ုပ်သည် သာဝတ္ထိ ဤသို့ မြတ်စွာဘုရားသည် သွားလတံ္တနည်း
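A rough first pass at the Unicode-versus-Zawgyi check could be automated before a human reviews the strings. The sketch below is a hypothetical heuristic, not an established detector, and `looks_like_zawgyi` is a made-up name. It rests on two assumptions: Zawgyi stores U+1031 and U+103B in visual order before the consonant, so they can appear at the start of a word, and Zawgyi reuses U+1060 through U+1097 for variant glyphs, which plain Burmese or Pali text in Unicode would not contain.

```python
import re

# Assumption: in Unicode storage order these signs follow the base consonant,
# so one at the start of the text or right after whitespace is suspicious.
LEADING_PREPOSED = re.compile(r"(?:^|\s)[\u1031\u103B]")

# Assumption: Zawgyi reuses this range for extra glyph variants; in Unicode
# the same code points belong to characters for other languages.
ZAWGYI_RANGE = re.compile(r"[\u1060-\u1097]")


def looks_like_zawgyi(text):
    """Return True if text shows either of the two Zawgyi hints above."""
    return bool(LEADING_PREPOSED.search(text) or ZAWGYI_RANGE.search(text))


if __name__ == "__main__":
    samples = {
        "e-vowel before consonant": "\u1031\u1000",
        "e-vowel after consonant": "\u1000\u1031",
    }
    for label, sample in samples.items():
        print(label, looks_like_zawgyi(sample))
```

A `False` result does not prove that a string is valid Unicode, and text in minority languages that legitimately uses U+1060 through U+1097 would be flagged, so this can only narrow down which strings need a closer look.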