unicode-org / text-rendering-tests

Unicode’s test suite for text rendering engines

Test Myanmar shaping #6

Open brawer opened 7 years ago

brawer commented 7 years ago

https://github.com/googlei18n/noto-fonts/issues/769#issuecomment-254315022 has test cases for Myanmar shaping. Before adding them as test cases, we need to triple-check that these are actually Unicode strings and not Zawgyi-encoded.

အကျွန်ုပ်သည် သွားလတံ္တနည်း သဗ္ဗာသဝသုတ် အကျွန်ုပ်သည် သာဝတ္ထိ ဤသို့ မြတ်စွာဘုရားသည် သွားလတံ္တနည်း
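One rough way to screen strings like these is sketched below. It rests on a single assumption that is not stated in this thread: Zawgyi stores the vowel sign E (U+1031) in visual order, before its consonant, whereas Unicode stores it after the consonant. The function name `looks_visually_ordered` is hypothetical. This checks one signal only; a clean result does not prove a string is Unicode, so review by a Burmese speaker is still needed.

```python
import re

# Assumption: a U+1031 (MYANMAR VOWEL SIGN E) at the very start of the text,
# or right after whitespace, suggests visual-order (Zawgyi-style) storage,
# because logical-order Unicode stores it after the base consonant.
_LEADING_E = re.compile(r"(?:^|\s)\u1031")


def looks_visually_ordered(text: str) -> bool:
    """Return True if some word starts with U+1031 (one heuristic signal)."""
    return _LEADING_E.search(text) is not None
```

For example, `looks_visually_ordered("\u1031\u1019")` flags the string, while the same two characters in logical order, `"\u1019\u1031"`, pass.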

brawer commented 7 years ago

Feedback from Google’s Burmese linguist: Most of the above are correct Unicode, but သွားလတံ္တနည်း should be သွားလတ္တံနည်း
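The two spellings look nearly identical on screen, so it can help to compare them code point by code point. The sketch below is illustrative only: `describe` and `first_difference` are hypothetical helper names, and the two escaped fragments are my reading of the part of the word where the spellings differ (the anusvara U+1036 placed before versus after the stacked U+1039 U+1010).

```python
import unicodedata


def describe(text):
    """List each character as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


def first_difference(a, b):
    """Index of the first differing character, or None if the strings are equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None if len(a) == len(b) else min(len(a), len(b))


# Assumed reading of the differing fragment, written with escapes:
wrong = "\u1010\u1036\u1039\u1010"  # TA, ANUSVARA, VIRAMA, TA
right = "\u1010\u1039\u1010\u1036"  # TA, VIRAMA, TA, ANUSVARA
```

Calling `first_difference(wrong, right)` gives the position where the order diverges, and `describe(...)` on either fragment lists the characters by name. Both fragments contain the same characters; only their storage order differs.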

brawer commented 7 years ago

There’s a neat test case for Myanmar OpenType shapers at the end of the “Well-formed Clusters” section of the OpenType spec, just before “Reordering Characters”:

င်္က္ကျြွှေို့်ာှီ့ၤဲံ့းႍ

image

@davelab6, @behdad or @mjansche, are you aware of any font that can render it? If so, I’d ask the copyright owner if they’d be willing to allow us (Unicode) to incorporate the glyphs for just this one cluster into Unicode’s test suite for text rendering engines. They’d need to sign Unicode’s Contributor Licensing Agreement; I’ll handle the paperwork.

mjansche commented 7 years ago

That's a rather contrived example. I have been using examples from UTN 11 as test cases for a similar purpose. Coincidentally I also prepared a list of frequent clusters that occur in a large corpus, which I've been meaning to push out. Stay tuned for that.

brawer commented 7 years ago

Oh cool. Please don't hesitate to send pull requests; much appreciated.

mjansche commented 7 years ago

Now that I'm looking at the description of Well-formed Clusters in the OpenType spec, I notice that it doesn't seem to match the corresponding description in UTN 11. (Working code: https://github.com/googlei18n/language-resources/blob/master/third_party/unicode/utn11.py) According to the regex in UTN 11, that cluster is not recognized as valid and/or in canonical storage order. This could well be a problem in the regex, but I think it points to a deeper mismatch between what fonts/shapers/renderers have to worry about vs. what is needed for representing actual text.

brawer commented 7 years ago

Adding @mhosken, who wrote UTN 11, for clarification.

mhosken commented 7 years ago

FWIW, rendering with Padauk in a Graphite context (Firefox or LibreOffice) will give you a pretty strong test of a string's conformity to UTN#11. UTN#11 is stricter than the OpenType spec, and that's OK. I don't think it's necessarily the shaper's responsibility to be the encoding police. The only thing that would be bad is if the shaper marked something as bad that UTN#11 says is good.

BTW, you are welcome to use Padauk for your test string. It is an OFL font, so no agreement is needed for Unicode to use it in the Unicode Standard book or anywhere else.

brawer commented 7 years ago

@mhosken, your text certainly looks like a nice test case; can you post the Unicode string for it? (Sorry to ask, but my Burmese is nonexistent.)

image

mhosken commented 7 years ago

It's your string from comment 3 above. But the Graphite font hasn't been set up to render 4 medials in sequence like that, because no language ever uses them all; hence the U+1031 not being reordered. Of course, there are also other medials used by minorities that are not in your string, so I suppose it could be madder.
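To find clusters that stress a font in this way, one could measure the longest run of consecutive medial signs in a string. This is only a sketch: `longest_medial_run` is a hypothetical name, and it counts just the four medials U+103B to U+103E (YA, RA, WA, HA), not the additional medials used by minority languages.

```python
import re

# Assumption: only the four core medial signs are counted here:
# U+103B MEDIAL YA, U+103C MEDIAL RA, U+103D MEDIAL WA, U+103E MEDIAL HA.
MEDIALS = "\u103B\u103C\u103D\u103E"
_RUN = re.compile(f"[{MEDIALS}]+")


def longest_medial_run(text: str) -> int:
    """Length of the longest run of consecutive medial signs (0 if none)."""
    return max((len(m.group()) for m in _RUN.finditer(text)), default=0)
```

A consonant followed by all four medials, such as `"\u1000\u103B\u103C\u103D\u103E\u1031"`, yields a run of 4, which is the situation described above.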

behdad commented 7 years ago

Myanmar Text from Microsoft renders the original test correctly.