Open moyogo opened 8 years ago
@marekjez86, do we already maintain a list of Unicode sequences that should look reasonable? For example, this bug reports (among many others) a GPOS problem with U+A732 U+0301 Ꜳ́. Of course we need to fix this, but ideally we could also start a systematic collection of sequences, and have regression tests to make sure that they don’t break again.
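As a rough sketch of what such a collection could look like (the list name and both helpers are hypothetical, not part of any existing Noto tooling), the sequences could be stored in the same `U+XXXX` notation used in bug reports and converted to text on demand:

```python
# Hypothetical regression list: one entry per sequence, written in the
# "U+XXXX U+XXXX" notation that bug reports already use.
REGRESSION_SEQUENCES = [
    "U+A732 U+0301",  # the sequence mentioned in this bug report
]


def parse_sequence(notation: str) -> str:
    """Turn 'U+A732 U+0301' into the corresponding Python string."""
    chars = []
    for token in notation.split():
        if not token.upper().startswith("U+"):
            raise ValueError(f"expected U+XXXX, got {token!r}")
        chars.append(chr(int(token[2:], 16)))
    return "".join(chars)


def format_sequence(text: str) -> str:
    """Turn a string back into 'U+XXXX U+XXXX' notation."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


if __name__ == "__main__":
    for notation in REGRESSION_SEQUENCES:
        print(notation, "->", parse_sequence(notation))
```

A rendering test could then iterate over the list and shape each parsed string with the font under test; that part is left out here because it depends on the test harness.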
@moyogo, could you help us with collecting such a list (mentioned by @brawer )? We really appreciate all the bugs you've filed against LGC fonts.
As @brawer suggested, we need to come up with a better, more systematic way of keeping them in order. That way, we can make sure that all the bugs filed by you are eventually fixed and do not regress in the future.
/cc @dougfelt @pychen1969
@moyogo : is there a font that handles MUFI? just curious
@marekjez86 https://www.google.com/fonts/specimen/Caudex
@davelab6 : thank you
Here are the diffs of sample MUFI text handling between Caudex as the "before" font (in pink in the PDF files below) and the Arimo, Cousine and Tinos fonts (lines with a greenish background in the PDF files below):
mufi-Arimo-Bold.pdf mufi-Arimo-Regular.pdf mufi-Cousine-Bold.pdf mufi-Cousine-Regular.pdf mufi-Tinos-Bold.pdf mufi-Tinos-Regular.pdf
I used the following file as a sample:
@marekjez86 Can you test this set of characters with Noto Sans and Noto Serif as well? If they have a similar issue, please file a separate bug. You may as well do the same with Roboto and file in google/roboto.
@marekjez86 A list of MUFI fonts is available at http://folk.uib.no/hnooh/mufi/fonts but like Caudex most don’t have the positioning information for some of those character sequences. I guess they have the PUA characters listed in the MUFI character recommendation list instead. Palemonas seems to contain substitutions to precomposed glyphs for the standard characters.
@jungshik The MUFI character recommendation list has PUA characters for those sequences but some Wikipedia pages or templates have those as standard characters along with more general Latin character combinations:
These lists are probably going to change over time. A general rule should be that LGC letters should have anchors for all the diacritics that can be used with those scripts. One can be more selective on what that means, but given the scope of these fonts, that means any combination should be handled as well as reasonably possible.
Noto Sans and Noto Serif don’t have these characters in the release versions but have them without anchors in the Noto source. I’ve already opened the issue for Roboto: https://github.com/google/roboto/issues/183.
Diff between Caudex and PalemMUFI in regular
Diff between Caudex and PalemMUFI in bold
@marekjez86, what’s your verdict on this bug—what will happen by when?
I was working with some medieval Latin text and got some tofu. It appears that the Noto fonts support everything except most of the PUA characters (see the tofu-filled table at https://en.wikipedia.org/wiki/Medieval_Unicode_Font_Initiative).
@squinky86: unfortunately, anything not assigned a Unicode code point is out of scope for Noto; this covers all of the MUFI PUA characters.
@marekjez86, would you (or someone else at Google) be able to go through the MUFI glyph repertoire, add corresponding test cases to your mufi test file which you’ve posted in https://github.com/googlefonts/noto-fonts/issues/704#issuecomment-224127012, and then add the result to the Noto test suite?
I believe there’s a misunderstanding here. MUFI’s goal is processing medieval text with modern software; nobody objects to that. However, MUFI has been trying to allocate codepoints for pre-composed accents because “Smart font technology is needed in order to display and print decomposed characters properly. At the time of writing, this technology is not yet fully mature” (MUFI 4.0, section 2). That argument didn’t exactly fly within Unicode, and as of 2019 it’s certainly not true anymore (mainly thanks to Emoji). Sadly, not every scholar in medieval studies also understands modern font technology, and they certainly can’t be blamed for that. Ultimately, MUFI went off defining their PUA hacks, not realizing that in reality they’re just facing a simple font problem which is outside the scope of Unicode. That’s why there’s now some frustration on all sides.
My proposal for @marekjez86 would be to find someone who looks at the MUFI glyph repertoire, to encode each and every glyph using current Unicode (this should be possible), to make it a test case for Noto, and finally to nag the upstream vendor to fix the bugs. For example, MUFI wanted an Ꜳ́ glyph without realizing that it can be encoded in Unicode as U+A732 U+0301; it would be good to have a test case for this sequence, thus making sure that Noto renders it well.
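To illustrate why U+A732 U+0301 is a font problem rather than an encoding problem, here is a small sketch using only Python's standard `unicodedata` module (the helper names are made up for this example). It shows that normalization to NFC does not shorten the sequence, so there is no precomposed character to fall back on and the font has to position the accent itself:

```python
import unicodedata

# The sequence from the example above: base letter plus combining acute.
SEQ = "\uA732\u0301"


def describe(text):
    """List (code point, character name, canonical combining class) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), unicodedata.combining(ch))
        for ch in text
    ]


def nfc_shortens(text):
    """True if NFC normalization composes the text into fewer code points."""
    return len(unicodedata.normalize("NFC", text)) < len(text)


if __name__ == "__main__":
    for row in describe(SEQ):
        print(row)
    # "A" + acute composes to a single code point; U+A732 + acute does not.
    print("A + acute shortened by NFC:", nfc_shortens("A\u0301"))
    print("U+A732 + acute shortened by NFC:", nfc_shortens(SEQ))
```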
Likewise, MUFI has requested various ligatures from Unicode without realizing that these can already be encoded with zero-width joiner sequences. Admittedly, current fonts do not support those ZWJ sequences, but that’s just a font problem, outside the scope of Unicode. Again, it would be good for Noto to have test cases for these ligatures. For example, MUFI’s PUA ligature U+EFAD LATIN SMALL LIGATURE OC can be encoded in proper Unicode as oc (U+006F U+200D U+0063). So the first step would be to add this Unicode sequence to the Noto test suite, and then to file a Noto bug in case it doesn’t look right.
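A minimal sketch of that first step, assuming a hand-maintained mapping table (the table and function names are hypothetical, and only the one mapping given above is filled in). It rewrites known MUFI PUA characters to standard sequences and reports any other private-use characters that still need a mapping:

```python
import unicodedata

ZWJ = "\u200D"  # ZERO WIDTH JOINER

# Hypothetical mapping table; the only entry is the example from this thread.
PUA_TO_STANDARD = {
    "\uEFAD": "o" + ZWJ + "c",  # MUFI LATIN SMALL LIGATURE OC
}


def is_private_use(ch):
    """True if the character has general category Co (private use)."""
    return unicodedata.category(ch) == "Co"


def replace_pua(text):
    """Return (rewritten text, list of PUA characters without a mapping)."""
    out = []
    unmapped = []
    for ch in text:
        if ch in PUA_TO_STANDARD:
            out.append(PUA_TO_STANDARD[ch])
        else:
            if is_private_use(ch):
                unmapped.append(ch)
            out.append(ch)
    return "".join(out), unmapped


if __name__ == "__main__":
    converted, missing = replace_pua("\uEFAD")
    print(" ".join(f"U+{ord(c):04X}" for c in converted))
    print("unmapped:", missing)
```

The converted strings are what would go into the test suite; the unmapped list is a to-do list for whoever works through the MUFI repertoire.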
It’s a lot of work; perhaps you can find someone at MUFI to help.
@brawer : thanks for the clarification
Tested A᷎ a᷎ Ꜳ́ ꜳ́ Ꜳ̣ ꜳ̣ Ǽ̨ ǽ̨ Ꜵ́ ꜵ́ Ꜷ́ ꜷ́ Ꜷ̣ ꜷ̣ Ꜹ́ ꜹ́ Ꜹ̨ ꜹ̨ Ꜽ̇ ꜽ̇ Ꜽ̣ ꜽ̣ Ꝺ̇ ꝺ̇ E᷎ e᷎ Ꝼ́ ꝼ́ Ꝼ̇ ꝼ̇ Ꝼ̣ ꝼ̣ O᷎ o᷎ Ꝛ́ ꝛ́ Ꝛ̣ ꝛ̣ Ꝡ̋ ꝡ̋ in Arimo and Tinos. The reported Unicode characters appear to have been removed from Arimo. The reported issue is reproducible in Tinos.
http://folk.uib.no/hnooh/mufi/specs/MUFI-Alphabetic-4-0.pdf has many composed characters that use combining diacritics. The Unicode characters needed are in Arimo and Tinos, but they lack the proper positioning information, so the diacritics are not positioned correctly. These diacritics should be visually centered on the base letters, and the ogonek above should be centered and attached to the base letters.
For example: A᷎ a᷎ Ꜳ́ ꜳ́ Ꜳ̣ ꜳ̣ Ǽ̨ ǽ̨ Ꜵ́ ꜵ́ Ꜷ́ ꜷ́ Ꜷ̣ ꜷ̣ Ꜹ́ ꜹ́ Ꜹ̨ ꜹ̨ Ꜽ̇ ꜽ̇ Ꜽ̣ ꜽ̣ Ꝺ̇ ꝺ̇ E᷎ e᷎ Ꝼ́ ꝼ́ Ꝼ̇ ꝼ̇ Ꝼ̣ ꝼ̣ O᷎ o᷎ Ꝛ́ ꝛ́ Ꝛ̣ ꝛ̣ Ꝡ̋ ꝡ̋ in Arimo: in Tinos:
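For anyone turning a sample string like this into individual test cases, here is a rough sketch that splits it into base-plus-marks clusters using the standard `unicodedata` module. The function name is made up, the sample is only the first few sequences from the list above written as escapes, and the clustering rule (attach any character with a non-zero combining class to the preceding base) is a simplification, not full grapheme segmentation:

```python
import unicodedata

# First few sequences from the sample above, written with escapes.
SAMPLE = "A\u1DCE a\u1DCE \uA732\u0301 \uA733\u0301 \uA732\u0323 \u01FC\u0328"


def split_clusters(text):
    """Split whitespace-separated text into base + combining-mark clusters."""
    clusters = []
    for token in text.split():
        current = ""
        for ch in token:
            if current and unicodedata.combining(ch):
                current += ch  # mark: attach to the current base
            else:
                if current:
                    clusters.append(current)
                current = ch  # new base letter
        if current:
            clusters.append(current)
    return clusters


if __name__ == "__main__":
    for cluster in split_clusters(SAMPLE):
        print(cluster, " ".join(f"U+{ord(ch):04X}" for ch in cluster))
```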