Latin: combining diacritics positioning broken on historical letters

moyogo commented 8 years ago

http://folk.uib.no/hnooh/mufi/specs/MUFI-Alphabetic-4-0.pdf has many composed characters that use combining diacritics. Arimo and Tinos contain the Unicode characters needed, but they lack the proper positioning information, so the diacritics are not positioned correctly. These diacritics should be visually centered on the base letters, and the ogonek above should be centered on and attached to the base letters.

For example: A᷎ a᷎ Ꜳ́ ꜳ́ Ꜳ̣ ꜳ̣ Ǽ̨ ǽ̨ Ꜵ́ ꜵ́ Ꜷ́ ꜷ́ Ꜷ̣ ꜷ̣ Ꜹ́ ꜹ́ Ꜹ̨ ꜹ̨ Ꜽ̇ ꜽ̇ Ꜽ̣ ꜽ̣ Ꝺ̇ ꝺ̇ E᷎ e᷎ Ꝼ́ ꝼ́ Ꝼ̇ ꝼ̇ Ꝼ̣ ꝼ̣ O᷎ o᷎ Ꝛ́ ꝛ́ Ꝛ̣ ꝛ̣ Ꝡ̋ ꝡ̋

In Arimo: [screenshot: screen shot 2016-05-28 at 12 50 37]

In Tinos: [screenshot]

brawer commented 8 years ago

@marekjez86, do we already maintain a list of Unicode sequences that should look reasonable? For example, this bug reports (among many others) a GPOS problem with U+A732 U+0301 Ꜳ́. Of course we need to fix this, but ideally we could also start a systematic collection of sequences, and have regression tests to make sure that they don’t break again.
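A systematic collection like this could start very small. The following is only a minimal sketch, not an existing Noto tool: the names `SEQUENCES`, `codepoints` and `needs_mark_positioning` are hypothetical, and the list holds just two of the sequences reported above. It flags sequences that Unicode normalization cannot compose into a single character, which are the ones that depend on the font's mark positioning.

```python
import unicodedata

# Hypothetical regression list; both entries are taken from this report.
SEQUENCES = [
    "\uA732\u0301",  # LATIN CAPITAL LETTER AA + COMBINING ACUTE ACCENT
    "\uA733\u0301",  # LATIN SMALL LETTER AA + COMBINING ACUTE ACCENT
]


def codepoints(seq):
    """Format a string as space-separated code points, e.g. 'U+A732 U+0301'."""
    return " ".join(f"U+{ord(ch):04X}" for ch in seq)


def needs_mark_positioning(seq):
    """Return True if combining marks remain after NFC normalization.

    Such sequences have no precomposed character, so a font has to
    place the mark on the base glyph itself (typically through GPOS).
    """
    composed = unicodedata.normalize("NFC", seq)
    return any(unicodedata.combining(ch) != 0 for ch in composed)


if __name__ == "__main__":
    for seq in SEQUENCES:
        print(codepoints(seq), needs_mark_positioning(seq))
```

This only identifies which sequences belong in the collection; checking that they actually render well would still need a rendering comparison such as a diff of PDFs or screenshots.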

jungshik commented 8 years ago

@moyogo, could you help us with collecting such a list (mentioned by @brawer )? We really appreciate all the bugs you've filed against LGC fonts.

As @brawer suggested, we need to come up with a better, more systematic way of keeping them in order. That way, we can make sure that all the bugs filed by you are eventually fixed and do not regress in the future.

/cc @dougfelt @pychen1969

marekjez86 commented 8 years ago

@moyogo : is there a font that handles MUFI? just curious

davelab6 commented 8 years ago

@marekjez86 https://www.google.com/fonts/specimen/Caudex

marekjez86 commented 8 years ago

@davelab6 : thank you

Here are the diffs of sample MUFI text handling between Caudex as the "before" font (in pink in the PDF files below) and the Arimo, Cousine and Tinos fonts (lines with a greenish background in the PDF files below):

mufi-Arimo-Bold.pdf mufi-Arimo-Regular.pdf mufi-Cousine-Bold.pdf mufi-Cousine-Regular.pdf mufi-Tinos-Bold.pdf mufi-Tinos-Regular.pdf

marekjez86 commented 8 years ago

I used the following file as a sample:

mufi.html.zip

jungshik commented 8 years ago

@marekjez86 Can you test this set of characters with Noto Sans and Noto Serif as well? If they have a similar issue, please file a separate bug. You may as well do the same with Roboto and file in google/roboto.

moyogo commented 8 years ago

@marekjez86 A list of MUFI fonts is available at http://folk.uib.no/hnooh/mufi/fonts but, like Caudex, most don’t have the positioning information for some of those character sequences. I guess they have the PUA characters listed in the MUFI character recommendation list instead. Palemonas seems to contain substitutions to precomposed glyphs for the standard characters.

@jungshik The MUFI character recommendation list has PUA characters for those sequences but some Wikipedia pages or templates have those as standard characters along with more general Latin character combinations:

These lists are probably going to change over time. A general rule should be that LGC letters should have anchors for all the diacritics that can be used with those scripts. One can be more selective about what that means, but given the scope of these fonts, any combination should be handled as well as reasonably possible.

Noto Sans and Noto Serif don’t have these characters in the release versions but have them without anchors in the Noto source. I’ve already opened the issue for Roboto: https://github.com/google/roboto/issues/183.

marekjez86 commented 8 years ago

Diff between Caudex and PalemMUFI in regular

caudex-palemmufi-reg

Diff between Caudex and PalemMUFI in bold

caudex-palemmufi-bold

brawer commented 8 years ago

@marekjez86, what’s your verdict on this bug—what will happen by when?

squinky86 commented 5 years ago

I was working with some medieval Latin text and got some tofu. It appears that the Noto fonts support everything except most of the PUA characters (tofu-filled table at https://en.wikipedia.org/wiki/Medieval_Unicode_Font_Initiative).

marekjez86 commented 5 years ago

@squinky86: unfortunately anything not assigned a Unicode codepoint is outside the scope of Noto; this covers all of the MUFI PUA characters

brawer commented 5 years ago

@marekjez86, would you (or someone else at Google) be able to go through the MUFI glyph repertoire, add corresponding test cases to your mufi test file which you’ve posted in https://github.com/googlefonts/noto-fonts/issues/704#issuecomment-224127012, and then add the result to the Noto test suite?

I believe there’s a misunderstanding here. MUFI’s goal is processing medieval text with modern software; nobody objects to that. However, MUFI has been trying to allocate codepoints for pre-composed accents because “Smart font technology is needed in order to display and print decomposed characters properly. At the time of writing, this technology is not yet fully mature” (MUFI 4.0, section 2). That argument didn’t exactly fly within Unicode, and as of 2019 it’s certainly not true anymore (mainly thanks to Emoji). Sadly, not every scholar in medieval studies also understands modern font technology; they certainly can’t be blamed for that... Ultimately, MUFI went off defining their PUA hacks, not realizing that in reality they’re just facing a simple font problem which is outside the scope of Unicode. That’s why there’s now some frustration on all sides.

My proposal for @marekjez86 would be to find someone who looks at the MUFI glyph repertoire, to encode each and every glyph using current Unicode (this should be possible), to make it a test case for Noto, and finally to nag the upstream vendor to fix the bugs. For example, MUFI wanted an Ꜳ́ glyph without realizing that it can be encoded in Unicode as U+A732 U+0301; it would be good to have a test case for this sequence, thus making sure that Noto renders it well.

Likewise, MUFI has requested various ligatures from Unicode without realizing that these can already be encoded with zero-width joiner sequences. Admittedly, current fonts do not support those ZWJ sequences, but that’s just a font problem, outside the scope of Unicode. Again, it would be good for Noto to have test cases for these ligatures. For example, MUFI’s PUA ligature U+EFAD LATIN SMALL LIGATURE OC can be encoded in proper Unicode as o‍c U+006F U+200D U+0063. So the first step would be to add this Unicode sequence to the Noto test suite, and then to file a Noto bug in case it doesn’t look right.
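To illustrate the re-encoding step, here is a rough sketch of how PUA characters in a text could be replaced by standard Unicode sequences. The table `PUA_TO_UNICODE` and the helpers `replace_pua` and `remaining_pua` are hypothetical names, and the table contains only the single mapping mentioned above; a real table would have to be compiled from the MUFI recommendation.

```python
# Illustrative mapping only: this single entry comes from the comment above.
# U+EFAD (MUFI PUA, LATIN SMALL LIGATURE OC) -> o + ZERO WIDTH JOINER + c
PUA_TO_UNICODE = {
    "\uEFAD": "o\u200Dc",
}


def replace_pua(text, table=PUA_TO_UNICODE):
    """Replace every character that has a table entry; leave the rest as-is."""
    return "".join(table.get(ch, ch) for ch in text)


def remaining_pua(text):
    """List code points still in the BMP Private Use Area (U+E000..U+F8FF)."""
    return [f"U+{ord(ch):04X}" for ch in text if 0xE000 <= ord(ch) <= 0xF8FF]


if __name__ == "__main__":
    converted = replace_pua("\uEFAD")
    print([f"U+{ord(ch):04X}" for ch in converted])
    print(remaining_pua(converted))
```

Anything still reported by `remaining_pua` after conversion would be a glyph for which no standard encoding has been recorded yet.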

It’s a lot of work; perhaps you can find someone at MUFI to help.

marekjez86 commented 5 years ago

@brawer : thanks for the clarification

nizarsq commented 4 years ago

Tested A᷎ a᷎ Ꜳ́ ꜳ́ Ꜳ̣ ꜳ̣ Ǽ̨ ǽ̨ Ꜵ́ ꜵ́ Ꜷ́ ꜷ́ Ꜷ̣ ꜷ̣ Ꜹ́ ꜹ́ Ꜹ̨ ꜹ̨ Ꜽ̇ ꜽ̇ Ꜽ̣ ꜽ̣ Ꝺ̇ ꝺ̇ E᷎ e᷎ Ꝼ́ ꝼ́ Ꝼ̇ ꝼ̇ Ꝼ̣ ꝼ̣ O᷎ o᷎ Ꝛ́ ꝛ́ Ꝛ̣ ꝛ̣ Ꝡ̋ ꝡ̋ in Arimo and Tinos. The reported Unicode characters appear to have been removed from Arimo. The reported issue is still reproducible in Tinos.

notofonts / Arimo

Latin: combining diacritics positioning broken on historical letters #9