psb1558 / Junicode-font

A new version of Junicode font
SIL Open Font License 1.1

Weirdness with combining above diacritics when there are diacritics below #238

Closed leafpool243 closed 8 months ago

leafpool243 commented 9 months ago

[Screenshot]

Every “i” with a diacritic below gets an extraneous tittle when a diacritic is also added above.

\documentclass[14pt]{article}
\usepackage{fontspec}
\setmainfont[BoldFont=*,
BoldFeatures={RawFeature={axis={wght=700}}},
ItalicFont=*-Italic,
BoldItalicFont=*-Italic,]{Junicode VF}
\begin{document}
\noindent i í ì î ï ĩ\\
ị ị́ ị̀ ị̂ ị̈ ị̃\\
ḭ ḭ́ ḭ̀ ḭ̂ ḭ̈ ḭ̃\\
į į́ į̀ į̂ į̈ į̃\\
\end{document}

See related issue over at Elstob: psb1558/Elstob-font#36

psb1558 commented 9 months ago

This is a by-product of normalization, which is done by the layout engine (e.g. HarfBuzz), not by the font. Where a composite exists and the font has a glyph for it, the layout engine replaces a sequence like i + COMBINING DOT BELOW (U+0323) with the composite character (here U+1ECB, ị). Because this happens before any of the programming in the font is executed, the font can't prevent it.

The usual thing is that when the font sees a sequence like i + combining acute accent, it substitutes a dotless form of i for the i. But the font does not contain a dotless form of ị (U+1ECB), which would be used only in a few edge cases.
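The normalization step can be observed directly. The file below is a minimal sketch, assuming LuaLaTeX and the `Junicode VF` font name used elsewhere in this thread. It sets three spellings one per line. If the explanation above holds, the second and third lines should look identical, because the layout engine turns the decomposed sequence into U+1ECB + acute before the font's own substitutions run, while the first line gets the font's dotless i.

```latex
% Sketch: compile with LuaLaTeX (mode=harf selects luaotfload's HarfBuzz mode)
\documentclass{article}
\usepackage{fontspec}
\setmainfont[RawFeature={mode=harf}]{Junicode VF}
\begin{document}
% i + COMBINING ACUTE: the font can substitute a dotless i
i\char"0301\par
% i + COMBINING DOT BELOW + COMBINING ACUTE: normalized to U+1ECB + acute
i\char"0323\char"0301\par
% precomposed U+1ECB + COMBINING ACUTE: what the engine produces from the line above
\char"1ECB\char"0301\par
\end{document}
```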

Some fonts (e.g. Gentium) contain a workaround for this particular case, while others (e.g. Brill) leave users to puzzle it out for themselves. My sense is that there are a great many cases where normalization will get in the way of setting the exact sequence you want. The Junicode Manual discusses several of these and recommends placing COMBINING GRAPHEME JOINER (U+034F) between the base character and the first diacritic to prevent normalization. That is the approach both Junicode and Elstob recommend.

Unfortunately, both Junicode and Elstob also have errors and/or omissions that keep U+034F from working properly in this particular case. The fix will be in Junicode version 2.005, probably in about a week, and in the Elstob repository very soon. After that, the sequence i U+034F U+0323 U+0301 will yield what you want, while the sequence i U+0323 U+0301 will produce a dotted i:

[Screenshot]

psb1558 commented 9 months ago

With the fixes in v2.005, this file

\documentclass[14pt]{article}
\usepackage{fontspec}
\setmainfont[BoldFont=*,
BoldFeatures={RawFeature={axis={wght=700}}},
ItalicFont=*-Italic,
BoldItalicFont=*-Italic,
RawFeature={mode=harf}]{Junicode VF} % mode=harf: luaotfload's HarfBuzz mode (LuaLaTeX)
\begin{document}
\noindent i í ì î ï ĩ\\
i\char"0323\ i\char"034F\char"0323\char"0301\ i\char"034F\char"0323\char"0300\ %
     i\char"034F\char"0323\char"0302\ i\char"034F\char"0323\char"0308\ i\char"034F\char"0323\char"0303\\
i\char"0330\ i\char"034F\char"0330\char"0301\ i\char"034F\char"0330\char"0300\ %
     i\char"034F\char"0330\char"0302\ i\char"034F\char"0330\char"0308\ i\char"034F\char"0330\char"0303\\
i\char"0328\ i\char"034F\char"0328\char"0301\ i\char"034F\char"0328\char"0300\ %
     i\char"034F\char"0328\char"0302\ i\char"034F\char"0328\char"0308\ i\char"034F\char"0328\char"0303\\
\end{document}

will yield this result:

[Screenshot]

Elstob will have similar fixes.

Anyone who works with unusual combinations of diacritics needs to be aware of U+034F, which is nothing more than a zero-width combining mark without visible outlines. Its purpose is to break apart sequences containing combining marks so that (for example) normalization will not take place when the layout engine encounters a base+mark sequence that is canonically equivalent to a Unicode composite character.
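For documents that need U+034F often, typing `\char"034F` each time gets tedious, and a small wrapper macro is one option. The name `\cgj` below is a hypothetical convenience, not something Junicode or fontspec provides. The sketch assumes LuaLaTeX and Junicode 2.005 or later, the version in which the CGJ fixes were announced.

```latex
% Sketch: compile with LuaLaTeX; assumes Junicode 2.005 or later
\documentclass{article}
\usepackage{fontspec}
\setmainfont[RawFeature={mode=harf}]{Junicode VF}
% Hypothetical helper: emits U+034F COMBINING GRAPHEME JOINER.
% \relax ends the hex number so a following A-F letter cannot extend it.
\newcommand*{\cgj}{\char"034F\relax}
\begin{document}
% With CGJ after the base letter: i and the dot below are not composed
% into U+1ECB, so the font can handle the i before the acute
i\cgj\char"0323\char"0301\par
% Without CGJ: normalized to U+1ECB + acute, which keeps the dot
i\char"0323\char"0301\par
\end{document}
```

The CGJ goes directly after the base letter, matching the placement recommended above (between the base character and the first diacritic).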

A well-behaved search function will ignore U+034F.

leafpool243 commented 9 months ago

Thank you! This is quite interesting.

psb1558 commented 8 months ago

See if version 2.100 (released today) works for you.