Closed leafpool243 closed 8 months ago
This is a by-product of normalization, which is a function of the layout engine (e.g. HarfBuzz): where possible, the layout engine substitutes composite glyphs (e.g. ị U+1ECB) for sequences like i + combining dot below, and because this happens before the font's own programming is ever executed, the font can't prevent it.
The usual thing is that when the font sees a sequence like i + combining acute accent, it substitutes a dotless form of i for i. But the font does not contain a dotless form of ị, which would be used only in a few edge cases.
Some fonts (e.g. Gentium) contain a workaround for this particular case, while others (e.g. Brill) leave users to puzzle it out for themselves. My sense is that there are a great many cases where normalization will get in the way of setting the exact sequence you want. The Junicode Manual discusses several of these, and it recommends placing the COMBINING GRAPHEME JOINER U+034F between the base character and the first diacritic to prevent normalization. That is the approach that Junicode and Elstob recommend.
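To make the effect of U+034F concrete, here is a minimal sketch in Python. It uses the standard library's NFC normalization as a stand-in for what the layout engine does; that is an assumption for illustration only, since HarfBuzz's internal normalization is not identical to NFC and also depends on which glyphs the font contains. The helper names `nfc` and `codepoints` are made up for this example.

```python
import unicodedata

CGJ = "\u034f"  # COMBINING GRAPHEME JOINER


def nfc(s: str) -> str:
    """Return the NFC-normalized form of s."""
    return unicodedata.normalize("NFC", s)


def codepoints(s: str) -> str:
    """Render a string as a list of U+XXXX code points for inspection."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


# i + combining dot below + combining acute accent
plain = "i\u0323\u0301"
# the same sequence with CGJ placed between the base and the first diacritic
guarded = "i" + CGJ + "\u0323\u0301"

# Without CGJ, i + U+0323 is composed into U+1ECB (dotted i with dot below).
print(codepoints(nfc(plain)))    # U+1ECB U+0301
# With CGJ, nothing is composed and the sequence reaches the font unchanged.
print(codepoints(nfc(guarded)))  # U+0069 U+034F U+0323 U+0301
```

In the second case the font still sees a plain i followed by marks, so its own substitution of a dotless i can apply.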
Unfortunately, both Junicode and Elstob also have errors and omissions that keep U+034F from working properly in this particular case. The fix will be in Junicode version 2.005, probably in about a week, and in the Elstob repository very soon. After that, the sequence i U+034F U+0323 U+0301 will yield what you want, while the sequence i U+0323 U+0301 will produce a dotted i:
With the fixes in v2.005, this file
% The standard article class has no 14pt option; extarticle (extsizes) provides it.
\documentclass[14pt]{extarticle}
\usepackage{fontspec}
\setmainfont[BoldFont=*,
BoldFeatures={RawFeature={axis={wght=700}}},
ItalicFont=*-Italic,
BoldItalicFont=*-Italic,
RawFeature={mode=harf}]{Junicode VF}
\begin{document}
\noindent i í ì î ï ĩ\\
i\char"0323\ i\char"034F\char"0323\char"0301\ i\char"034F\char"0323\char"0300\ %
i\char"034F\char"0323\char"0302\ i\char"034F\char"0323\char"0308\ i\char"034F\char"0323\char"0303\\
i\char"0330\ i\char"034F\char"0330\char"0301\ i\char"034F\char"0330\char"0300\ %
i\char"034F\char"0330\char"0302\ i\char"034F\char"0330\char"0308\ i\char"034F\char"0330\char"0303\\
i\char"0328\ i\char"034F\char"0328\char"0301\ i\char"034F\char"0328\char"0300\ %
i\char"034F\char"0328\char"0302\ i\char"034F\char"0328\char"0308\ i\char"034F\char"0328\char"0303\\
\end{document}
will yield this result:
Elstob will have similar fixes.
Anyone who works with unusual combinations of diacritics needs to be aware of U+034F, which is nothing more than a zero-width combining mark without visible outlines. Its purpose is to break apart sequences containing combining marks so that (for example) normalization will not take place when the layout engine encounters a base+mark sequence that is canonically equivalent to a Unicode composite character.
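The properties that make U+034F behave this way can be inspected with Python's standard `unicodedata` module. This is only an illustrative check of the Unicode character data, not anything specific to Junicode or Elstob; the variable name `cgj` is chosen for this example.

```python
import unicodedata

cgj = "\u034f"

# Official name and general category (Mn = nonspacing mark).
print(unicodedata.name(cgj))       # COMBINING GRAPHEME JOINER
print(unicodedata.category(cgj))   # Mn

# Canonical combining class 0: marks that follow CGJ are not reordered
# across it and are not composed with the base character before it.
print(unicodedata.combining(cgj))  # 0

# For comparison, dot below and acute accent have nonzero classes.
print(unicodedata.combining("\u0323"), unicodedata.combining("\u0301"))  # 220 230
```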
A well-behaved search function will ignore U+034F.
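As a rough sketch of what ignoring U+034F could look like in a search routine, the following Python strips the character and then NFC-normalizes both strings before comparing. The functions `search_key` and `contains` are hypothetical names for this example and do not describe how any particular application implements search.

```python
import unicodedata

CGJ = "\u034f"  # COMBINING GRAPHEME JOINER


def search_key(s: str) -> str:
    """Drop CGJ, then NFC-normalize, so guarded and unguarded spellings match."""
    return unicodedata.normalize("NFC", s.replace(CGJ, ""))


def contains(haystack: str, needle: str) -> bool:
    """Substring test that ignores CGJ and canonical-equivalence differences."""
    return search_key(needle) in search_key(haystack)


# Text typed with CGJ to keep the layout engine from composing i + dot below.
text = "sp" + "i" + CGJ + "\u0323\u0301" + "n"

print(contains(text, "i\u0323\u0301"))  # True: query typed without CGJ
print(contains(text, "\u1ecb\u0301"))   # True: query using the composite U+1ECB
```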
Thank you! This is quite interesting.
See if version 2.100 (released today) works for you.
All “i's” with a diacritic below have an extraneous tittle when a diacritic is added above.
See related issue over at Elstob: psb1558/Elstob-font#36