Open michal-h21 opened 8 years ago
The HarfBuzz cluster value of the missing glyph should give you the index of the corresponding character in the input text that was fed to HarfBuzz. When doing fallback it is better to use the same font for the whole grapheme cluster (which is a different thing from a HarfBuzz cluster), so a finer implementation would need grapheme cluster detection code.
So the character index is the value of HarfBuzz's cluster? Regarding graphemes, is it possible for a font to support one component of a grapheme but not the others? I think we can use the Selene Unicode library used in LuaTeX for grapheme detection; it seems to have grapheme support.
The `cluster` field of `hb_glyph_info_t` is the index of the character in the input string that the glyph comes from (it depends on the encoding of the text being fed to HarfBuzz, so for UTF-8 you get UTF-8 indices, etc.).
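To illustrate the encoding dependence: if the text was fed as UTF-8, a cluster value is an offset into the encoded bytes, not a code point count. Here is a minimal sketch of converting one to the other. The function name is hypothetical, and it assumes the cluster value always points at the start of a character.

```python
def utf8_cluster_to_char_index(text: str, cluster: int) -> int:
    """Convert a UTF-8 byte offset (a cluster value for UTF-8 input)
    into a code point index in the same text."""
    encoded = text.encode("utf-8")
    # Decoding the bytes before the offset counts the preceding code points.
    return len(encoded[:cluster].decode("utf-8"))


# 'a' takes 1 byte and 'ñ' takes 2, so 'b' starts at byte 3 but is character 2.
index_of_b = utf8_cluster_to_char_index("a\u00f1b", 3)
```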
It is common to have parts of a grapheme cluster not supported by the font, for example an accent in a decomposed accented character.
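A small sketch of that situation, using only Python's standard `unicodedata` module. The `supported` set is a made-up stand-in for a font's character coverage, not something read from a real font.

```python
import unicodedata


def missing_code_points(text, supported):
    """Return the characters of text whose code points are not in supported."""
    return [ch for ch in text if ord(ch) not in supported]


# Decomposing U+00E9 gives the base letter 'e' followed by U+0301 (combining acute).
decomposed = unicodedata.normalize("NFD", "\u00e9")

# Hypothetical coverage: the font has 'e' but no combining accent.
supported = {ord("e")}
missing = missing_code_points(decomposed, supported)
```

Here only the accent is missing, yet for fallback the whole grapheme (base plus accent) would ideally go to the same font.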
No idea about grapheme support in Selene Unicode, never used it.
I've reworked missing glyphs shaping: from c4557e04694f61dd15cd3255baa35eea3a091c98 to 681f788e2309ac0f3519132e9e554807f562f220
I am sure it has bugs and it probably isn't really efficient, but it seems to work for my examples.
When a missing glyph is detected in a word, the whole word is split into graphemes using the Selene Unicode library, and each grapheme is shaped separately. For those unsupported by the font, script detection is called using HarfBuzz's functions. Graphemes with the same script are then joined into strings and shaped using the fonts declared for that script.
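The flow described above could be sketched roughly like this. It is not the code from the commits: it is Python rather than Lua, grapheme splitting is approximated by attaching combining marks to the preceding base character, script detection is faked from Unicode character names instead of using HarfBuzz, and "unsupported" is decided from a hypothetical coverage set rather than by actually shaping each grapheme. All function names are made up for illustration.

```python
import unicodedata
from itertools import groupby


def split_graphemes(word):
    """Rough approximation of grapheme splitting: combining marks
    (non-zero combining class) are attached to the preceding character."""
    clusters = []
    for ch in word:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def guess_script(grapheme):
    """Hypothetical stand-in for script detection: the first word of
    the Unicode name of the base character, e.g. 'CYRILLIC'."""
    return unicodedata.name(grapheme[0], "UNKNOWN").split()[0]


def plan_fallback(word, supported):
    """Return (text, script) runs; script is None where the main font
    covers the text, otherwise the script to pick a fallback font for."""
    tagged = []
    for g in split_graphemes(word):
        covered = all(ord(ch) in supported for ch in g)
        tagged.append((g, None if covered else guess_script(g)))
    # Join consecutive graphemes that share the same tag into one string.
    return [("".join(g for g, _ in run), script)
            for script, run in groupby(tagged, key=lambda t: t[1])]


# Hypothetical main font that covers only basic Latin letters.
latin_only = {ord(c) for c in "abcdefghijklmnopqrstuvwxyz"}
runs = plan_fallback("ab\u0434\u0435c", latin_only)
```

Each run tagged with a script would then be shaped with the fonts declared for that script, while runs tagged `None` stay with the main font.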
I've added support for this functionality; it is in the `missing-glyphs` branch. The whole word is reshaped when a missing glyph is detected. A more intelligent solution would be nice, but I don't know how to detect which characters were shaped and which were not when complex transformations are used. Anyway, see `examples/missing.tex` and `examples/missing.pdf` for an example.