michal-h21 / luatex-harfbuzz-shaper

Experimental text shaping in LuaTeX using Harfbuzz library

Add support for rendering missing glyphs #9

Open michal-h21 opened 8 years ago

michal-h21 commented 8 years ago

I've added support for this functionality; it is in the missing-glyphs branch. The whole word is reshaped when a missing glyph is detected. A more intelligent solution would be nice, but I don't know how to detect which characters were shaped and which were not when complex transformations are applied.

Anyway, see the files examples/missing.tex and examples/missing.pdf for an example.

khaledhosny commented 8 years ago

The HarfBuzz cluster value of the missing glyph should give you the index of the corresponding character in the input text that was fed to HarfBuzz. When doing fallback it is better to use the same font for the whole grapheme cluster (different thing than HarfBuzz clusters), so a finer implementation would need grapheme cluster detection code.

michal-h21 commented 8 years ago

So the character index is the value of HarfBuzz's cluster? Regarding graphemes, is it possible for a font to support one component of a grapheme but not the others? I think that for grapheme detection we can use the Selene Unicode library used in LuaTeX; it seems to have grapheme support.

khaledhosny commented 8 years ago

The cluster field of hb_glyph_info_t is the index of the character in the input string that the glyph comes from (and it depends on the encoding of the text being fed to HarfBuzz, so for UTF-8 you get UTF-8 indices, etc).
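A minimal sketch of what this means in practice, assuming the text was passed to HarfBuzz as UTF-8 so that cluster values are byte offsets. The glyph list is made-up data standing in for real shaper output, and the helper names are hypothetical; the only HarfBuzz fact relied on is that a missing glyph is reported as glyph ID 0 (.notdef).

```python
def byte_offset_to_char_index(text: str, byte_offset: int) -> int:
    """Convert a UTF-8 byte offset into a character (code point) index."""
    return len(text.encode("utf-8")[:byte_offset].decode("utf-8"))


def missing_characters(text, glyphs):
    """Return (char_index, char) for every glyph whose ID is 0 (.notdef).

    `glyphs` is a list of (glyph_id, cluster) pairs, where cluster is a
    UTF-8 byte offset into `text`.
    """
    result = []
    for glyph_id, cluster in glyphs:
        if glyph_id == 0:
            idx = byte_offset_to_char_index(text, cluster)
            result.append((idx, text[idx]))
    return result


text = "aλb"  # 'λ' occupies two bytes in UTF-8, so 'b' starts at byte 3
glyphs = [(68, 0), (0, 1), (69, 3)]  # hypothetical glyph IDs; 'λ' is missing
print(missing_characters(text, glyphs))  # [(1, 'λ')]
```

If the text were fed as UTF-32 instead, the cluster value would already be the code point index and no conversion would be needed.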

It is common to have parts of a grapheme cluster not supported by the font, for example an accent in a decomposed accented character.
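To illustrate the decomposed-accent case, here is a small sketch that checks each code point of a grapheme cluster against a font's coverage. The coverage set is a hypothetical stand-in for a real font's character map, and `unsupported_parts` is an illustrative helper, not part of any library.

```python
import unicodedata


def unsupported_parts(cluster: str, supported: set) -> list:
    """List the code points of a grapheme cluster that the font lacks."""
    return [f"U+{ord(ch):04X}" for ch in cluster if ord(ch) not in supported]


# Hypothetical font coverage: basic Latin letters only, no combining marks.
font_coverage = set(range(0x41, 0x5B)) | set(range(0x61, 0x7B))

# NFD turns 'é' into 'e' followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = unicodedata.normalize("NFD", "é")
print(unsupported_parts(decomposed, font_coverage))  # ['U+0301']
```

The base letter is covered but the accent is not, which is why falling back per character rather than per grapheme cluster would split the accent from its base.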

No idea about grapheme support in Selene Unicode, never used it.

michal-h21 commented 8 years ago

I've reworked missing glyphs shaping: from c4557e04694f61dd15cd3255baa35eea3a091c98 to 681f788e2309ac0f3519132e9e554807f562f220

I am sure it has bugs and it probably isn't really efficient, but it seems to work for my examples.

When a missing glyph is detected in a word, the whole word is split into graphemes using the Selene Unicode library, and each grapheme is shaped separately. For those not supported by the font, script detection is run using HarfBuzz's functions. Graphemes with the same script are then joined into strings and shaped using the fonts declared for that script.