Shaping words separately breaks contextual ligatures

luapower / tr0

Unicode text rendering engine in Lua

http://luapower.com/tr0


Shaping words separately breaks contextual ligatures #15

Open capr opened 5 years ago

capr commented 5 years ago

see: https://bugzilla.mozilla.org/show_bug.cgi?id=761442
there's talk about a harfbuzz "context" option but I can't see it.
this also disables some advanced features such as word kerning and collision avoidance.

khaledhosny commented 5 years ago

The context part is only useful to get basic Arabic shaping working when shaping part of a word. It provides just the bare minimum and should only be relied upon when absolutely necessary (e.g. shaping across font boundaries).

You get that by passing the whole paragraph text to hb_buffer_add_* and specifying the offset and length of the part you want to shape. That “part” should be the largest run of text that shares the same font, direction, script and language.
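To illustrate the "largest run" rule above, here is a minimal Python sketch that groups per-character properties into maximal runs and returns the (offset, length) pair for each. The `CharProps` type and `shaping_runs` function are hypothetical names made up for this example, not part of HarfBuzz or tr0, and the offsets are assumed to count code points (as would suit a code-point based hb_buffer_add_* variant).

```python
from itertools import groupby
from typing import List, NamedTuple, Tuple


class CharProps(NamedTuple):
    """Hypothetical per-character shaping properties (illustration only)."""
    font: str
    direction: str
    script: str
    language: str


def shaping_runs(props: List[CharProps]) -> List[Tuple[int, int]]:
    """Return (offset, length) for each maximal run of identical properties.

    Each pair is what one would pass as the item offset and item length
    while still handing the whole paragraph to the buffer as context.
    """
    runs = []
    offset = 0
    # groupby merges consecutive equal items, which gives maximal runs.
    for _, group in groupby(props):
        length = sum(1 for _ in group)
        runs.append((offset, length))
        offset += length
    return runs
```

With three Latin characters followed by two Arabic ones, this yields two runs: one starting at offset 0 with length 3, and one starting at offset 3 with length 2.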

capr commented 5 years ago

Thanks for this info. The problem is that I'm currently caching shaped words, and I wouldn't want to renounce that. Is there a fast and reliable way to know when the context will be used and when not so that I can prevent breaking the words in those (presumably rare) cases? Is this a good place to use harfbuzz's unsafe-to-break-here API?

khaledhosny commented 5 years ago

You are splitting the words on space, right? What Gecko and Blink currently do is detect whether the font has any substitutions involving the space glyph as input and disable word caching in that case, but this is a bit too coarse. I think the HB_GLYPH_FLAG_UNSAFE_TO_BREAK flag should be able to tell you whether you can split at a given point or not, but I'm not sure how you will be able to use it here, since you need to shape first to get this flag.
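The space-glyph heuristic described above could be sketched roughly as follows. This is not Gecko's or Blink's actual code: the function names are hypothetical, and the substitution rules are assumed to have already been extracted from the font as a list of input glyph-ID sequences (how to read them from the GSUB table is out of scope here).

```python
from typing import Iterable, Sequence


def font_substitutes_space(
    substitution_inputs: Iterable[Sequence[int]], space_glyph: int
) -> bool:
    """True if any substitution rule takes the space glyph as input.

    substitution_inputs is an assumed, pre-extracted list of input
    glyph-ID sequences, one per substitution rule in the font.
    """
    return any(space_glyph in seq for seq in substitution_inputs)


def can_cache_words(
    substitution_inputs: Iterable[Sequence[int]], space_glyph: int
) -> bool:
    """Coarse per-font decision: cache shaped words only when no rule
    involves the space glyph, so splitting on spaces cannot change shaping
    through such a rule."""
    return not font_substitutes_space(substitution_inputs, space_glyph)
```

The decision is made once per font, which is what makes it cheap but also coarse: a single rule involving space turns off caching for all text in that font.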

capr commented 5 years ago

I'm splitting words using UAX #14 as implemented by libunibreak. D'oh, you're right about the unsafe-to-break API needing to shape the text first, so that wouldn't work here.
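For reference, a small Python sketch of why the flag only helps after shaping: the input below is the already shaped glyph list, modelled as plain (cluster, flags) tuples rather than real HarfBuzz structures. The function name is hypothetical; the flag value 0x00000001 matches HB_GLYPH_FLAG_UNSAFE_TO_BREAK in HarfBuzz's headers.

```python
from typing import Dict, List, Tuple

HB_GLYPH_FLAG_UNSAFE_TO_BREAK = 0x00000001


def safe_break_clusters(glyphs: List[Tuple[int, int]]) -> List[int]:
    """Return cluster values at whose start the text can be split.

    glyphs is the shaped output as (cluster, flags) pairs. A cluster is
    treated as unsafe if any of its glyphs carries the unsafe-to-break flag.
    """
    unsafe: Dict[int, bool] = {}
    for cluster, flags in glyphs:
        flagged = bool(flags & HB_GLYPH_FLAG_UNSAFE_TO_BREAK)
        unsafe[cluster] = unsafe.get(cluster, False) or flagged
    return sorted(c for c, is_unsafe in unsafe.items() if not is_unsafe)
```

Since the flags exist only in the shaper's output, a check like this can validate word boundaries after a full-run shape, but it cannot decide up front which words are safe to shape and cache separately.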