olsak / OpTeX

OpTeX - LuaTeX format with extended Plain TeX macros
http://petr.olsak.net/optex/

ligatures interrupt colors #153

Closed Udi-Fogiel closed 1 year ago

Udi-Fogiel commented 1 year ago

Consider the following document:

\fontfam[lm]

ff{\Blue i}
\bye

As you can see, the i does not get colored. The reason is quite obvious: the color mechanism sees the ligature only at ship-out time, and therefore it cannot break the ligature (as implementations using whatsits do). I don't know if there is a simple fix for that (maybe the ligature-forming process should be color-aware?), but it might be worth documenting this until it is changed, if it ever is.

Here is an example where the ligature is interrupted:

\fontfam[lm]

ff\hbox{}{\Blue i}
\bye

olsak commented 1 year ago

Thank you for noticing this. We declare it a feature that ligatures cannot fall apart when there is a color change inside them. I'll try to formulate this in the documentation. The reason is that we want to keep things simple, and a solution to this issue would go against that principle. Users have to break the ligature themselves, for example with ff\null {\Blue i} or {\Blue ff}\null i.
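
For reference, the two workarounds mentioned here can be tried in a minimal document modeled on the examples earlier in this thread. This is only a sketch: it assumes the same Latin Modern setup as the original report, and it uses nothing beyond \fontfam, \Blue and \null.

```tex
\fontfam[lm]

% Black ff, then a blue i; \null (an empty \hbox) keeps "ffi" from forming.
ff\null{\Blue i}

% Blue ff, then a black i; again \null separates the ff ligature from the i.
{\Blue ff}\null i
\bye
```

In both lines the ff ligature itself is still formed; only the join with the following i is broken, so the color change falls on a glyph boundary.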

Udi-Fogiel commented 1 year ago

Rereading my ticket, I see I wasn't very clear. I agree that colors should not break ligatures. I was wondering whether there is any way to color a glyph only partially when it is part of a ligature. Maybe it is possible to pass the color information to the ligature-forming process.

But, I mainly opened this ticket for the documentation, so I'll close. Thank you.

olsak commented 1 year ago

Your question is: is it possible to color a letter partly? The "letter" can be a ligature or a common letter.

We can do this with the PDF clipping path operators. The following code colors only part of the letter A red:

 \noindent 
\pdfliteral{q 0 0 4 100 re W n}\rlap{A}\pdfliteral{Q q 1 0 0 rg 4 0 10 100 re W n}\rlap{A}\pdfliteral{Q}\kern7pt next text.
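
An annotated reading of the same literal may help. It is a sketch resting on two assumptions not stated in the thread: \pdfliteral is used in its default mode, so the origin is the current point, and the numbers are PDF user-space units (bp). The code is unchanged apart from line breaks and comments.

```tex
\noindent
\pdfliteral{q 0 0 4 100 re W n}% save state; clip to the strip x = 0..4
\rlap{A}% only the left strip of A is painted, in the current color
\pdfliteral{Q q 1 0 0 rg 4 0 10 100 re W n}% restore; fill color red; clip to x = 4..14
\rlap{A}% the same A again; only the right strip is painted, in red
\pdfliteral{Q}% restore the graphics state (clipping and color)
\kern7pt next text.
\bye
```

Because \rlap makes the box zero-width, both copies of the A are placed at the same position, and \kern7pt then moves past the letter by hand.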

vlasakm commented 1 year ago

Just for future reference:

The luacolor package, which does the same kind of coloring in LaTeX, explicitly documents the limitation: http://mirrors.ctan.org/macros/latex/contrib/luacolor/luacolor.pdf#subsection.1.3.

luaotfload, which also allows coloring, lets the user choose the callback at which the coloring is applied (and defaults to post_linebreak_filter): http://mirrors.ctan.org/macros/luatex/generic/luaotfload/luaotfload-latex.pdf#section*.8.

In theory, color is a different style, so it should prevent ligatures, much as a switch to bold or italic would. However, as mentioned in the linked article, color is sometimes special-cased, because we naturally perceive it differently, especially with Latin scripts. Here is a more complex example that has no nice solutions: https://faultlore.com/blah/text-hates-you/#style-can-change-mid-ligature.

The technical reason why color isn't taken into account while ligaturing is that we use LuaTeX attributes, and no code takes our color attribute into account. We could perhaps implement colors differently and force text runs to split at color changes (as font changes do), but as hinted above, that may not even be what we want.

I don't know what brought you to investigate the issue, but this question on TeX Stack Exchange seems related in an interesting way: https://tex.stackexchange.com/questions/477143/losing-ligatures-when-switching-font-series-or-color-in-the-middle-of-a-word.

Udi-Fogiel commented 11 months ago

> We can do this with the PDF clipping path operators. The following code colors only part of the letter A red:
>
> \noindent
> \pdfliteral{q 0 0 4 100 re W n}\rlap{A}\pdfliteral{Q q 1 0 0 rg 4 0 10 100 re W n}\rlap{A}\pdfliteral{Q}\kern7pt next text.

Thanks for the suggestion, I always appreciate seeing how you use literals. I guess that if the ligatures are formed by luaotfload, I'll have to add some code to pre_linebreak_filter, running after luaotfload, to modify the ligatures (sadly, not all Hebrew ligatures are Unicode characters).

> The luacolor package, which does the same kind of coloring in LaTeX, explicitly documents the limitation: http://mirrors.ctan.org/macros/latex/contrib/luacolor/luacolor.pdf#subsection.1.3.

Yes, this is mostly why I opened the ticket. I was just surprised that this fact wasn't documented in OpTeX as well.

> In theory, color is a different style, so it should prevent ligatures, much as a switch to bold or italic would. However, as mentioned in the linked article, color is sometimes special-cased, because we naturally perceive it differently, especially with Latin scripts. Here is a more complex example that has no nice solutions: https://faultlore.com/blah/text-hates-you/#style-can-change-mid-ligature.

Very interesting, thanks!

> I don't know what brought you to investigate the issue

See https://tex.stackexchange.com/a/699207/264024 for an example. In Hebrew, although punctuation marks never overlap or connect to the base character, they are often combined into a ligature with the base letter to correct the positioning.

By the way, @vlasakm, if you read the linked post, do you know what a char number outside the Unicode range means? I know that at this stage luaotfload can assign glyph IDs to nodes instead of Unicode code points, and that these numbers can depend on whether you use harfbuzz or the default shaping method, and maybe even on the font, but I did not understand how these numbers are calculated.

vlasakm commented 11 months ago

> By the way, @vlasakm, if you read the linked post, do you know what a char number outside the Unicode range means? I know that at this stage luaotfload can assign glyph IDs to nodes instead of Unicode code points, and that these numbers can depend on whether you use harfbuzz or the default shaping method, and maybe even on the font, but I did not understand how these numbers are calculated.

In the ConTeXt font processing code (i.e. what luaotfload calls the node shaper), they stick with Unicode before and after shaping. They use one of the Unicode private use areas to give a Unicode code point even to glyphs that have a glyph id but no Unicode code point (e.g. ligatures, which becomes interesting with e.g. ff, which does have a code point in Unicode). I am not sure exactly how the numbers are calculated. But since they correspond to glyph ids, I would guess that a 1:1 mapping (i.e. glyph_id + 0xF0000, mapping into the plane 15 private use area) would be the most straightforward. Another option would be assigning private use area code points to glyphs in the order they are encountered.

In your code you use luaotfload with harfbuzz (the harf shaper), hence the situation is different, and apparently code points "outside of Unicode" are assigned to glyphs not directly corresponding to Unicode code points.

See:

https://github.com/latex3/luaotfload/issues/198
https://github.com/latex3/luaotfload/issues/185
https://www.pragma-ade.nl/general/manuals/fonts-mkiv.pdf (search for e.g. "private ", note the trailing space)

Do any of the functions in the luaotfload manual, section 11.2.1 "Font Properties", help with the (reverse) mapping? I am getting confused by the slot / gid names and am not sure whether they are relevant.

Anyway, this is outside my area of expertise. I suggest asking on the luaotfload GitHub or, in specific cases, on the ntg-context mailing list.