reutenauer / polyglossia

An alternative to Babel for XeLaTeX and LuaLaTeX
http://www.ctan.org/pkg/polyglossia
MIT License

Changing the directionality of Private Use Area characters in the bidi code #639

Closed yannis1962 closed 1 month ago

yannis1962 commented 1 month ago

I'm building a Hebrew font with about 9,000 additional glyphs. I first wanted to leave them unencoded and access them through OpenType features, but unfortunately there are too many of them for Adobe's MakeOTF. So my second idea was to map them to the Unicode PUA, ^^^^^^0f0000 to ^^^^^^0ffffd, but when I tried this, I found that polyglossia/bidi treats PUA characters as left-to-right by default, so they break the right-to-left direction of my Hebrew words.

Is there somewhere in the bidi code where I can change the directionality properties of the PUA characters used by this font?

(I'm using XeLaTeX.)

PS. It works when every word starts with a ^^^^200f (RIGHT-TO-LEFT MARK), but I can hardly do that throughout my document. What I need is a way to change the behavior of certain PUA characters in the whole document.

Udi-Fogiel commented 1 month ago

The XeTeX engine shapes chunks of text using ICU, and as a consequence it applies the Unicode BiDi algorithm to these chunks of text. By default this is done only if a system font is used, and only word by word (this can be controlled a bit by \XeTeXinterwordspaceshaping); see the following post (specifically the last comment on the answer): https://tex.stackexchange.com/questions/673111/does-xelatex-modify-the-beginr-premitive
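A minimal sketch of where that parameter goes, assuming XeLaTeX and a hypothetical system font called MyHebrewFont. The comments paraphrase the parameter's values; check the XeTeX reference for the exact difference between 1 and 2 before relying on it.

```latex
% XeLaTeX only. "MyHebrewFont" is a hypothetical font name.
\documentclass{article}
\usepackage{fontspec}
\setmainfont{MyHebrewFont}
% 0 (the default): every word is shaped, and passed through ICU's
% bidi handling, on its own.
% 1 or 2: inter-word spaces take part in shaping together with the
% neighbouring words, so more than one word is shaped at a time.
\XeTeXinterwordspaceshaping=1
\begin{document}
% Hebrew text using the PUA glyphs goes here.
\end{document}
```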

If this is the phenomenon you are referring to, I'm afraid it is built into the engine and cannot be changed easily. The best thing to do, IMO, is not to rely on it and to use explicit markup instead: either the \beginR/\beginL primitives, or polyglossia's higher-level syntax, which changes the directionality according to the language in use.
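A minimal sketch of both kinds of explicit markup, assuming XeLaTeX and a hypothetical font called MyHebrewFont. Note that this markup controls direction between the words TeX sees; it does not by itself change the bidi class that ICU assigns to the PUA characters inside a word.

```latex
% XeLaTeX only. "MyHebrewFont" is a hypothetical font name.
\documentclass{article}
\usepackage{polyglossia}
\setmainlanguage{hebrew}
\setotherlanguage{english}
\newfontfamily\hebrewfont{MyHebrewFont}[Script=Hebrew]
% Enable e-TeX's TeX--XeT primitives (harmless if already enabled).
\TeXXeTstate=1
\begin{document}
% Language-level markup: Hebrew, the main language, is set right to
% left, and \textenglish switches to left to right for one phrase.
שלום \textenglish{hello} עולם

% A similar switch with the TeX--XeT primitives directly.
שלום \beginL hello\endL\ עולם
\end{document}
```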

Another option would be to switch to LuaTeX, where you have more control: there is no Unicode BiDi algorithm built into the engine, so it must be provided by a Lua implementation. The babel package has implemented one (with some limitations at the time of writing) and provides an interface for changing the directionality of glyphs. Search the babel documentation for \babelcharproperty, which lets you control how characters are output (font, directionality, hyphenation, etc.). I once wrote an example file using Hebrew and English without any markup (except one \selectlanguage{english}) here: https://github.com/Udi-Fogiel/Hebrew-TeX/blob/main/babel-lualatex.tex
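A minimal sketch of that approach, assuming LuaLaTeX, a hypothetical font called MyHebrewFont, and that the extra glyphs sit in U+F0000 to U+FFFFD as in the original question. The direction property and the optional range argument of \babelcharproperty are taken from the babel manual; check there that your babel version supports both.

```latex
% LuaLaTeX only. "MyHebrewFont" is a hypothetical font name.
\documentclass{article}
\usepackage[english, bidi=basic]{babel}
\babelprovide[import, main]{hebrew}
\babelfont[hebrew]{rm}{MyHebrewFont}
% PUA code points have bidi class L by default; give the range used
% by the font the strong right-to-left class R instead.
\babelcharproperty{"F0000}["FFFFD]{direction}{r}
\begin{document}
שלום % Hebrew text mixing ordinary letters and PUA glyphs goes here.

\selectlanguage{english}
Some English text.
\end{document}
```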

In any case, I don't think this is currently related to polyglossia, although you could open a request for better right-to-left support in LuaTeX; more information would help.

yannis1962 commented 1 month ago

Thanks for the answer. You say it applies the Unicode BiDi algorithm, but the algorithm specification (UAX #9), §3.2, item 3, says:

Private-use characters can be assigned different values by a conformant implementation.

which means that there should be no hard-coded directionalities in the PUA.

Anyway, I solved the problem by adding a ^^^^202e (RIGHT-TO-LEFT OVERRIDE) character in front of every Hebrew PUA character; the problem is that now I have no kerning.

I was considering using contextual kerning (with lookahead, so that the same third glyph can be used in the next kerning triple). Do you think it will work?

(Replying to Udi Fogiel's comment of 21 May 2024, 12:25.)

Udi-Fogiel commented 1 month ago

> Thanks for the answer. You say it applies the Unicode BiDi algorithm, but the algorithm specification, §3.2, item 3, says "Private-use characters can be assigned different values by a conformant implementation", which means that there should be no hard-coded directionalities in the PUA.

Yes, the algorithm itself does not hard-code any directionality for the PUA; it is probably XeTeX's implementation that does. ICU actually provides a way to customize this, see https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/ubidi_8h.html#a69be01f3b9f17fc7cc604b10fa31c2f4; maybe you can ask the XeTeX team to add support for it in the engine.

> Anyway, I solved the problem by adding ^^^^202e characters in front of every Hebrew PUA character; the problem is that now I have no kerning. I was considering using contextual kerning (with lookahead, so that the same third glyph can be used in the next kerning triple). Do you think it will work?

Sounds like it would work, but I can't really tell without experimenting with it myself. I think you would have an easier life using LuaTeX.

I'll close the ticket for now, feel free to reopen if you think it is related to polyglossia.