Open ralessi opened 4 years ago
I don't get your output with the development version of luaotfload. With it, the result looks like this:
This is still not correct, but
Thank you for the references, which I will explore. I suspected that this might be unrelated to fontspec. Do you think it would be worth reporting this issue (maybe unrelated again) to the luaotfload bug tracker?
FWIW, this seems to be a regression in luaotfload. Trying the following with harflatex and the old harf code:
\documentclass[12pt]{minimal}
\usepackage{harfload}
\usepackage{ulem}
\begin{document}
\font\arabicfont="[Amiri-Regular.ttf]:mode=harf"
\textdir TRT\arabicfont
مُب^^^^200d\uline{^^^^200dتَ^^^^200d}^^^^200dسِم
\end{document}
Gives:
This was a luaotfload bug which is resolved in the latest dev branch.
The behavior of HarfBuzz seems a bit odd here but I don't know enough about the script to say if it is a bug or expected behaviour:
The luaotfload bug was that inside an \hbox the direction wasn't recognized correctly, so the \uline argument was set as TLT instead of TRT.
Now to the odd part: for some reason, HarfBuzz seems to reverse the cluster with the Arabic characters and ignore the previous ZWJ. This can be reproduced with hb-shape:
hb-shape --direction=rtl --font-file "$(kpsewhich Amiri-Regular.ttf)" --script=arab --unicodes=U+200D,U+062A,U+064E,U+200D
gives
[space=1+0|uni064E=1@-188,0+0|uni062A.medi=1+244|space=0+0]
as expected, but replacing --direction=rtl with --direction=ltr gives
[space=0+0|space=1+0|uni064E=1@-212,0+0|uni062A.init=1+190]
Notably, both space glyphs representing the ZWJs end up at the beginning, and the initial form is used instead of the medial one.
@khaledhosny Is this supposed to happen?
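For anyone who wants to compare the two outputs glyph by glyph, here is a minimal Python sketch that pulls the glyph names and cluster values out of the two hb-shape lines quoted above. It assumes hb-shape's default text output has the form glyph=cluster[@x_offset,y_offset]+x_advance; the helper name parse and the regular expression are only illustrative.

```python
import re

# Assumed hb-shape text format: glyph=cluster[@x_offset,y_offset]+x_advance
GLYPH = re.compile(
    r"(?P<name>[^=|]+)=(?P<cluster>\d+)"
    r"(?:@(?P<dx>-?\d+),(?P<dy>-?\d+))?\+(?P<adv>-?\d+)"
)

def parse(line):
    """Return (glyph name, cluster) pairs in output order (hypothetical helper)."""
    return [(m["name"], int(m["cluster"])) for m in GLYPH.finditer(line.strip("[]"))]

# The two outputs quoted in this thread.
rtl = "[space=1+0|uni064E=1@-188,0+0|uni062A.medi=1+244|space=0+0]"
ltr = "[space=0+0|space=1+0|uni064E=1@-212,0+0|uni062A.init=1+190]"

for label, line in (("rtl", rtl), ("ltr", ltr)):
    print(label, parse(line))
```

With rtl the two space glyphs sit at either end of the run; with ltr both come first and the base glyph is uni062A.init rather than uni062A.medi.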
Yes, sort of.
HarfBuzz wants to shape scripts in their native direction. So when setting a direction other than the native direction for a script, HarfBuzz will reverse the buffer before shaping. It will also avoid breaking grapheme clusters, as one does not want, say, a mark to precede its base. ZWJ is a grapheme extender, so the first ZWJ is considered a grapheme cluster by itself (as it extends nothing) and the base+mark+ZWJ are considered another grapheme cluster.
<U+200D>,<U+062A,U+064E,U+200D>
After reversal:
<U+062A,U+064E,U+200D>,<U+200D>
After shaping the buffer will be reversed again since the native direction is RTL (a simple reversal this time with no grapheme clusters business).
U+062A,U+064E,U+200D,U+200D
After reversal:
U+200D,U+200D,U+064E,U+062A
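The two reversals can be mimicked with a small Python sketch. This is a simplified model of the steps described above, not HarfBuzz's actual code: it assumes that only ZWJ and fatha act as extenders, and the names EXTENDERS, split_clusters and reverse_keeping_clusters are made up for illustration.

```python
ZWJ, TEH, FATHA = 0x200D, 0x062A, 0x064E

# Assumption for this example only: these code points attach to a preceding base.
EXTENDERS = {ZWJ, FATHA}

def split_clusters(codepoints):
    """Group code points into simplified grapheme clusters."""
    clusters = []
    for cp in codepoints:
        if cp in EXTENDERS and clusters:
            clusters[-1].append(cp)   # extends the previous cluster
        else:
            clusters.append([cp])     # starts a new cluster (also a leading ZWJ)
    return clusters

def reverse_keeping_clusters(codepoints):
    """Reverse the order of clusters without reordering inside a cluster."""
    return [cp for cluster in reversed(split_clusters(codepoints)) for cp in cluster]

def fmt(codepoints):
    return ",".join(f"U+{cp:04X}" for cp in codepoints)

text = [ZWJ, TEH, FATHA, ZWJ]
before_shaping = reverse_keeping_clusters(text)   # cluster-aware reversal
after_shaping = list(reversed(before_shaping))    # plain reversal after shaping

print(fmt(before_shaping))  # U+062A,U+064E,U+200D,U+200D
print(fmt(after_shaping))   # U+200D,U+200D,U+064E,U+062A
```

In the intermediate order the teh has nothing joining before it and a ZWJ after it, which would explain why the initial form is picked.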
If you set the script to latn when the direction is ltr, no reversal will happen:
$ hb-shape --direction=ltr --font-file "$(kpsewhich Amiri-Regular.ttf)" --script=latn --unicodes=U+200D,U+062A,U+064E,U+200D
[space=0+0|uni062A=1+926|uni064E=1+0|space=1+0]
latn with rtl will do the initial reversal but not the last one:
$ hb-shape --direction=rtl --font-file "$(kpsewhich Amiri-Regular.ttf)" --script=latn --unicodes=U+200D,U+062A,U+064E,U+200D
[uni062A=1+926|uni064E=1+0|space=1+0|space=0+0]
Shaping a script in a direction other than its native direction is risky and unlikely to always give meaningful results.
@khaledhosny Thank you.
In some cases, namely when commands are inserted between characters, luatex+harfbuzz do not seem to handle the zero width joiner character (U+200D) properly. Consider the following example, to be compiled with lualatex-dev: