Closed seloumi closed 1 year ago
Do you mean to imply ("with the recent version") that this worked in previous versions of polyglossia
? If so, can you remember which version?
I'm not sure, I think the problem appeared with the last version or the one before it
I now tried back to version 1.42 (Oct 2018), and still the same. Are you sure this is not due to another package update (e.g., bidi
)? I really doubt this is something we handle in polyglossia
.
You're right, I didn't pay attention to that, sorry
It think polyglossia does somehow does changes the behavior. Compare the following (I intentionally did not load the font with the Arabic script)
% !TeX TS-program = xelatex
\documentclass{article}
\usepackage{polyglossia}
\setdefaultlanguage{arabic}
\newfontfamily\arabicfont[Mapping=tex-text]{Amiri}
\begin{document}
-1
\beginL\expandafter\meaning\the\font
\end{document}
which produce
while
% !TeX TS-program = xelatex
\documentclass{article}
\usepackage{fontspec}
\usepackage[RTLdocument]{bidi}
\setmainfont{Amiri}
\begin{document}
-1
\beginL\expandafter\meaning\the\font
\end{document}
produce
Just to make things clear, I think the hyphen should be on the right, as this is not a minus sign... (and with the Arabic script it also appears on the right with plain bidi
), but it does look like polyglossia is doing something. FWIW I could not reproduce the same with Hebrew.
Ok, this is clear now. bidi
inserts \if@nonlatin\char"200F
, where \char"200F
is a RTL mark (0x200f) between numbers, Arabic numerals and number separators ( - .), via \XeTeXinterchartoks
. Weather it should do it is debatable.
gloss-hebrew.ldf
and gloss-uyghur.ldf
"fix" it, this is why I could not reproduce with Hebrew. The "fix" in these files are debatable as well, as giving a correct script changes the behavior, so what is the correct one?
This is my two cents on this topic. I think bidi
is trying to mimic the affect of a font with a RTL script on the Unicode bidi algorithm XeTeX applies (at least partially), even when a script is not given explicitly, and when the conditional \if@nonlatin
(which actually polyglossia
sets, and not bidi
), is true.
The comments in gloss-hebrew.ldf
(line 125 specifically) suggested this is wrong for hebrew, but consider the following two files
% !TeX TS-program = xelatex
\documentclass{article}
\usepackage{polyglossia}
\setdefaultlanguage{hebrew}
\newfontfamily\hebrewfont{David CLM}
\begin{document}
שלום -12
\end{document}
% !TeX TS-program = xelatex
\documentclass{article}
\usepackage{polyglossia}
\setdefaultlanguage{hebrew}
\newfontfamily\hebrewfont[Script=Hebrew]{David CLM}
\begin{document}
שלום -12
\end{document}
If I remove the "fix" from gloss-hebrew.ldf
, both codes produce the second result (which IMHO is the correct one).
If you think bidi
should not insert RTL marks, and that the text should only be affected by the font script, and not by \if@nonlatin
, then restore all characters to character class 0, the current "fix" is a bit messy.
The Hebrew fix has been done per user request to fix #441.
I'm sorry, but I don't really understand how this is a fix for what is described in #441. I think the real source of the problem is using fonts with wrong Script tags.
In the following examples I will use \newfontfamily\hebrewfont
instead of \setmainfont
beacuse polyglossia
adds the Script tag automatically in the latter case (maybe that what changed in 2020 that fixed the code in the examples given in #441).
with default settings of polyglossia
\documentclass[12pt]{article}
\usepackage{polyglossia}
\setdefaultlanguage{hebrew}
\newfontfamily\hebrewfont[Mapping=tex-text]{Times New Roman}
\begin{document}
% wrong: misplaced bracket:
[א 1-2]
% good: bracket in place:
[א 1 2]
\end{document}
This is what I get from the same code after removing the fix for #441
And this is with the same code but adding Script=Hebrew
to the font
with default settings of polyglossia
\documentclass[12pt]{article}
\usepackage{polyglossia}
\setdefaultlanguage{hebrew}
\newfontfamily\hebrewfont[Mapping=tex-text]{Times New Roman}
\begin{document}
% wrong: misplaced full-stop:
עמוד 1-2.
% good: full-stop in place:
עמוד 1 2.
\end{document}
This is what I get from the same code after removing the fix for #441
And this is with the same code but adding Script=Hebrew
to the font
As you can see, what bidi
was trying to do is to get a closer result the the correct one, but it not always works. Without bidi
's patch, things are even further from the correct result.
Tested with plain xetex, got the same results, e.g., got the first results with no patching from bidi, or hebrew script, the second with the patching of bidi, and the third with loading the font with the hebrew script.
If you can prepare a PR of what you think is correct Hebrew behavior, I can merge that. I just want to prevent some ping-pong game of changing behavior.
Why did you reopen this? We are only discussing the Hebrew exception to the fix here, not Arabic (the topic of this ticket).
After trying some more examples, it seems that in some cases the way bidi patches things is desired, and in some cases it makes things worse. It looks like the author of bidi is inactive (at least there was no movement in the repository, nor he responded to a PR I've made), so I can't really tell the reason behind this patch.
Using the correct script fix things with and without bidi's patch, so in the end I don't have a strong opinion about keeping the patch, or not.
But, if it will be decided to prevent bidi's patch for Hebrew, maybe a few changes in the code would be good. For example, shouldn't the test in
\def\nohebrew@ltr@numbers{%
\ifxetex
% Restore bidi's \DigitsDotDashInterCharToks
\let\DigitsDotDashInterCharToks\xpg@orig@DigitsDotDashInterCharToks%
\fi%
}
be made outside the definition of the macro? there is no need to test the engine each time the macro is evoked.
I'm also wondering if the best way to disable bidi's patch is with \renewcommand*{\DigitsDotDashInterCharToks}{}
. Do you happen to know if \XeTeXcharclass
expands the macros before inserting them to the horizontal list? can it interfere with spacing, hyphenation, kerning or anything else?
Maybe restoring all characters to character class 0 is better, e.g.,
\bidi@digits=`\0 \loop \XeTeXcharclass \bidi@digits 0 \ifnum\bidi@digits<`\9 \advance\bidi@digits \@ne \repeat
\bidi@digits=`\٠ \loop \XeTeXcharclass \bidi@digits 0 \ifnum\bidi@digits<`\٩ \advance\bidi@digits \@ne \repeat
\bidi@digits=`\۰ \loop \XeTeXcharclass \bidi@digits 0 \ifnum\bidi@digits<`\۹ \advance\bidi@digits \@ne \repeat
Do you have any thoughts about it?
But, if it will be decided to prevent bidi's patch for Hebrew, maybe a few changes in the code would be good. For example, shouldn't the test in
\def\nohebrew@ltr@numbers{% \ifxetex % Restore bidi's \DigitsDotDashInterCharToks \let\DigitsDotDashInterCharToks\xpg@orig@DigitsDotDashInterCharToks% \fi% }
be made outside the definition of the macro? there is no need to test the engine each time the macro is evoked.
But then the macro is undefined with LuaTeX, and we'll get an error. Or did I misunderstand your proposal?
I'm also wondering if the best way to disable bidi's patch is with
\renewcommand*{\DigitsDotDashInterCharToks}{}
. Do you happen to know if\XeTeXcharclass
expands the macros before inserting them to the horizontal list? can it interfere with spacing, hyphenation, kerning or anything else?
Not to my knowledge.
Maybe restoring all characters to character class 0 is better, e.g.,
\bidi@digits=`\0 \loop \XeTeXcharclass \bidi@digits 0 \ifnum\bidi@digits<`\9 \advance\bidi@digits \@ne \repeat \bidi@digits=`\٠ \loop \XeTeXcharclass \bidi@digits 0 \ifnum\bidi@digits<`\٩ \advance\bidi@digits \@ne \repeat \bidi@digits=`\۰ \loop \XeTeXcharclass \bidi@digits 0 \ifnum\bidi@digits<`\۹ \advance\bidi@digits \@ne \repeat
Do you have any thoughts about it?
Given the issue does not occur if the script is correctly set, I am actually leaning towards removing the hebrew@ltr@numbers
workaround altogether.
Never mind... just compared a large document without removing what bidi does and it seems it does harm in some cases...
But then the macro is undefined with LuaTeX, and we'll get an error. Or did I misunderstand your proposal?
I'm suggesting to change it to something like
\newcommand*\nohebrew@ltr@numbers{}
\ifxetex
\def\nohebrew@ltr@numbers{%
% Restore bidi's \DigitsDotDashInterCharToks
\let\DigitsDotDashInterCharToks\xpg@orig@DigitsDotDashInterCharToks
}
\fi
Sure. I don't think the test is really expensive, though.
Nonetheless done at 1dad132f4c98
With the recent version of polyglossia minus sign is reversed It appears to the right of the number when It should appear on the left