reutenauer / polyglossia

An alternative to Babel for XeLaTeX and LuaLaTeX
http://www.ctan.org/pkg/polyglossia
MIT License
187 stars 52 forks source link

Incorrect minus position for negative nembers in RTL context #590

Closed seloumi closed 1 year ago

seloumi commented 1 year ago

With the recent version of polyglossia minus sign is reversed It appears to the right of the number when It should appear on the left

\documentclass{article}
\usepackage{polyglossia}

\setdefaultlanguage{arabic}
\newfontfamily\arabicfont[Script=Arabic]{Amiri}

\begin{document}
\Huge
-1
\end{document}
jspitz commented 1 year ago

Do you mean to imply ("with the recent version") that this worked in previous versions of polyglossia? If so, can you remember which version?

seloumi commented 1 year ago

I'm not sure, I think the problem appeared with the last version or the one before it

jspitz commented 1 year ago

I now tried back to version 1.42 (Oct 2018), and still the same. Are you sure this is not due to another package update (e.g., bidi)? I really doubt this is something we handle in polyglossia.

seloumi commented 1 year ago

You're right, I didn't pay attention to that, sorry

Udi-Fogiel commented 1 year ago

It think polyglossia does somehow does changes the behavior. Compare the following (I intentionally did not load the font with the Arabic script)

% !TeX TS-program = xelatex
\documentclass{article}
\usepackage{polyglossia}
\setdefaultlanguage{arabic}
\newfontfamily\arabicfont[Mapping=tex-text]{Amiri}

\begin{document}
    -1

    \beginL\expandafter\meaning\the\font
\end{document}

which produce

image

while

% !TeX TS-program = xelatex
\documentclass{article}
\usepackage{fontspec}
\usepackage[RTLdocument]{bidi}
\setmainfont{Amiri}

\begin{document}
    -1

    \beginL\expandafter\meaning\the\font
\end{document}

produce

image

Udi-Fogiel commented 1 year ago

Just to make things clear, I think the hyphen should be on the right, as this is not a minus sign... (and with the Arabic script it also appears on the right with plain bidi), but it does look like polyglossia is doing something. FWIW I could not reproduce the same with Hebrew.

Udi-Fogiel commented 1 year ago

Ok, this is clear now. bidi inserts \if@nonlatin\char"200F, where \char"200F is a RTL mark (0x200f) between numbers, Arabic numerals and number separators ( - .), via \XeTeXinterchartoks. Weather it should do it is debatable.

gloss-hebrew.ldf and gloss-uyghur.ldf "fix" it, this is why I could not reproduce with Hebrew. The "fix" in these files are debatable as well, as giving a correct script changes the behavior, so what is the correct one?

Udi-Fogiel commented 1 year ago

This is my two cents on this topic. I think bidi is trying to mimic the affect of a font with a RTL script on the Unicode bidi algorithm XeTeX applies (at least partially), even when a script is not given explicitly, and when the conditional \if@nonlatin (which actually polyglossia sets, and not bidi), is true.

The comments in gloss-hebrew.ldf (line 125 specifically) suggested this is wrong for hebrew, but consider the following two files

% !TeX TS-program = xelatex
\documentclass{article}
\usepackage{polyglossia}
\setdefaultlanguage{hebrew}
\newfontfamily\hebrewfont{David CLM}

\begin{document}
    שלום -12 
\end{document}

image

% !TeX TS-program = xelatex
\documentclass{article}
\usepackage{polyglossia}
\setdefaultlanguage{hebrew}
\newfontfamily\hebrewfont[Script=Hebrew]{David CLM}

\begin{document}
    שלום -12
\end{document}

image

If I remove the "fix" from gloss-hebrew.ldf, both codes produce the second result (which IMHO is the correct one).

If you think bidi should not insert RTL marks, and that the text should only be affected by the font script, and not by \if@nonlatin, then restore all characters to character class 0, the current "fix" is a bit messy.

jspitz commented 1 year ago

The Hebrew fix has been done per user request to fix #441.

Udi-Fogiel commented 1 year ago

I'm sorry, but I don't really understand how this is a fix for what is described in #441. I think the real source of the problem is using fonts with wrong Script tags.

In the following examples I will use \newfontfamily\hebrewfont instead of \setmainfont beacuse polyglossia adds the Script tag automatically in the latter case (maybe that what changed in 2020 that fixed the code in the examples given in #441).

with default settings of polyglossia

\documentclass[12pt]{article}

\usepackage{polyglossia}
\setdefaultlanguage{hebrew} 
\newfontfamily\hebrewfont[Mapping=tex-text]{Times New Roman}

\begin{document}
% wrong: misplaced bracket:
[א 1-2]

% good: bracket in place:
[א 1 2]
\end{document}

image

This is what I get from the same code after removing the fix for #441 image

And this is with the same code but adding Script=Hebrew to the font image

with default settings of polyglossia

\documentclass[12pt]{article}

\usepackage{polyglossia}
\setdefaultlanguage{hebrew} 
\newfontfamily\hebrewfont[Mapping=tex-text]{Times New Roman}

\begin{document}
    % wrong: misplaced full-stop:
    עמוד 1-2.

    % good: full-stop in place:
    עמוד 1 2.
\end{document}

image

This is what I get from the same code after removing the fix for #441 image

And this is with the same code but adding Script=Hebrew to the font image

As you can see, what bidi was trying to do is to get a closer result the the correct one, but it not always works. Without bidi's patch, things are even further from the correct result.

Udi-Fogiel commented 1 year ago

Tested with plain xetex, got the same results, e.g., got the first results with no patching from bidi, or hebrew script, the second with the patching of bidi, and the third with loading the font with the hebrew script.

jspitz commented 1 year ago

If you can prepare a PR of what you think is correct Hebrew behavior, I can merge that. I just want to prevent some ping-pong game of changing behavior.

jspitz commented 1 year ago

Why did you reopen this? We are only discussing the Hebrew exception to the fix here, not Arabic (the topic of this ticket).

Udi-Fogiel commented 1 year ago

After trying some more examples, it seems that in some cases the way bidi patches things is desired, and in some cases it makes things worse. It looks like the author of bidi is inactive (at least there was no movement in the repository, nor he responded to a PR I've made), so I can't really tell the reason behind this patch.

Using the correct script fix things with and without bidi's patch, so in the end I don't have a strong opinion about keeping the patch, or not.

But, if it will be decided to prevent bidi's patch for Hebrew, maybe a few changes in the code would be good. For example, shouldn't the test in

\def\nohebrew@ltr@numbers{%
    \ifxetex
      % Restore bidi's \DigitsDotDashInterCharToks
      \let\DigitsDotDashInterCharToks\xpg@orig@DigitsDotDashInterCharToks%
    \fi%
}

be made outside the definition of the macro? there is no need to test the engine each time the macro is evoked.

I'm also wondering if the best way to disable bidi's patch is with \renewcommand*{\DigitsDotDashInterCharToks}{}. Do you happen to know if \XeTeXcharclass expands the macros before inserting them to the horizontal list? can it interfere with spacing, hyphenation, kerning or anything else?

Maybe restoring all characters to character class 0 is better, e.g.,

\bidi@digits=`\0 \loop \XeTeXcharclass \bidi@digits 0 \ifnum\bidi@digits<`\9 \advance\bidi@digits \@ne \repeat
\bidi@digits=`\٠ \loop \XeTeXcharclass \bidi@digits 0 \ifnum\bidi@digits<`\٩ \advance\bidi@digits \@ne \repeat
\bidi@digits=`\۰ \loop \XeTeXcharclass \bidi@digits 0 \ifnum\bidi@digits<`\۹ \advance\bidi@digits \@ne \repeat

Do you have any thoughts about it?

jspitz commented 1 year ago

But, if it will be decided to prevent bidi's patch for Hebrew, maybe a few changes in the code would be good. For example, shouldn't the test in

\def\nohebrew@ltr@numbers{%
    \ifxetex
      % Restore bidi's \DigitsDotDashInterCharToks
      \let\DigitsDotDashInterCharToks\xpg@orig@DigitsDotDashInterCharToks%
    \fi%
}

be made outside the definition of the macro? there is no need to test the engine each time the macro is evoked.

But then the macro is undefined with LuaTeX, and we'll get an error. Or did I misunderstand your proposal?

I'm also wondering if the best way to disable bidi's patch is with \renewcommand*{\DigitsDotDashInterCharToks}{}. Do you happen to know if \XeTeXcharclass expands the macros before inserting them to the horizontal list? can it interfere with spacing, hyphenation, kerning or anything else?

Not to my knowledge.

Maybe restoring all characters to character class 0 is better, e.g.,

\bidi@digits=`\0 \loop \XeTeXcharclass \bidi@digits 0 \ifnum\bidi@digits<`\9 \advance\bidi@digits \@ne \repeat
\bidi@digits=`\٠ \loop \XeTeXcharclass \bidi@digits 0 \ifnum\bidi@digits<`\٩ \advance\bidi@digits \@ne \repeat
\bidi@digits=`\۰ \loop \XeTeXcharclass \bidi@digits 0 \ifnum\bidi@digits<`\۹ \advance\bidi@digits \@ne \repeat

Do you have any thoughts about it?

Given the issue does not occur if the script is correctly set, I am actually leaning towards removing the hebrew@ltr@numbers workaround altogether.

Udi-Fogiel commented 1 year ago

Never mind... just compared a large document without removing what bidi does and it seems it does harm in some cases...

But then the macro is undefined with LuaTeX, and we'll get an error. Or did I misunderstand your proposal?

I'm suggesting to change it to something like

\newcommand*\nohebrew@ltr@numbers{}
\ifxetex
\def\nohebrew@ltr@numbers{%
      % Restore bidi's \DigitsDotDashInterCharToks
      \let\DigitsDotDashInterCharToks\xpg@orig@DigitsDotDashInterCharToks
}
\fi
jspitz commented 1 year ago

Sure. I don't think the test is really expensive, though.

jspitz commented 1 year ago

Nonetheless done at 1dad132f4c98