reutenauer / polyglossia

An alternative to Babel for XeLaTeX and LuaLaTeX
http://www.ctan.org/pkg/polyglossia
MIT License
187 stars 52 forks source link

making Hebrew numerals expandable #596

Closed Udi-Fogiel closed 1 year ago

Udi-Fogiel commented 1 year ago

polyglossia adapted the definitions of \hebrewnumeral, \Hebrewnumeral and \Hebrewnumeralfinal from the old .ldf files of babel, but these macros are not expandable.

One problem that arise from that is wring bookmark numbering in the pdf outline. For example, with the following code I'm getting the warning Token not allowed in a PDF string (Unicode):(hyperref) removing `\@Hebrew@numeral', and the outline is rendered as 1 שלום and not א' שלום

\documentclass{book}

\usepackage{polyglossia}
\usepackage{hyperref}
\hypersetup{bookmarksnumbered}
\setmainlanguage{hebrew}

\newfontfamily\hebrewfont{David CLM}[Script=Hebrew]

\begin{document}
    \appendix
    \chapter{שלום}
\end{document}

I compared the output PDFs of the following file (with various maximum number) with the old version and the one provided and it gives the same result (unless I missed something...)

\documentclass{book}

\usepackage{polyglossia}
\setmainlanguage{hebrew}
\newfontfamily\hebrewfont{David CLM}[Script=Hebrew]

\begin{document}

\def\foo#1#2{%
    \hebrewnumeral{#1},
    \Hebrewnumeral{#1},
    \Hebrewnumeralfinal{#1},
    \ifnum#1<#2
    \expandafter\foo
    \expandafter{\number\numexpr#1+1\expandafter}%
    \expandafter{\number#2\expandafter}%
    \fi}

\foo{1}{1355}

\end{document}
Udi-Fogiel commented 1 year ago

See this post as well.

jspitz commented 1 year ago

Thanks. Did you test thoroughly that the output is the same in all contexts? And also in context where robust commands are needed (such as sections/toc)?

Udi-Fogiel commented 1 year ago

Did you test thoroughly that the output is the same in all contexts?

I don't really understand, which contexts exists? do you mean the three variants, \hebrewnumeral, \Hebrewnumeral and \Hebrewnumeralfinal? I tested with loops as described in the post. But a second eye is always good.

And also in context where robust commands are needed (such as sections/toc)?

A command needs to be robust if it will break in expansion only context, and this patch made Hebrew numerals expandable, so it shouldn't happen...

When I compile this

\documentclass{book}

\usepackage{polyglossia}
\usepackage{hyperref}
\hypersetup{bookmarksnumbered}
\setmainlanguage{hebrew}

\newfontfamily\hebrewfont{David CLM}[Script=Hebrew]

\begin{document}
    \tableofcontents
    \chapter{ \hebrewnumeral{1} שלום}
    \chapter{ \Hebrewnumeral{1} שלום}
    \chapter{ \Hebrewnumeralfinal{1} שלום}
\end{document}

the .toc file contains

\selectlanguage *{hebrew}
\contentsline {chapter}{\numberline {1} א שלום}{3}{chapter.1}%
\contentsline {chapter}{\numberline {2} א׳ שלום}{5}{chapter.2}%
\contentsline {chapter}{\numberline {3} א׳ שלום}{7}{chapter.3}%

so the Hebrew numerals commands does not "survive" when writing to a file, since they are fully expandable.

However, I'm not sure if polyglossia patches some LaTeX internals that are related to the Hebrew numerals commands. Maybe the patching should be changed? although I did not notice any problems.

I now noticed I've dropped the definition of \Alphfinal and \@Alphfinal, I'm not sure where they are needed, but we should keep them for backwards compatibility.

jspitz commented 1 year ago

I just want to make sure we do not introduce a regression. Maybe there was a reason why these commands had been defined as they are in the first place.

I cannot really test myself, so if you think you have tested sufficiently and feel confident with the change, I will push the button.

Udi-Fogiel commented 1 year ago

.Maybe there was a reason why these commands had been defined as they are in the first place.

I think this is for historical reasons. The original code polyglossia used for Hebrew numerals was taken from hebrew.ldf, which has not been updated since 2005. From looking at the code, the author did not assume the use of eTeX, but TeX--XeT was an option as well, and probably did not had the primitive \numexp.

I cannot really test myself, so if you think you have tested sufficiently and feel confident with the change, I will push the button.

I've compared the result of all numerals between 1 and 99,999 with PDF compare tools, and extracted the text from the PDFs I have produced with the new and old code, and compare those as well. Got the same results.

jspitz commented 1 year ago

Good. Thanks, merged.

Udi-Fogiel commented 11 months ago

@jspitz I've noticed that I used quotation marks in my code and not geresh and gershayim (as it was like that before). I think it was originally like that in the babel's .ldf file because geresh and gershayim are not setup to work in pdflatex, regarding all the encoding stuff.

Using geresh and gershayim is grammarickly more correct. Should we change it? or this is considered as breaking backwards compatibility? A new option is always a possibility.

jspitz commented 11 months ago

@Udi-Fogiel which part of the code are you referring to?

Udi-Fogiel commented 11 months ago

@jspitz Sorry, on a second look the code actually use the correct characters. I got confused after looking at the code in the .ldf file...