Use lualatex instead of pdflatex and it works. To me it looks like a problem with babel.
@hvoss49 It is a problem stemming from babel-greek. It redefines \@roman and \@Roman (to ensure that they print roman numbers also when Greek is active), and then they are no longer usable as page numbers in an index entry. Probably the best would be to change \index so that it can undo this definition locally.
use lualatex instead of pdflatex and it works. To me it looks like a problem with babel
@hvoss49 This is meant to be a bug report against babel-greek (or babel), specifically with pdflatex/latex. (For lualatex and xelatex I do something else anyway in my larger, non-minimal documents.)
Similar problem with backref (see also plk/biblatex/issues/1175):
\documentclass[greek,english]{book}
\usepackage{babel}
\usepackage[backref=page]{hyperref}
%Workaround:
%\usepackage{etoolbox}
%\AtBeginDocument{\robustify\ensureascii}
\begin{document}
\frontmatter
\cite{article-minimal}
\mainmatter
\cite{article-minimal}
\bibliographystyle{plain}
\bibliography{xampl}
\end{document}
@jspitz There is a combination of factors, and as usual an \edef with an unprotected expansion is involved (in hyperref), which means that \protect’s are simply ignored, while \robustify relies on the primitive \protected. Although, IMO, the real culprit is the \edef, I’ll investigate whether \ensureascii can be based on \protected instead of on \protect.
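A minimal sketch of the distinction drawn above; the macro names \RobustByProtect and \RobustByPrimitive are hypothetical and not taken from babel or hyperref. It compares a \protect-based robust command with an e-TeX \protected one inside a plain \edef, the kind of construct hyperref uses in \hyper@link@:

```latex
\documentclass{article}
% Hypothetical demo macros, not taken from babel or hyperref.
% Robust via LaTeX's \protect mechanism:
\DeclareRobustCommand\RobustByProtect[1]{[#1]}
% Robust via the e-TeX \protected primitive (what \robustify relies on):
\protected\def\RobustByPrimitive#1{[#1]}
\begin{document}
% A plain \edef does not give \protect a non-expanding meaning,
% so the inner macro behind \RobustByProtect is expanded anyway.
\edef\TestA{\RobustByProtect{ii}}
% A \protected macro is never expanded inside an \edef.
\edef\TestB{\RobustByPrimitive{ii}}
% Write both results to the terminal and the log for inspection.
\message{A: \meaning\TestA}
\message{B: \meaning\TestB}
\end{document}
```

Run through pdflatex, the log should show \TestA already containing the expanded [ii] (only an inert \protect token is left in front of it), while \TestB still contains \RobustByPrimitive {ii}. The latter behaviour is why a \protected \ensureascii avoids the error but then ends up verbatim in destination names.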
@jbezos Making \ensureascii robust will avoid the error but break the links. Hyperref will then try to create links to destinations like page.\ensureascii {ii}:
pdfTeX warning (dest): name{page.\\ensureascii\040{ii}} has been referenced but
does not exist, replaced by a fixed one
pdfTeX warning (dest): name{page.\\ensureascii\040{i}} has been referenced but
does not exist, replaced by a fixed one
I'm not sure how to get around the problem, but at the core, hyperref assumes that \thepage is expandable and doesn't contain formatting instructions like \fontencoding{OT1}\selectfont.
First, you might wish to check whether, in the case of non-main greek, babel(-greek) could redefine stuff it needs to redefine only within the scope of language-changing commands and environments. (This holds for any language, not just for Greek. E.g., I also have french as a non-main language and have been getting French- and caption-related warnings in the log ever since, even though the only French text in a huge German document is a few French proper names. In the long run, useless warnings get on one's nerves.) This will at least mitigate this issue and similar ones.
Second, if after that you still have a real-world error and need both plain-text and formatted page numbers, you might perhaps choose to split \thepage into two versions: one for printing (with whatever formatting commands go with it), another for referencing (plain text, without any commands, including \ensureascii). Or even better, instead of trying to remove the formatting from a formatted page number when needed (which raises issues), you could add formatting to a plain-text page number when needed; adding formatting commands is easier than removing them, after all.
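A minimal sketch of the "add formatting on need" idea, assuming a hypothetical helper \FormatPageNumber (not part of LaTeX, babel or hyperref) that takes the name of a numbering style and a plain page number, and calls the matching kernel formatter (\@roman, \@Roman, \@arabic) only at the point of printing:

```latex
\documentclass{article}
% Hypothetical helper, not part of LaTeX, babel or hyperref:
% #1 = numbering style name (roman, Roman, arabic, ...),
% #2 = plain page number, as it would be stored for referencing.
% \csname @#1\endcsname builds \@roman, \@Roman, \@arabic, ...
\newcommand*\FormatPageNumber[2]{\csname @#1\endcsname{#2}}
\begin{document}
% The stored value stays a plain number (easy to sort, to compress
% into ranges and to use in PDF destination names); formatting is
% added only in this final printing step.
\FormatPageNumber{roman}{4},
\FormatPageNumber{Roman}{4},
\FormatPageNumber{arabic}{4}
\end{document}
```

With the standard kernel definitions this should print iv, IV, 4. Only the printing step would then depend on a language-specific formatter such as an LGR-proof \@roman; everything that sorts, compares or links would see the plain 4.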
First, you might wish to check whether, in the case of non-main greek, babel(-greek) could redefine stuff it needs to redefine only within the scope of language-changing commands and environments.
Like this:
--- /usr/local/texlive/current/texmf-dist/tex/generic/babel-greek/greek.ldf
+++ /tmp/meld-tmp_0c8523p.ldf
@@ -104,6 +104,10 @@
\makeatother
}{}
}
+\let\latin@roman\@roman
+\let\latin@Roman\@Roman
+\let\bbl@greek@roman\@roman
+\let\bbl@greek@Roman\@Roman
\@ifl@aded{def}{lgrenc}{%
\ProvideTextCommand{\textcopyright}{LGR}{\ensureascii{\textcopyright}}
\ProvideTextCommand{\textregistered}{LGR}{\ensureascii{\textregistered}}
@@ -113,8 +117,8 @@
\ProvideTextCommand{\textampersand}{LGR}{\ensureascii{\ltx@amp}}
\DeclareRobustCommand{\&}{\ifmmode\ltx@amp\else\textampersand\fi}
\ProvideTextCommand{\SS}{LGR}{\ensureascii{\SS}}
- \def\@roman#1{\expandafter\ensureascii\expandafter{\romannumeral#1}}
- \def\@Roman#1{\expandafter\ensureascii\expandafter{%
+ \def\bbl@greek@roman#1{\expandafter\ensureascii\expandafter{\romannumeral#1}}
+ \def\bbl@greek@Roman#1{\expandafter\ensureascii\expandafter{%
\expandafter\@slowromancap\romannumeral#1@}}
\DeclareRobustCommand{\greektext}{%
\fontencoding{LGR}\selectfont
@@ -486,6 +490,14 @@
\DeclareTextCompositeCommand{\`}{LGR}{^^9f}{\LGR@hiatus}
\addto\extraspolutonikogreek{\languageshorthands{greek}}%
\declare@shorthand{greek}{~}{\greek@tilde}
+ \addto\extrasgreek{%
+ \let\@roman\bbl@greek@roman
+ \let\@Roman\bbl@greek@Roman
+ }
+ \addto\noextrasgreek{%
+ \let\@roman\latin@roman
+ \let\@Roman\latin@Roman
+ }
}{} % End of LGR-specific code.
\providecommand*{\anwtonos}{\textdexiakeraia}
\providecommand*{\katwtonos}{\textaristerikeraia}
This won't help if Greek is the active language, so a fix at the core is still needed. But I think it should be done anyway.
babel(-greek) could redefine stuff it needs to redefine only within the scope of language-changing commands and environments.
Well, that wouldn't help in the case of page numbers, as the language scope in which such a page number is stored (with \label etc.) can be different from the scope in which it is used (via \ref etc.).
you might perhaps choose to split \thepage in two versions:
That plan is better: one should store the number (e.g.) and the intended formatting (e.g. \roman). But the problem here is again with labels and refs: even with hyperref, which extends the label system, there is not enough room to move both pieces of information around.
Well, that wouldn't help in the case of page numbers, as the language scope in which such a page number is stored (with \label etc.) can be different from the scope in which it is used (via \ref etc.).
Perhaps, one could think of attempting to store some of the necessary local data, such as the language or the formatting, together with or alongside the stored label and to use this data at the point of the usage of the label.
Well, you can try with zref; it allows you to store more data. But it will not be trivial to format page numbers correctly in all the places where they are used, like the index and bibliographies. IMHO the best is to drop the LGR encoding and \ensureascii by using a Unicode engine.
To the systems architect in me, storing more data seems to border on a small architectural change, which seems to require more work than a quick-and-dirty hack. (Architectural changes and cleanups for most software projects are inevitably required; if their code evolves in small quick-and-dirty steps, it usually becomes unmanageable ad-hoc spaghetti. I view switching to [Xe|Lua]LaTeX as another, more profound architectural change. Still, [pdf]latex is alive and does not have a trivial replacement everywhere yet: I personally dealt with svmono and arxiv.org.)
babel(-greek) could redefine stuff it needs to redefine only within the scope of language-changing commands and environments.
Well, that wouldn't help in the case of page numbers, as the language scope in which such a page number is stored (with \label etc.) can be different from the scope in which it is used (via \ref etc.).
True, but it helps in all cases where Greek is not active (but loaded), e.g. the MWE in https://github.com/latex3/babel/issues/170#issuecomment-1229431476. In any case, I don't see why babel-greek should globally redefine \@roman once and for all.
True, but it helps in all cases where Greek is not active (but loaded),
If you don't have references to roman numbers in greek parts of your document, you can simply reinstate the default LaTeX definitions everywhere and be done with it. But if you have such references then they will error or give faulty links or faulty output with your solution. So what do you gain?
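A minimal sketch of reinstating the default LaTeX definitions of \@roman and \@Roman (as defined in the LaTeX kernel) right after loading babel. It assumes, as stated above, that no roman numbers are referenced or typeset inside Greek text; there, an LGR-encoded context would print wrong glyphs for them:

```latex
\documentclass[greek,english]{book}
\usepackage{babel}
\makeatletter
% Reinstate the LaTeX kernel definitions, overriding the global
% LGR-proof versions that babel-greek installs when it is loaded.
% Only safe if roman numbers never appear inside Greek (LGR) text.
\def\@roman#1{\romannumeral #1}
\def\@Roman#1{\expandafter\@slowromancap\romannumeral #1@}
\makeatother
\usepackage{makeidx}
\makeindex
\begin{document}
\frontmatter
Front matter text\index{orange}
\printindex
\end{document}
```

After running pdflatex and makeindex, the front-matter entry should then carry a plain roman page number; the price is the one discussed in the following replies, namely wrong numerals for roman page references inside Greek text.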
@u-fischer You are right, I forgot about \pageref to non-Greek roman pages within a Greek context. So I retract my proposal.
@PeterMuellerr FYI that's this case:
\documentclass[greek,english]{book}
\usepackage{babel}
\begin{document}
\frontmatter
a\label{x}
\clearpage
\selectlanguage{greek}Page \pageref{x}
\end{document}
This would falsely come out as ι (rather than i) if the \@roman redefinition were restricted to the Greek language context.
@jspitz
\selectlanguage{greek}Page \pageref{x}
Thanks! As of now, it comes out as “Παγε i”. I believe you concerning “ι” if you have tested this. I apologize for having forgotten that there is persistent, stored stuff that has to be dealt with, too. Anyway, is this a realistic example? Wouldn't you write, perhaps,
\selectlanguage{greek}Σελίδα \selectlanguage{english}\pageref{x}
instead of \selectlanguage{greek}Page \pageref{x}? If users switch the language for the main text themselves, it could be argued that they should switch, or consider switching, languages for the references too, since, in high-level terms, “i” is not Greek-language text. Of course, I know the user should ideally be relieved of switching languages manually.
making \ensureascii robust will avoid the error but break the links. Hyperref will then try to create links to destinations like page.\ensureascii {ii}:
@u-fischer You’ve closed my investigations before I started 🙂, but I was expecting something like that. Of course, the root of the problem is assuming \thepage is fully expandable. See, for example:
But I agree that \roman must not be redefined globally when Greek isn’t the main language, and the current maintainer of babel-greek has been informed. Maybe it's time to insist.
Second, if after that you still have a real-world error and need both plain-text and formatted page numbers, you might perhaps choose to split \thepage into two versions: one for printing (with whatever formatting commands go with it), another for referencing (plain text, without any commands, including \ensureascii).
@PeterMuellerr This would be the ideal solution, even if, as pointed out by Ulrike, it’s not trivial.
You’ve closed my investigations before I started
@jbezos Well, I think some investigation in this area is needed, not only for roman/greek. The backref example also fails for spanish, and a number of packages also redefine \@arabic, which can lead to problems too.
Of course, the root of the problem is assuming \thepage is fully expandable
The root of the problem is that page numbers are used in many places by various tools with differing requirements: makeindex wants to sort them, biblatex wants to compress page references to ranges, hyperref wants to create destinations, links and page labels, the label/ref system wants to move them through the aux file, and the document wants to print them in various formats depending on the current language and the place where they are printed. All this works fine if page numbers are expandable and expand to something simple, but it gets quite difficult if language-dependent formatting is added.
@u-fischer Or fancy, but valid, formats like 3▪4 (where 3 is the chapter and 4 the page in the chapter, and ▪ is your favorite bullet in your favorite font, or even an image).
@jbezos Yes. If you want to investigate, here is an example. As you can see, the main problem is the index and not so much hyperref. The example ignores the problem of multiple languages. Also, the XXXX- in the definition of \blub is expandable, so the non-protected version works in the index here (but breaks page links); in real-world examples, however, it would contain e.g. font selection commands, which can expand in the index and so would break there too.
If someone could come up with a good idea for how to handle the index, I could add support in hyperref, but I won't add a variety of commands like \ensureascii etc.; there should be one common command (or perhaps one for each numbering style), and the language files would have to coordinate their access to such a formatting command.
\documentclass[]{book}
\usepackage{index,etoolbox}
\makeindex
\usepackage[backref=page]{hyperref}
\makeatletter
\def\@roman#1{\expandafter\blub\expandafter{\romannumeral#1}}
% \newcommand\blub[1]{XXXX-#1} % not protected: works more or less in the index, but breaks links
% \DeclareRobustCommand\blub[1]{XXXX-#1} % robust
\protected\def\blub#1{XXXX-#1} % protected: misses index entries
\pdfstringdefDisableCommands{\let\blub\@firstofone}
\patchcmd\hyper@link@{\edef\Hy@tempb{#3}}{\let\blub\@firstofone\edef\Hy@tempb{#3}}{}{\fail}
\begin{document}
\frontmatter
first page
\index{orange}
\newpage
\cite{article-minimal}
second page \phantomsection\label{abc}
\mainmatter
mainmatter
\index{duck}\index{orange}
\pageref{abc}
\cite{article-minimal}
\bibliographystyle{plain}
\bibliography{xampl}
\printindex
\end{document}
@u-fischer Concerning your proposal of using zref, do you have in mind something similar to the code below? IMHO, it would allow us to store \languagename, \the\c@page, and the formatting separately.
A proof of concept; too simple for real life:
\documentclass{book}
\usepackage{zref}
\usepackage[greek,english]{babel}
\makeatletter
\zref@newlist{pageWithLang}
\zref@newprop*{lang}[english]{\languagename}
\zref@addprops{pageWithLang}{page,lang}
\newcommand{\labelWithLang}[1]{%
\zref@setcurrent{page}{\thepage}%% Or, say, \romannumeral\the\c@page , if you know that the last command changing pagenumbering set page numbers to roman.
\zref@setcurrent{lang}{\languagename}%% This is a simplification. Frankly speaking, I don't know how to get the language with which the current page number has been created. Usually any Latin-based language would do, but page numbers can also be Hebrew, for example.
\zref@labelbylist{#1}{pageWithLang}%
}
\newcommand{\pagerefWithLang}[1]{%
\foreignlanguage{%
\zref@extract{#1}{lang}%
}{%
\zref@extract{#1}{page}%
}%
}
\makeatother
\begin{document}
\frontmatter
\labelWithLang{englishPageLabel}English page\\
Pages \pagerefWithLang{englishPageLabel} and \pagerefWithLang{greekPageLabel}.
\clearpage
\selectlanguage{greek}
\labelWithLang{greekPageLabel}Ελληνική σελίδα\\
Σελίδες \pagerefWithLang{englishPageLabel} και \pagerefWithLang{greekPageLabel}.
\end{document}
@jspitz Would such code (perhaps after some changes) jive with your proposal of local-only redefinitions of stuff in greek.ldf?
I am not sure I understand the plan. I don't think babel wants to load zref. Maybe in the long term, as LaTeX has already started to include some of zref's concepts (see \@currentcounter), the LaTeX kernel could provide a way to separate page number formatting from the actual page number.
I am not sure I understand the plan.
I thought that one of Ulrike's earlier suggestions was to use zref. The way I understood it, this would give us an opportunity to store the plain-text page number separately from the language and formatting (given enough effort, no doubt about that). This would allow us to get rid of global redefinitions of stuff by .ldf files, similar to what you tried out. (As for what babel wants or doesn't want to do, it's probably not up to me to comment on that or to suggest that anyone do anything; as of now, I wouldn't be able to make any changes in babel in general or greek.ldf in particular anyway.)
The approach helps to get a proper page reference also with the change to greek.ldf I proposed. But I think it's more a user workaround than a fix of the problem at the core.
The approach helps to get a proper page reference also with the change to greek.ldf I proposed. But I think it's more a user workaround than a fix of the problem at the core.
Yes, because it seems that for the fix (done this way), we would have to change \label, \ref, and \pageref rather than introduce \labelWithLang, \pagerefWithLang, … .
The problem is more about font encoding and script, and only indirectly about Babel (because Babel-Greek has to ensure that the Greek script is supported).
The core of the problem is that \roman and \Roman expect the active font encoding to be a "standard text font encoding", but LGR is non-standard :(
Solving this at the core would require
a) support for T7 (standard Greek text encoding, currently not defined), or
b) \roman and \Roman as NFSS "TextCommand"s (similar to \copyright).
For a), we would need agreement on a character table, font encoding definition files and a set of re-encoded fonts. Work on T7 stalled when the Greek TeX community decided that Unicode was better suited for typesetting Greek. However, for monotonic Greek on 8-bit TeX it would still be a vast improvement over LGR.
For b), we would need support for NFSS TextCommands in the places where \roman and \Roman are used.
Any change to "greek.ldf" should be checked for adverse side-effects. E.g., in Greek documents, roman numbering is used for nested enumerated lists. If there is an agreement on the best way forward, I am more than happy to implement it in either "greek-fontenc" or "babel-greek".
For non-Greek documents with the occasional Greek symbol or term, babel-greek is overkill. Using "textalpha" or "alphabeta" instead should solve the indexing and backref problems:
\documentclass{book}
\usepackage{textalpha}
\usepackage[ngerman]{babel}
\usepackage{makeidx}
\makeindex
\begin{document}
\pagenumbering{Roman}
\index{Text}Text
Some text using Greek script: \ensuregreek{λογος}.
Roman numbering is left untouched and fails with Greek with an 8-bit engine:
% abuse some existing counters for a quick test:
\setcounter{enumi}{5}
\setcounter{enumii}{3}
\setcounter{enumiii}{9}
item \Roman{enumi}.\roman{enumii}.\roman{enumiii} vs.
\ensuregreek{αντικείμενο \Roman{enumi}.\roman{enumii}.\roman{enumiii}}
\printindex
\end{document}
Sure, the LGR is problematic, but the point here is \roman and \Roman are modified for all languages, while changes should be local or, at least, ‘global’ solely when Greek is the main language, so that \thepage, which is also ‘global’, prints the correct numeral. A second issue is \makeindex understands text and nothing else (related issue: https://github.com/latex3/babel/issues/26). On the other hand, I think not supporting greek as a secondary language is not a real solution.
I was working on a new feature (or, rather, on improving an existing one), which will allow writing something like this:
\documentclass{article}
\usepackage[LGR, T1]{fontenc}
\usepackage[english]{babel}
\begin{document}
English \foreignlanguage{greek}{Ελληνικά} English.
\end{document}
(It’s based on https://latex3.github.io/babel/guides/locale-arabic.html#pdftex.)
On 8.12.22, Javier Bezos wrote:
... the point here is \roman and \Roman are modified for all languages, while changes should be local or, at least, ‘global’ solely when Greek is the main language
I would prefer local-only changes, too. However, this has the potential to silently break existing documents. (While an Iota for number 1 may be only a style problem, the V for number 5 becomes a no-break space!)
An example where the LGR-proof Roman numerals are required also with Greek as secondary language:
\documentclass[a4paper,oneside]{book}
% Save original definition
\makeatletter
\let\bbl@greek@save@roman\@roman
\let\bbl@greek@save@Roman\@Roman
\makeatother
\usepackage[greek,english]{babel}
\makeatletter
% Restore original definition
\let\@roman\bbl@greek@save@roman
\let\@Roman\bbl@greek@save@Roman
% Make Roman numerals LGR-proof only if Greek is the active language:
% LGR-proof Roman numerals
\def\lgr@proof@roman#1{\expandafter\ensureascii\expandafter{\romannumeral#1}}
\def\lgr@proof@Roman#1{\expandafter\ensureascii\expandafter{%
\expandafter\@slowromancap\romannumeral#1@}}
% Switch between original and LGR-proofed version
\addto\extrasgreek{%
\let\@roman\lgr@proof@roman
\let\@Roman\lgr@proof@Roman
}
\addto\noextrasgreek{%
\let\@roman\bbl@greek@save@roman
\let\@Roman\bbl@greek@save@Roman
}
\makeatother
\begin{document}
\frontmatter
\tableofcontents % Check for Iota in Roman page number!
\chapter{English Preface \label{ch:preface}}
Use case:
a document with Greek chapter in the ``frontmatter'' and a ToC.
\selectlanguage{greek}
\chapter{Greek Preface \label{ch:preface-greek}}
logos
\selectlanguage{english}
\mainmatter
\chapter{First Chapter \label{ch:1}}
The English ``preface'' is at page \pageref{ch:preface}.
The Greek ``preface'' is at page \pageref{ch:preface-greek}.
\selectlanguage{greek}
The English ``preface'' is at page \pageref{ch:preface}.
The Greek ``preface'' is at page \pageref{ch:preface-greek}.
\end{document}
A second issue is \makeindex understands text and nothing else (related issue: #26).
It seems "makeindex" can handle TeX macros that have a replacement in *.ist files, like
% save printable macros
merge_rule "\\TeX" "TeX"
Maybe we can fix "makeindex" by ensuring that \ensureascii ends up in the *.idx file and adding a merge_rule that removes it for the index generation?
👌 Good example. I’ll study it, but it seems this issue is going to become (another) ‘known issue’ of the LGR encoding.
Maybe we need a new language option "global-lgr-fixes=[on|off]" or so. After a transition period, the default could become "off".
I’m closing this issue for two reasons. (1) It’s an intrinsic limitation of the non-standard LGR encoding, which is not really part of the babel core. (2) There is now (3.84) a simple alternative to set more or less short Greek texts as a secondary language (see What’s new in babel 3.84).
Thank you for providing another workaround (for small Greek text parts) in Babel 3.84.
(There is a small but confusing documentation error in What’s new in babel 3.84: fontspec -> fontenc; you cannot set a font encoding with fontspec.)
This adds one level of language support (hyphenation), but would not help in documents requiring translated auto-strings (e.g. for a Greek abstract). I'd like to see more testing and better documentation.
I prepared a new version of the contributed babel-greek package and opened an issue there: https://codeberg.org/milde/greek-tex/issues/1.
@u-fischer @jspitz
... I could add support in hyperref, but I won't add a variety of commands like \ensureascii etc.; there should be one common command (or perhaps one for each numbering style), and the language files would have to coordinate their access to such a formatting command.
babel-greek tries to solve this with a new "TextCommand" (see commit 0f56b).
\ProvideTextCommandDefault{\EnsureStandardFontEncoding}{\@firstofone}
\ProvideTextCommand{\EnsureStandardFontEncoding}{LGR}[1]{%
\ensureascii{#1}}
\AtBeginDocument{\@ifpackageloaded{hyperref}
{\pdfstringdefDisableCommands{%
\let\EnsureStandardFontEncoding\@firstofone}}
{}}
This seems to fix the "backref" issues in my tests. Is there anything missing regarding hyperref?
Feeding

as mwe.tex to pdflatex or latex leads to

in the file mwe.idx. As a consequence, makeindex mwe or xindex mwe produce an empty file mwe.ind. Moreover, if you use xindex mwe, the program xindex fails with the output ...re/texlive/texmf-dist/tex/lualatex/xindex/xindex-lib.lua:524: bad argument #2 to 'format' (number expected, got nil). Used versions:

Given http://tex.stackexchange.com/a/356649 and http://tex.stackexchange.com/a/633522, a possible workaround would be to uncomment the commented lines or say

after calling babel. Another workaround is saying \def\ensureascii#1{#1} right after \begin{document} (cf. http://chat.stackexchange.com/transcript/message/60465101). However, it would probably be much cleaner if babel-greek or babel didn't redefine \ensureascii and/or \@roman and \@Roman for all languages. (If necessary, they could do so only within the scope of Greek-language commands or if Greek is the main document language.) The maintainer of babel-greek has been informed.
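A minimal sketch of a document of the kind described above, assuming pdflatex, Greek loaded as a secondary babel language and an index entry on a roman-numbered front-matter page. The \def\ensureascii#1{#1} workaround is included but commented out, so that both variants can be compared:

```latex
\documentclass[greek,english]{book}
\usepackage{babel}
\usepackage{makeidx}
\makeindex
\begin{document}
% Workaround (uncomment to test): turn \ensureascii into a plain,
% expandable pass-through, so that \thepage writes a bare roman
% numeral to the .idx file. Caveat: everything babel-greek wraps in
% \ensureascii is then typeset in the current font encoding, which
% is wrong inside Greek (LGR) text.
%\def\ensureascii#1{#1}
\frontmatter
Front matter text\index{orange}
\mainmatter
Main matter text\index{duck}
\printindex
\end{document}
```

Comparing the page field of the front-matter entry in the .idx file with and without the workaround should show whether the \ensureascii wrapper is what trips up makeindex and xindex.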