latex3 / babel

The babel system for LaTeX, LuaLaTeX and XeLaTeX
LaTeX Project Public License v1.3c

\endotherlanguage leaves @ignore globally true #288

Closed: davidcarlisle closed this issue 3 months ago

davidcarlisle commented 3 months ago

In babel.dtx there is:

% The |\endotherlanguage| part of the environment tries to hide
% itself when it is called in horizontal mode.
%
%    \begin{macrocode}
\long\def\endotherlanguage{%
  \global\@ignoretrue\ignorespaces}

The \global does nothing here, as \@ignoretrue is always global. More seriously, this is done in an arbitrary hlist with no mechanism for ever setting the flag back to false.

As such, various places (including after inline math) that do \if@ignore \ignorespaces \fi will ignore spaces that should not have been ignored.

See

https://tex.stackexchange.com/q/712301/1090

for one recently reported example, where $\le$ 5 gets no space before the 5.
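A minimal sketch for observing the leaked flag directly, assuming the babel definition quoted above (later babel releases may behave differently). It calls the environment commands directly, as a macro might, and then tests \if@ignore.

\documentclass{article}
\usepackage[danish,english]{babel}
\begin{document}
\makeatletter
% Direct use, without \begin/\end: \endotherlanguage executes \global\@ignoretrue,
% but no \end follows to reset the flag.
\otherlanguage{danish}text\endotherlanguage
% With the definition above the flag should still be true here, so any later
% \if@ignore\ignorespaces\fi in the document could swallow a wanted space.
\if@ignore flag still set\else flag reset\fi
\makeatother
\end{document}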

davidcarlisle commented 3 months ago

Sorry, this report is not fully accurate. The usual use case here is \end{otherlanguage}, so a plain \ignorespaces would not work. You could use \ignorespacesafterend, but that just sets the global flag, as in the original. babel could check whether \endotherlanguage is being called directly and, if so, just do \ignorespaces, or declare that calling it that way is an error (this happens somewhere in biblatex). I will update this issue later.

davidcarlisle commented 3 months ago

This definition would only set the flag when called from the environment's end code (where \end resets it automatically):

\makeatletter
\def\endotherlanguage{%
  % Set the global flag only when called via \end{otherlanguage}, which resets it.
  \ifx\@currenvir\otherlanguage@env\ignorespacesafterend\fi
  \ignorespaces}
\def\otherlanguage@env{otherlanguage}
\makeatother

jbezos commented 3 months ago

The fact that there is an \ignorespaces at the end seems to imply that calling \endotherlanguage directly was considered a valid option. As for the suggested fix, there is a problem: otherlanguage can be nested, so checking the current environment isn’t reliable. Maybe the real problem here is that the space is ignored by default (to be honest, I don’t know the reason for this behavior).

davidcarlisle commented 3 months ago

Not really: spaces after \endotherlanguage are ignored anyway, by the usual rule that TeX skips spaces after a control word when reading input. You only need to inject \ignorespaces with the \end{otherlanguage} syntax, to ignore a space after the closing }.
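To make the distinction concrete, here is a small sketch with an empty environment. The name spacedemo is hypothetical and the environment does nothing by itself.

\documentclass{article}
\newenvironment{spacedemo}{}{}% hypothetical, empty environment
\begin{document}
% Command form: TeX drops the space after the control word \endspacedemo
% while reading the line, so this gives "abc" without any \ignorespaces.
a\spacedemo b\endspacedemo c

% Environment form: the space after the closing brace is a real space token,
% so this gives "ab c" unless something runs \ignorespaces after \end.
a\begin{spacedemo}b\end{spacedemo} c
\end{document}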

FrankMittelbach commented 3 months ago

But why do you want to do that in the first place? To me that looks wrong. If the environment is used in horizontal mode, it seems much more natural to write ...foo\end{otherlanguage} bar... instead of being forced to put the space inside, i.e., ...foo \end{otherlanguage} bar.... In other words, I don't think there should be any space handling on the outside, only on the inside, i.e., \ignorespaces at the beginning and \ifhmode\unskip\fi at the end, because that is where spaces are typically unwanted but creep in due to source formatting.
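A minimal sketch of that inside-only approach. The environment name innerspaces is made up for illustration, and a real language environment would also put its language switch in the begin code.

\documentclass{article}
% Hypothetical environment: trim spaces on the inside, never touch the outside.
\newenvironment{innerspaces}
  {\ignorespaces}% skip spaces right after \begin{innerspaces}
  {\ifhmode\unskip\fi}% remove a space just before \end{innerspaces}
\begin{document}
% Spaces inside, from source formatting, are dropped: "foo bar baz".
foo \begin{innerspaces} bar \end{innerspaces} baz

% The space after \end{...} is kept, as with ordinary text: "foobar baz".
foo\begin{innerspaces}bar\end{innerspaces} baz
\end{document}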

davidcarlisle commented 3 months ago

@FrankMittelbach that was my thought as well

jbezos commented 3 months ago

Still thinking about it. I was reading old versions (the oldest I have is 3.4f from 1994, which lacks otherlanguage; 3.6Z from 2000 does define it). Anyway [Edited example]:

\documentclass{article}

\usepackage[danish, english]{babel}

\def\other#1{\otherlanguage{#1}\languagename\endotherlanguage}

\begin{document}

(\languagename) \other{danish} (\languagename)
% Prints (english) danish(danish)

\selectlanguage{english}

(\languagename) {\other{danish}} (\languagename)
% Prints (english) danish (english)

(\languagename)
\begin{otherlanguage}{danish}\languagename\end{otherlanguage}
(\languagename)
% Prints (english) danish(english)

\end{document}

So, clearly, otherlanguage must be used with \begin/\end or inside an ‘external’ group. In the latter case, {\selectlanguage{danish}...} makes more sense. By the way, the fact that spaces are ignored is documented (and it looks wrong to me, too).

AlMa1r commented 3 months ago

I've just finished testing @davidcarlisle's

\makeatletter
\def\endotherlanguage{%
  \ifx\@currenvir\otherlanguage@env\ignorespacesafterend\fi
  \ignorespaces}
\def\otherlanguage@env{otherlanguage}
\makeatother

from http://tex.stackexchange.com/a/712381. His fix, inserted anywhere between \usepackage[…]{babel} and the end of the preamble, works great on my input of around 480 pages, both with latex/pdflatex (the output was broken before) and with lualatex (the output was not broken before). Thanks a lot!

jbezos commented 3 months ago

@moewew This issue seems related to biblatex. Can you take a look at it?

moewew commented 3 months ago

biblatex essentially does

\begingroup
\expandafter\csname otherlanguage\expandafter\endcsname\expandafter{\abx@field@langid}
...
\csname endotherlanguage\endcsname
\endgroup

where \abx@field@langid holds the language we want (and the otherlanguage part of the code above is actually stored in another macro, so that we can easily switch to otherlanguage* or some other environment if needed).

Are you saying that this is no longer supported and we have to use the \begin{otherlanguage}{<language>}...\end{otherlanguage} form?

jbezos commented 3 months ago

@moewew As far as I can tell, using the commands instead of the environment has never been supported (I was reading some old versions of babel and couldn’t find anything in this regard), but note that the same in fact applies to any other environment setting \@ignoretrue, in any package or even in the LaTeX kernel. Furthermore, since the group is not closed, \languagename, captions, date, hyphenation and the like aren’t restored (unless there is an explicit \endgroup or } afterwards, of course).

Here is a possible solution if for some reason you don’t want \begin/\end (e.g., to avoid an additional group). Since otherlanguage basically just calls \selectlanguage, and \endselectlanguage is not defined, you can use \selectlanguage instead, provided the name \endselectlanguage is built with \csname (an undefined name built that way is equivalent to \relax). This way there are no side effects related to spacing.

If it works for you, I’ll document it as an alternative to otherlanguage (actually, I’ve sometimes used this trick).

I repeat the example above with this idea:

\documentclass{article}

\usepackage[danish, english]{babel}

\def\other#1{\otherlanguage{#1}\languagename\endotherlanguage}

\begin{document}

(\languagename) \other{danish} (\languagename)

\selectlanguage{english}

(\languagename) {\other{danish}} (\languagename)

(\languagename)
\begin{selectlanguage}{danish}\languagename\end{selectlanguage}
(\languagename)
% Prints (english) danish (english)

(\languagename)
{\selectlanguage{danish}\languagename\csname endselectlanguage\endcsname}
(\languagename)
% Prints (english) danish (english)

\end{document}

moewew commented 3 months ago

I don't really understand the exact source of the problem, but as I said, biblatex uses \begingroup\otherlanguage and \endotherlanguage\endgroup instead of \begin{otherlanguage}...\end{otherlanguage}, presumably because it is more convenient (and because the assumption was that \begingroup\foo ... \endfoo\endgroup would be similar enough to \begin{foo}...\end{foo} that everything works as expected, as it apparently did until a couple of months ago). This is absolutely not about grouping.

It's just that

\documentclass{article}

\usepackage[danish, english]{babel}

\makeatletter
\def\blx@thelangenv{otherlanguage}

\def\blx@abx@field@langid{danish}

\def\blx@beglang{%
  \begingroup
  \expandafter\csname\expandafter\blx@thelangenv\expandafter\endcsname
    \expandafter{\blx@abx@field@langid}}

\def\blx@endlang{%
  \csname end\blx@thelangenv\endcsname
  \endgroup
}

\def\other#1{%
  \blx@beglang
  \languagename
  \blx@endlang}
\makeatother

\begin{document}

(\languagename) \other{danish} (\languagename)

\selectlanguage{english}

(\languagename) {\other{danish}} (\languagename)

(\languagename)
\begin{selectlanguage}{danish}\languagename\end{selectlanguage}
(\languagename)
% Prints (english) danish (english)

(\languagename)
{\selectlanguage{danish}\languagename\csname endselectlanguage\endcsname}
(\languagename)
% Prints (english) danish (english)

\end{document}

is much easier to write and understand than

\documentclass{article}

\usepackage[danish, english]{babel}

\makeatletter
\def\blx@thelangenv{otherlanguage}

\def\blx@abx@field@langid{danish}

\def\blx@beglang{%
  \begingroup
  \expandafter\expandafter\expandafter\begin
    \expandafter\expandafter\expandafter{%
      \expandafter\blx@thelangenv\expandafter}%
        \expandafter{\blx@abx@field@langid}}

\def\blx@endlang{%
  \expandafter\end\expandafter{\blx@thelangenv}%
  \endgroup
}

\def\other#1{%
  \blx@beglang
  \languagename
  \blx@endlang}
\makeatother

\begin{document}

(\languagename) \other{danish} (\languagename)

\selectlanguage{english}

(\languagename) {\other{danish}} (\languagename)

(\languagename)
\begin{selectlanguage}{danish}\languagename\end{selectlanguage}
(\languagename)
% Prints (english) danish (english)

(\languagename)
{\selectlanguage{danish}\languagename\csname endselectlanguage\endcsname}
(\languagename)
% Prints (english) danish (english)

\end{document}

I can and probably should change this on the biblatex side, but I'd like to understand first what exactly went wrong here.

jbezos commented 3 months ago

I’ll try to explain tomorrow what's going on when \end is executed and how \@ignoretrue works, but in the meantime, note that the following is fine:

\expandafter\begin\expandafter{\blx@thelangenv}{\blx@abx@field@langid}

(Or should be fine. If it fails, it’s a bug or a very old version.)
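Building on that line, here is a sketch of how the begin/end macros from the earlier example could look. The \blx@... names are taken from that example and are illustrative only, not biblatex's actual internals. Only the environment name is expanded up front, so \begin and \end both see the literal otherlanguage. The extra \begingroup is dropped because \begin opens a group of its own.

\documentclass{article}
\usepackage[danish,english]{babel}
\makeatletter
% Illustrative names, following the example above (not biblatex's real code).
\def\blx@thelangenv{otherlanguage}
\def\blx@abx@field@langid{danish}
% Expand the environment name once; \begin itself opens the group.
\def\blx@beglang{%
  \expandafter\begin\expandafter{\blx@thelangenv}{\blx@abx@field@langid}}
% \end also gets the literal name, so its check against \@currenvir matches.
\def\blx@endlang{%
  \expandafter\end\expandafter{\blx@thelangenv}}
\def\other{\blx@beglang\languagename\blx@endlang}
\makeatother
\begin{document}
(\languagename) {\other} (\languagename)
\end{document}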

jbezos commented 3 months ago

Let’s assume you want to define an environment so that spaces after it are ignored. Something like this does not work:

\newenvironment{ignoring}{}{\ignorespaces}

The reason is that the spaces to be ignored must come right after \ignorespaces, but this code is not executed at the very end of \end{ignoring}, as we can readily see from the definition of \end:

  \romannumeral
    \IfHookEmptyTF{env/#1/end}%
        {\expandafter\z@}%
        {\z@\UseHook{env/#1/end}}%
    \csname end#1\endcsname\@checkend{#1}%
    \expandafter\endgroup\if@endpe\@doendpe\fi
    \UseHook{env/#1/after}%
    \if@ignore\@ignorefalse\ignorespaces\fi
}

Well, not so readily. Let’s remove the code related to hooks:

    \csname end#1\endcsname\@checkend{#1}%
    \expandafter\endgroup\if@endpe\@doendpe\fi
    \if@ignore\@ignorefalse\ignorespaces\fi

When the ‘end’ code is executed (the first line), no spaces follow it, so the \ignorespaces accomplishes nothing. Now consider:

\newenvironment{ignoring}{}{\@ignoretrue}

\@ignoretrue globally sets a flag telling \end that an \ignorespaces is required at the very end. The flag is set in the first line, by \end<environment>, and then unset, again globally, by \end itself (last line). Without that last step, the flag remains set and an \ignorespaces may be inserted later, outside our control.

This is how otherlanguage has behaved for a quarter of a century, and the fact that spaces are removed is documented. It's hard to say now why things are like this, but, sadly, they are. For backwards compatibility, modifying otherlanguage doesn’t seem feasible, but deprecating or discouraging it in favor of some alternative does.
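The two variants described above can be placed side by side in a minimal sketch. The environment names ignoringbad and ignoringgood are made up. \ignorespacesafterend is the documented LaTeX command that sets the same global flag as \@ignoretrue, so no \makeatletter is needed.

\documentclass{article}
% Does not work: \ignorespaces runs inside \end, before \@checkend and
% \endgroup, so it never reaches the space after \end{ignoringbad}.
\newenvironment{ignoringbad}{}{\ignorespaces}
% Works: \ignorespacesafterend sets the global @ignore flag; \end resets it
% and issues \ignorespaces as its very last step.
\newenvironment{ignoringgood}{}{\ignorespacesafterend}
\begin{document}
% Should keep the space: "ab c".
a\begin{ignoringbad}b\end{ignoringbad} c

% Should drop the space: "abc".
a\begin{ignoringgood}b\end{ignoringgood} c
\end{document}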

moewew commented 3 months ago

We've got https://github.com/plk/biblatex/commit/134da03fd355bf342cb919940fbc16267fe7651f now. Fingers crossed that this doesn't break anything else.

jbezos commented 3 months ago

I said:

If it works for you, I’ll document it as an alternative to otherlanguage (actually, I’ve sometimes used this trick).

Actually, the fact that selectlanguage can be used as an environment has been documented for a long time! 😯

AlMa1r commented 3 months ago

We've got plk/biblatex@134da03 now. Fingers crossed that this doesn't break anything else.

@moewew @jbezos Now I tested biblatex 3.20, biber 2.20, and babel 24.2 from TeX Live 2024 on a book of about 500 pages. My output looks good so far, i.e., not broken in this respect any longer. Nothing else seems to be affected. Thank you all!