latex3 / babel

The babel system for LaTeX, LuaLaTeX and XeLaTeX
LaTeX Project Public License v1.3c

\MakeUppercase: double space after colon with french #196

Closed. jbezos closed this issue 1 year ago.

jbezos commented 1 year ago

This issue has been closed, but see https://github.com/latex3/babel/issues/189

Before the colon in headings there is now a double space. It used to work correctly. Is this related to the new mark mechanism?

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{lmodern}
\usepackage[french]{babel}

\begin{document}
\pagestyle{headings}
\showoutput

\section{Section un :}

\newpage
\section{Section deux :}

\tableofcontents

\end{document}

u-fischer commented 1 year ago

This is not related to the new marks, but to \MakeUppercase.

It would work correctly if the active colon were \protected. The main problem is that the \foreignlanguage in the header resets : to its old definition again.

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{lmodern}
\usepackage[french]{babel}
\pagestyle{headings}
\def\foreignlanguage#1{} % testing only: prevent \foreignlanguage from resetting : again
\begin{document}
\MakeUppercase{un : un:}

\leavevmode \MakeUppercase{un : un:}

\protected\edef:{\unexpanded\expandafter{:}}

\MakeUppercase{un : un:}

\leavevmode \MakeUppercase{un : un:}

\section{Section un : un:}

\end{document}


jbezos commented 1 year ago

@u-fischer Any suggestions on how to fix it?

u-fischer commented 1 year ago

Well, I don't know much about the babel internals, but IMHO you only need to ensure that active chars are defined as \protected from the start, in \@initiate@active@char:

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{lmodern}
\usepackage[french]{babel}

\begin{document}
\protected\edef:{\unexpanded\expandafter{:}}

\MakeUppercase{un : un:}

\foreignlanguage{french}{\MakeUppercase{un : un:} }

\makeatletter
\ExpandArgs{Nc}
\protected\edef{bbl@doactive:}{%
    \expandafter\noexpand\csname user@active:\endcsname}%
\ExpandArgs{Nc}
\protected\edef{bbl@active@:}{%
    \noexpand\active@prefix\noexpand:%
    \expandafter\noexpand\csname active@char:\endcsname}%
\ExpandArgs{Nc}
\protected\edef{bbl@normal@:}{%
    \noexpand\active@prefix\noexpand:%
    \expandafter\noexpand\csname normal@char:\endcsname}%

\MakeUppercase{un : un:}

\foreignlanguage{french}{\MakeUppercase{un : un:}}

\end{document}

jbezos commented 1 year ago

Nice try, but it breaks other things (tlb-natbib.lvt won’t pass: \cite[xxx]{Knuth:TB} raises an error because of an expansion inside a \csname).
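
For context, a minimal sketch of why \protected alone is not enough here: e-TeX still expands \protected macros inside \csname ... \endcsname. The macros \mycolon and \myspacedcolon are hypothetical stand-ins, not babel's definitions.

```latex
\documentclass{article}
% Hypothetical test macros, not babel's definitions.
\protected\def\mycolon{:}                % payload is a plain character
\protected\def\myspacedcolon{\kern1pt :} % payload starts with a typesetting primitive
\begin{document}
% \protected does not stop expansion inside \csname ... \endcsname,
% so this builds the name b@Knuth:TB (shown with meaning \relax):
\expandafter\show\csname b@Knuth\mycolon TB\endcsname
% With a non-character payload the same construction fails with
% "Missing \endcsname inserted", so it is left commented out:
% \expandafter\show\csname b@Knuth\myspacedcolon TB\endcsname
\end{document}
```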

josephwright commented 1 year ago

@jbezos As I suspected, that can be addressed using \ifincsname, in the same way we do for active UTF-8 chars. As a patch it is a bit ugly:

\makeatletter
\protected\edef:{\noexpand\ifincsname\string:\noexpand\expandafter\noexpand\@gobble\noexpand\else\noexpand\expandafter\noexpand\@firstofone\noexpand\fi{\unexpanded\expandafter{:}}}
\makeatother

but the basic idea is sound: the first thing TeX hits is an \ifincsname, then the 'real' payload, which can only be reached during typesetting (as \protected will have dealt with an \edef).
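
The same idea can also be written with \def and a saved copy of the shorthand, which avoids the chain of \noexpand. This is a sketch, assuming it runs in the document body (where babel's french shorthands make : active); \saved@active@colon is a hypothetical name, and \let takes the place of the \unexpanded\expandafter{:} trick.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[french]{babel}
\begin{document}
\makeatletter
% In the document body the french shorthand : is active.
% Save its current meaning under a hypothetical name.
\let\saved@active@colon=:
\protected\def:{%
  \ifincsname
    \string:% inside \csname: just a catcode-12 colon
    \expandafter\@gobble
  \else
    \expandafter\@firstofone
  \fi
  {\saved@active@colon}}% elsewhere: the original shorthand
\makeatother

\MakeUppercase{un : un:}
\end{document}
```

babel may reinstall its own definition of : (for example through \foreignlanguage in the running header), so this is a test of the idea rather than a fix.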

jbezos commented 1 year ago

@u-fischer @josephwright

Before taking the final step, there is a problem that prevents me from doing it. When there is a shorthand at the beginning of a \cite key, an error is raised even with \protected, because there is a \@for executing in turn an \expandafter.
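
A minimal sketch of the \@for behaviour, assuming the kernel definition of \@for, which starts with \expandafter\def\expandafter\@fortmp\expandafter{#2}: given a literal list, its first token is expanded once, \protected or not. \myshorthand and \mykey are hypothetical names.

```latex
\documentclass{article}
\protected\def\myshorthand{S}% hypothetical stand-in for a \protected shorthand
\begin{document}
\makeatletter
% \@for expands the first token of a literal list once, so the
% \protected macro is replaced by S before the loop starts:
% \show reports Sx for the first item, then y.
\@for\mykey:=\myshorthand x,y\do{\show\mykey}%
\makeatother
\end{document}
```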

BTW, to my surprise, \expandafter still expands \protected macros, so \protected does not prevent expansion in every context. That is not the case here, but it caught my attention while testing. Interestingly, this makes the good old \protect better in some (few) contexts.
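
A small test of this observation, with hypothetical macros \myprotected and \myrobust: \edef leaves a \protected macro alone, \expandafter does not, while one expansion step of a \DeclareRobustCommand command only reaches \protect.

```latex
\documentclass{article}
\protected\def\myprotected{bar}%     e-TeX \protected
\DeclareRobustCommand\myrobust{bar}% classical \protect mechanism
\begin{document}
% \edef does not expand a \protected macro: \testA holds the token \myprotected.
\edef\testA{\myprotected}\show\testA
% \expandafter does: one step turns \myprotected into bar.
\expandafter\def\expandafter\testB\expandafter{\myprotected}\show\testB
% One step on a robust command only exposes \protect\myrobust<space>,
% so the payload bar is still out of reach.
\expandafter\def\expandafter\testC\expandafter{\myrobust}\show\testC
\end{document}
```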

car222222 commented 1 year ago

There may be only a few such contexts, but they are often very important. Are there any places where the "new method", \protected, is essential, or even just superior?

FrankMittelbach commented 1 year ago

The way I see it \protected\def\cmd is superior in the sense that you always have a single token before and after and not 2 in cases like \protected@edef, so once you have passed something through such a protected expansion, you don't quite know how many tokens to skip (and of course it does work in \edef directly). On the other hand, the classical LaTeX method of using \protect \cmd<space> has the advantage that there is a way to control what happens in any situation, by redefining \protect appropriately. In my opinion \protected is really missing that control possibility. If that were added, it would be generally superior. Without it, I'm still not that convinced (not as much as @josephwright anyway :-) ) that it is really an improvement.
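
A sketch of both points, with hypothetical commands \myrobust (defined by \DeclareRobustCommand) and \myprotected (defined with \protected): the token count after \protected@edef, and the \protect hook. Setting \protect to \@firstofone is only an illustrative choice of redefinition.

```latex
\documentclass{article}
\DeclareRobustCommand\myrobust{R}% classical LaTeX2e protection (\protect)
\protected\def\myprotected{P}%   e-TeX \protected
\begin{document}
\makeatletter
% \protected@edef: \myrobust survives as two tokens, \protect and \myrobust<space>;
% \myprotected survives as one token.
\protected@edef\testX{\myrobust\myprotected}\show\testX
% A plain \edef also leaves \myprotected unexpanded.
\edef\testY{\myprotected}\show\testY
% The control hook: redefining \protect changes what \myrobust does in the
% same kind of expansion. \myprotected offers no comparable hook.
\begingroup
  \let\protect\@firstofone % illustrative: let robust commands expand fully
  \edef\testZ{\myrobust\myprotected}\show\testZ % R followed by \myprotected
\endgroup
\makeatother
\end{document}
```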

jbezos commented 1 year ago

The way I see it \protected\def\cmd is superior in the sense that you always have a single token before and after and not 2 in cases like \protected@edef,

Not always. Actually, babel shorthands are protected with the classical \protect mechanism in such a way that the result is again that single character (this mechanism is also used in a couple of macros, namely \babelhyphen and \babelshorthand). This is essential for some combinations like "", which are left unchanged in a \protected@edef.
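
A minimal sketch of this "method three" idea, assuming a stand-alone active " and the hypothetical helpers \demo@prefix and \demo@typeset. It imitates the principle (look at \protect and, outside typesetting, return the character itself as a single token), not babel's actual \active@prefix code.

```latex
\documentclass{article}
\makeatletter
% Illustrative only, not babel's code.
\newcommand\demo@prefix[1]{%
  \ifx\protect\@typeset@protect
  \else
    \noexpand#1%            keep the active character as one token
    \expandafter\@gobble % and drop the typesetting part
  \fi}
\newcommand\demo@typeset[1]{[#1]}% hypothetical typesetting behaviour
\makeatother
\begin{document}
\makeatletter
\catcode`\"=\active
\def"{\demo@prefix"\demo@typeset}
"a % typesetting: should give [a]
% \protected@edef: \testX should hold ""a, both " still active single tokens
\protected@edef\testX{""a}\show\testX
\makeatother
\end{document}
```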

In my opinion \protected is really missing that control possibility.

Agreed.

FrankMittelbach commented 1 year ago

Not always. Actually, babel shorthands are protected with the classical \protect mechanism in such a way that the result is again that single character (this mechanism is also used in a couple of macros, namely \babelhyphen and \babelshorthand). This is essential for some combinations like "", which are left unchanged in a \protected@edef.

Correct, Javier, that is really method three (also used for UTF-8 chars). I was just replying to Chris's question about when \protected is or is not superior to LaTeX's classical \DeclareRobustCommand.

u-fischer commented 1 year ago

I agree that for "real" shorthands like the German " the classical protection can be more suitable, as they are essentially commands acting on an argument. But the French colon is IMHO not such a shorthand; it is more like ä, where you want a certain output.

car222222 commented 1 year ago

The treatment of : in standard French typesetting is definitely substantially different from many (most?) babel-style shorthands. It is not really a shorthand but a layout rule/preference, like inserting a rather large kern!

jbezos commented 1 year ago

It seems my ‘BTW’ has opened a debate about \protect vs. \protected, but this wasn’t my main concern; it was just a technicality about the expansion of shorthands in a particular case (\cite) when \protected is applied to them (it doesn't matter whether it's " or :).

Just think of a ‘shorthand’ as an active character, usually taking an argument, whose meaning is defined by the language or the user. In my tests, applying \protected seems to work in most typical cases, even where a shorthand currently expands to the character with catcode 12 (which is what \@safe@activestrue forces). There are, however, exceptions; one example is natbib, which assumes the catcode has been changed to 12.

Before taking the final step in this change, I’d like to tie up the loose ends, and the behavior with \cite is one of them. If this particular issue is sorted out without any patch on the babel side, I can very likely apply \protected to shorthands by default and get rid of the \@safe@actives... mechanism in most cases. Of course, the issues with natbib and others remain to be solved, but right now I’m focused on the default definitions.

u-fischer commented 1 year ago

because there is a \@for executing in turn an \expandafter.

I cannot reproduce a problem with \@for. natbib errors for me with a citation key starting with an umlaut, because of an \edef in one of its commands:

\documentclass{article}
\usepackage{natbib}
\usepackage{etoolbox}
\makeatletter
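% fix natbib: replace the \edef by \protected@edef
% (\fail is undefined on purpose, so a failed patch raises an error)
\patchcmd\NAT@cite@list@append{\edef}{\protected@edef}{}{\fail}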
\makeatother
\bibliographystyle{plainnat}
\begin{document}
\cite{äöü}, 
\bibliography{test}

\end{document}

If that fixes your error too, we should fix natbib. If you get another error, an example would be good.

jbezos commented 1 year ago

With current babel there is nothing wrong in natbib (at least with respect to shorthands). It first pre-expands the string with \@safe@activestrue, and the converted result, with catcode 12, is used later. This strategy can be found in other packages, and it’s legitimate. \protected effectively turns \@safe@activestrue into a no-op.
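
A sketch of the pre-expansion pattern described here, assuming babel with french in the document body, where : is active. \mykey is a hypothetical name, and natbib's actual code differs in detail.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[french]{babel}
\begin{document}
\makeatletter
% Pre-expand a key with babel's safe mode on: the active : should come out
% as a plain catcode-12 colon, which is safe inside \csname later.
\begingroup
  \@safe@activestrue
  \xdef\mykey{Knuth:TB}%
\endgroup
\show\mykey
% If : were \protected, \xdef would keep the active token unexpanded,
% so \@safe@activestrue would have no effect on it.
\makeatother
\end{document}
```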

u-fischer commented 1 year ago

Sure, but umlauts explode, so there is something wrong with citation keys in natbib independently of babel. But you were right: at its core, the problem is the \@for. It leaves a \UTFviii@two@octets (for the umlaut), and the \protected@edef then merely prevents this from raising an error.

jbezos commented 1 year ago

This is my nth attempt to protect shorthands with \protected, to no avail, so I’m coming to the conclusion that I was somewhat optimistic about its feasibility. I’ve decided to put it aside for some time, so don’t expect many changes in this regard in the foreseeable future.