latex3 / babel

The babel system for LaTeX, LuaLaTeX and XeLaTeX
LaTeX Project Public License v1.3c

Localization of \MakeUppercase and \MakeLowercase broken #189

Closed jbezos closed 7 months ago

jbezos commented 1 year ago

After an update to the LaTeX kernel last July (2022), several babel features are no longer compatible with \MakeUppercase and \MakeLowercase. These macros have been rewritten to improve their behavior, but you may now run into the following issues:

- Closed: \SetCase is ignored altogether, which affects very few languages. Version 3.90 introduces an alternative mechanism with \BabelLowercaseMapping and \BabelUppercaseMapping, but it’s still unfinished because of some limitations in the LaTeX kernel that have yet to be sorted out.
- Fixed: Shorthands might not work as expected. With french, : is badly spaced, and with esperanto the hat (e.g., ^h) is misplaced.
- Fixed: \MakeUppercase{\today} doesn’t uppercase the date.

The first point is ‘unfixable’ in the sense that the casing mechanism in LaTeX is now quite different, but (a) \SetCase has been modified to at least deal with macro mapping (the optional argument), and (b) a way to declare case mappings in ini files has been devised. \SetCase was used in 6 ldf files, namely turkish, lithuanian, latin, dutch, afrikaans, and azerbaijani. These now work out of the box with LaTeX, except the last one, which requires the macro mapping (now fixed) in old documents relying on the LICR names.
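
As a rough illustration of the alternative mechanism named in the first point, the medieval Latin u/V rule could be declared along the following lines. This is a sketch only. It assumes babel 3.90 or later, and it assumes that both macros take the locale name, a codepoint and the replacement text, in that order. The interface is described above as unfinished, so check the babel manual before relying on it.

```latex
\documentclass{article}
\usepackage[english]{babel}
\babelprovide[import]{medievallatin}

% Assumed argument order: {locale name}{codepoint}{output}.
% `u and `V are TeX notation for the character codes of u and V.
\BabelUppercaseMapping{medievallatin}{`u}{V}
\BabelLowercaseMapping{medievallatin}{`V}{u}

\begin{document}
\selectlanguage{medievallatin}
\MakeUppercase{lupus}   % intended result: LVPVS

\MakeLowercase{LVPVS}   % intended result: lupus
\end{document}
```

Tying the mapping to a locale name means other languages in the same document keep the standard Unicode behavior.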

Related:

If after upgrading your system (which is the very first step) you still experience some problems and you are in a hurry, as an emergency solution you can try restoring the old definitions, with the following piece of code in the document body:

\makeatletter
\DeclareRobustCommand{\MakeUppercase}[1]{{%
     \def\i{I}\def\j{J}%
     \def\reserved@a##1##2{\let##1##2\reserved@a}%
     \expandafter\reserved@a\@uclclist\reserved@b{\reserved@b\@gobble}%
     \let\UTF@two@octets@noexpand\@empty
     \let\UTF@three@octets@noexpand\@empty
     \let\UTF@four@octets@noexpand\@empty
     \protected@edef\reserved@a{\uppercase{#1}}%
     \reserved@a
  }}
\DeclareRobustCommand{\MakeLowercase}[1]{{%
     \def\reserved@a##1##2{\let##2##1\reserved@a}%
     \expandafter\reserved@a\@uclclist\reserved@b{\reserved@b\@gobble}%
     \let\UTF@two@octets@noexpand\@empty
     \let\UTF@three@octets@noexpand\@empty
     \let\UTF@four@octets@noexpand\@empty
     \protected@edef\reserved@a{\lowercase{#1}}%
     \reserved@a
  }}
\makeatother

The original issue follows:

With the most recent version of LaTeX, the following minimal example doesn't work any more:

\documentclass{article}

\usepackage[english]{babel}

\babelprovide[import]{medievallatin}

\begin{document}

\selectlanguage{medievallatin}

\MakeUppercase{lupus}

\MakeLowercase{LVPVS}

\end{document}

Now it prints ‘LUPUS’ and ‘lvpvs’, instead of ‘LVPVS’ and ‘lupus’.

josephwright commented 1 year ago

The plan is to improve support for language tuning, but I've not seen anywhere a discussion about mediaeval Latin requiring such a tuning so that wouldn't help at the moment. I assume the requirements are documented somewhere?

jbezos commented 1 year ago

@josephwright Please, give me a pointer explaining how a user can add their own rules, so that I can fix the babel settings with its macro \SetCase.

josephwright commented 1 year ago

@jbezos There's no public interface at the moment, and the plan wasn't to add one - when we looked at the CLDR, there were only about half a dozen languages needing tuning, and most of them needed somewhat tricky look-ahead to do properly. The change needed here seems to be pretty trivial, so I can show how to do it, but until we extend \MakeUppercase/\MakeLowercase to use the BCP-47 data, it's also going to need a change there ...

I'll post a suggested fix as soon as I can.

(BTW, this one doesn't seem to be in the CLDR?)

jbezos commented 1 year ago

@josephwright Not even in the BCP 47 as such. Its tag in babel is la-x-medieval. Don’t consider the CLDR a closed list.

josephwright commented 1 year ago

@jbezos Don't worry, that's not the aim - it's more as a guide to rough number of tunings needed (i.e. a few dozen at most, so coverable by the team here). We already have a couple that are not in the CLDR, and I'm happy to add this one. As you've already got a tag that's easy. I have a few minutes - expect a checkin to expl3 plus a suggestion here shortly.

jbezos commented 1 year ago

Just in case: remember the x values are stored in the key extension.x.tag.bcp47 (retrievable with \localeinfo and \BCPdata).
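
A small sketch of how those values might be inspected from a document. It uses only the two key names that appear in this thread. The starred form of \localeinfo is chosen on the assumption that it returns nothing, rather than raising an error, when a key is missing.

```latex
\documentclass{article}
\usepackage[english]{babel}
\babelprovide[import]{medievallatin}
\begin{document}
\selectlanguage{medievallatin}
% Full BCP 47 tag of the current locale
% (la-x-medieval, according to this thread)
Tag: \localeinfo*{tag.bcp47}

% Private-use (x) extension only
Extension: \localeinfo*{extension.x.tag.bcp47}
\end{document}
```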

josephwright commented 1 year ago

The change to \MakeUppercase, etc., is pretty trivial. For example, for uppercase

\documentclass{article}
\usepackage[,turkish]{babel}
\ExplSyntaxOn
\cs_gset_protected:cpn { MakeUppercase~ }
  {
    \exp_args:Ne \text_uppercase:nn
       { \localeinfo*{tag.bcp47} }
  }
\ExplSyntaxOff

\begin{document}
\MakeUppercase{ir}
\end{document}

(same pattern for lowercase, etc.). This uses the current babel language to select case changing tuning, and if there is no tuning just gives the Unicode standard, so should be a safe change. (I'm planning much the same for the next kernel update, but likely with a bit more complexity to allow for overrides.)
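
To make the role of the first argument clearer, here is a minimal sketch that passes the tag to \text_uppercase:nn by hand instead of reading it from babel. \UpperAs is a hypothetical helper name used only for this illustration, and a reasonably recent LaTeX kernel is assumed.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\ExplSyntaxOn
% \UpperAs is a hypothetical helper for this illustration only:
% #1 = BCP 47 tag, #2 = text to uppercase
\NewDocumentCommand \UpperAs { m m }
  { \text_uppercase:nn {#1} {#2} }
\ExplSyntaxOff
\begin{document}
\UpperAs{tr}{ir} % Turkish tuning applies to the letter i

\UpperAs{en}{ir} % no tuning for this tag: standard Unicode mapping
\end{document}
```

The redefinition above does the same thing, except that the tag comes from \localeinfo*{tag.bcp47} and therefore follows the current babel language.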

Working on the Latin part now

josephwright commented 1 year ago

Fully working example I hope

\documentclass{article}

\usepackage[english]{babel}

\babelprovide[import]{medievallatin}

\ExplSyntaxOn
\cs_gset_protected:cpn { MakeUppercase~ }
  {
    \exp_args:Ne \text_uppercase:nn
       { \localeinfo*{tag.bcp47} }
  }
\cs_gset_protected:cpn { MakeLowercase~ }
  {
    \exp_args:Ne \text_lowercase:nn
       { \localeinfo*{tag.bcp47} }
  }
\cs_new:cpn { __text_change_case_lower_la-x-medieval:nnnN } #1#2#3#4
  {
    \int_compare:nNnTF { `#4 } = { `V }
      {
        \__text_change_case_store:e
          {
            \char_generate:nn { `u } { \__text_char_catcode:N #4 }
          }
        \use:c { __text_change_case_char_next_ #2 :nn }
          {#2} {#3}
      }
      { \__text_change_case_char:nnnN {#1} {#2} {#3} #4 }
  }
\cs_new:cpn { __text_change_case_upper_la-x-medieval:nnnN } #1#2#3#4
  {
    \int_compare:nNnTF { `#4 } = { `u }
      {
        \__text_change_case_store:e
          {
            \char_generate:nn { `V } { \__text_char_catcode:N #4 }
          }
        \use:c { __text_change_case_char_next_ #2 :nn }
          {#2} {#3}
      }
      { \__text_change_case_char:nnnN {#1} {#2} {#3} #4 }
  }
\ExplSyntaxOff

\begin{document}
\selectlanguage{medievallatin}
\MakeUppercase{lupus}

\MakeLowercase{LVPVS}

\end{document}

If this looks right, I can adjust the expl3 case changer later today so only the \MakeUppercase and \MakeLowercase changes are needed at the babel end (and that only for a little while).

jbezos commented 1 year ago

@josephwright It works in the sense that the result is correct, but actually my main concern is the fact that the mechanism provided by babel to localize upper- and lowercasing has stopped working altogether, and I would like to fix it.

josephwright commented 1 year ago

@jbezos I see that and I'm thinking of longer-term solutions - I was trying to get a fix sorted for the immediate issue first.

I've been thinking a bit about how to manage the rather complex data needed to do Unicode-compliant case changing reliably with all engines. At the moment, the new case changer was really aimed at Unicode engines only, and 8-bit support is a bit patchy. More importantly for the current case, the data are not all in one place - and that makes it tricky to offer easy support for babel. I've been planning to look again at that anyway: we need 'full' UTF-8 support even for 8-bit engines in many situations, and overall it's therefore best to go 'UTF-8 first' and code everything expecting to do the full range even in pdfTeX. For case changing, that means having the 1:1 mapping data somewhere (I have plans), plus standardising how the more complex data is handled.

What I can do when I address that is make sure there is an interface babel can use, at least for the 1:n mappings (the look-ahead ones perhaps less likely). I'll need to run past the rest of the team, and the data storage needs a bit of work yet. So I think I might need a week or two to address this properly: I hope that is OK. A LaTeX2e name would need to be agreed, so for the present it might be expl3-only: something like \char_case_mapping:nnn { <type> } { <input> } { <output> } taking two Unicode codepoints. (@FrankMittelbach , @u-fischer, @davidcarlisle might all have some views here, at least.)

josephwright commented 1 year ago

(Perhaps <output> might be multiple codepoints - XXXX YYYY ZZZZ in the style of UnicodeData.txt.)

jbezos commented 1 year ago

\MakeUppercase{\today} is broken, too, but I think I’ve found a temporary hack (by restoring partially the old code while preserving the new one, only when \SetCase is used by a language). Testing right now.

josephwright commented 1 year ago

@jbezos Can you provide an example? \today is supposed to be expandable ...

jbezos commented 1 year ago

@josephwright babel wraps \today to make sure the correct language is applied (remember \foreignlanguage doesn’t switch captions and dates, which means, for example, encodings can be messed up). So it’s partially protected internally. This means a little hack is needed for the date to be uppercased when required. I’d say this is a purely babel issue.

josephwright commented 1 year ago

@jbezos Indeed: I think the 'equivalent' mechanism I added for the case changer should work for this.

John02139 commented 1 year ago

Please see this issue: https://github.com/latex3/babel/issues/193.

It appears that babel is redefining \MakeUppercase in a way that requires \label to be \protect'd.

moewew commented 1 year ago

Are there plans on making \MakeTitlecase also support this?

Compare for example

\documentclass{article}

\usepackage[english]{babel}

\babelprovide[import]{medievallatin}

\begin{document}

\selectlanguage{medievallatin}

\MakeUppercase{ups VPS}

\MakeLowercase{ups VPS}

\MakeTitlecase{ups VPS}

\end{document}

and

\documentclass[a4paper]{article}

\usepackage[T1]{fontenc}
\usepackage[turkish, shorthands=:!]{babel}

\begin{document}

\MakeUppercase{i\c{c}inde}

\MakeTitlecase{i\c{c}inde}

\end{document}
josephwright commented 1 year ago

Yes, I have plans for the kernel. I will be discussing with Javier hopefully next week.

jbezos commented 1 year ago

\MakeUppercase{\today} again broken with the latest LaTeX 😣.

jbezos commented 1 year ago

Just for reference, I copy here a MWE related to the bug with the dot in Turkish (https://github.com/plk/biblatex/issues/1244#issuecomment-1250020285), but I’m not sure it’s related to the changes in the LaTeX kernel (and works for me):

\documentclass[a4paper]{article}

\usepackage[T1]{fontenc}
\usepackage[turkish, shorthands=:!]{babel}

\begin{document}

\MakeUppercase{i\c{c}inde}

\MakeUppercase{\.{i}\c{c}\.{i}nde}

\end{document}
jbezos commented 1 year ago

Related: https://github.com/latex3/babel/issues/196

u-fischer commented 1 year ago

\MakeUppercase{\today}

what is broken? This here works fine for me

\documentclass{article}
\usepackage[ngerman]{babel}

\begin{document}
\MakeUppercase{\today}
\end{document}
jbezos commented 1 year ago

Here is the test file I’m using right now:

\documentclass{article}

\usepackage[classiclatin, english, spanish, turkish, provide*=*]{babel}

\begin{document}

\foreignlanguage[date]{classiclatin}{%
\MakeUppercase{\today---iuíúv}
\MakeLowercase{\today---IUÍÚV}}

\foreignlanguage[date]{english}{%
\MakeUppercase{\today---iuíúv}
\MakeLowercase{\today---IUÍÚV}}

\foreignlanguage[date]{spanish}{%
\MakeUppercase{\today---iuíúv}
\MakeLowercase{\today---IUÍÚV}}

\MakeUppercase{\today---iuíúv}
\MakeLowercase{\today---IUÍÚV}

\end{document}
u-fischer commented 1 year ago

Hm, so the problem is the \today definition that is used when the language is loaded with provide. But that is not restricted to casing; hyperref has no chance either:

\documentclass{article}

\usepackage[english, provide*=*]{babel}
\usepackage{hyperref}
\begin{document}
\section{\today}
\end{document}

gives in the bookmarks

[image: screenshot of the PDF bookmarks, not reproduced]

josephwright commented 1 year ago

OK, so we need a version of \today that is expandable, at least for 'string contexts'. I'll look at this.

josephwright commented 1 year ago

Should we have a new issue for re-working \today?

FrankMittelbach commented 1 year ago

guess so

jbezos commented 1 year ago

Language-dependent macros must be expanded after language selectors (which are protected). Consider the following line with an expandable \today, assuming \usepackage[bulgarian,danish]{babel}:

\section{\foreignlanguage{bulgarian}{\today}}

The aux file contains something like:

\foreignlanguage  {bulgarian}{22.~november 2022}

which is clearly wrong, even if no error is raised.
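
The difference between an expandable and a protected date macro can be seen in isolation with the following sketch. \expandabledate and \protecteddate are hypothetical stand-ins, not babel macros, and \protected@edef is used here only to mimic the kind of expansion applied when a sectioning title is written to the aux file.

```latex
\documentclass{article}
\makeatletter
% Hypothetical stand-ins for an expandable and a protected \today
\def\expandabledate{22. november 2022}
\protected\def\protecteddate{22. november 2022}
% Expand both in the way a moving argument would be expanded
\protected@edef\storedA{\expandabledate}
\protected@edef\storedB{\protecteddate}
\makeatother
\begin{document}
\ttfamily
\meaning\storedA % the date text, already fixed at this point

\meaning\storedB % the token \protecteddate, still unexpanded
\end{document}
```

In the first case the text is frozen before any language switch can take effect, which is the situation described for the Bulgarian example. In the second, the macro survives and is expanded only at typesetting time, after the selector has run.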

The issue with hyperref must be fixed, of course (a version of \today that is expandable, as suggested, is an option), but it’s another issue and the point here is \MakeUppercase{\today} used to work and now it doesn’t.

josephwright commented 1 year ago

@jbezos I'm working on it; the cause is non-obvious

jbezos commented 1 year ago

Also broken (related to #196):

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[french]{babel}
\begin{document}
\MakeUppercase{Text::}
\end{document}

This now prints ‘TEXT::’ instead of ‘TEXT : :’

Edit: And also:

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[esperanto]{babel}
\begin{document}
\MakeUppercase{^h}
\end{document}

The hat is now misplaced.

So, it seems there is a general problem with shorthands.

josephwright commented 1 year ago

OK, I understand the \today issue. \bbl@cased uses \oe/\OE as a marker for uppercasing, so they have to be set equal by \MakeUppercase irrespective of the casing method used.

josephwright commented 1 year ago
\documentclass{article}
\usepackage[classiclatin, english, spanish, turkish, provide*=*]{babel}

\begin{document}

\foreignlanguage[date]{classiclatin}{%
\let\oe\OE
\MakeUppercase{\today}\\
\MakeLowercase{\today}}

\foreignlanguage[date]{english}{%
\let\oe\OE
\MakeUppercase{\today}\\
\MakeLowercase{\today}}

\foreignlanguage[date]{spanish}{%
\let\oe\OE
\MakeUppercase{\today}\\
\MakeLowercase{\today}}

\MakeUppercase{\today}\\
\MakeLowercase{\today}

\end{document}
josephwright commented 1 year ago

Unless there are objections, I'll fix this from firstaid for the present, and we can then discuss a longer-term fix elsewhere.

jbezos commented 1 year ago

@josephwright Yes, please.

josephwright commented 1 year ago

@jbezos May be a few days - perhaps by Tuesday next week

josephwright commented 1 year ago

See https://github.com/latex3/latex2e/pull/970, which will fix here if accepted

jbezos commented 1 year ago

@josephwright The patched (by firstaid) patches (in babel) have been removed altogether, so firstaid can remove it, too. \today just tested and working again 😌. Shorthands rendered incorrectly and \SetCase are pending. The priority is shorthands.

josephwright commented 1 year ago

@jbezos Could you provide an example of those issues?

jbezos commented 1 year ago

@josephwright For the shorthands, see https://github.com/latex3/babel/issues/189#issuecomment-1325516509.

jbezos commented 1 year ago

@josephwright As to \SetCase, an alternative to customize upper and lower casing is fine. \SetCase relies on the traditional way of doing things, so I expect it won’t work at all with the new \Make---case, but at least it can raise an error/warning about what to do.

josephwright commented 1 year ago

> @josephwright For the shorthands, see #189 (comment).

OK, I see the issue and the source: I need to think about a solution. Basically, the standard babel definitions 'look' expandable, so the expl3 code turns active-: into 'other'-:. That makes sense if we are looking for 'text', but not if there is meant to be 'more stuff'.

josephwright commented 1 year ago

> @josephwright As to \SetCase, an alternative to customize upper and lower casing is fine. \SetCase relies on the traditional way of doing things, so I expect it won’t work at all with the new \Make---case, but at least it can raise an error/warning about what to do.

I have ideas of how to address this, but I could do with an idea of whether this all needs to be tracked on a per-locale basis. (A code change is easy enough: I just have to provide a macro-based override for the stored case data.)

jbezos commented 1 year ago

@josephwright You may be interested in https://github.com/latex3/babel/issues/196#issuecomment-1322459418 and my answer.

josephwright commented 1 year ago

@jbezos Yes, but we do need to find ways more generally of getting 'text results' for the tagging project. In the linked Q, I think something like \ifincsname ...\else<some protected auxiliary>\fi might work. I'll look at this perhaps tomorrow. However, I'm going to think about a solution to babel actives similar to the LaTeX 'robust' command one: I should be able to pick up the internal form of expansion and spot that this is 'sort-of protected'.
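
The \ifincsname idea might look roughly like this. It is a sketch under stated assumptions, not the code that was eventually adopted: \demo and \demo@typeset are hypothetical names, and an engine with the e-TeX extensions is assumed.

```latex
\documentclass{article}
\makeatletter
% Protected auxiliary: never expanded inside \edef or \write
\protected\def\demo@typeset{\textbf{typeset form}}
% Inside \csname ... \endcsname only plain characters are produced;
% everywhere else the protected auxiliary takes over
\def\demo{\ifincsname string-form\else\expandafter\demo@typeset\fi}
\makeatother
\begin{document}
\demo % normal text: the protected auxiliary is executed

% Inside \csname: \demo yields the characters "string-form",
% so this defines the control sequence named x-string-form
\expandafter\let\csname x-\demo\endcsname\relax
\end{document}
```

The \expandafter before \demo@typeset closes the conditional first, so the auxiliary does not run with an open \fi after it.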

jbezos commented 1 year ago

@josephwright Just in case: the following now fails, too. It has to print A :B:C, but prints A:B:C.

\documentclass{article}

\usepackage[T1]{fontenc}

\usepackage[french]{babel}

\begin{document}

\catcode`\:=12

\MakeUppercase{A\babelshorthand{:}B:C}

\end{document}
josephwright commented 1 year ago

OK, whilst I think a 'long-term' fix is as I've suggested to use a \protected approach to actives, what I can adjust is to look for \active@prefix. If that is found during expansion, I can then take an alternative path that doesn't try to expand material. Fix in that area coming up shortly (hopefully today).

josephwright commented 1 year ago

@jbezos I have a fix in hand for the issue of actives checked in, so it will be in the next expl3 release. I'll do that shortly: I want to think a little more about the \SetCase aspect.