jbezos closed this issue 7 months ago.
The plan is to improve support for language tuning, but I haven't seen a discussion anywhere of mediaeval Latin requiring such tuning, so that wouldn't help at the moment. I assume the requirements are documented somewhere?
@josephwright Please give me a pointer explaining how a user can add their own rules, so that I can fix the babel settings made with its macro \SetCase.
@jbezos There's no public interface at the moment, and the plan wasn't to add one - when we looked at the CLDR, there were only about half a dozen languages needing tuning, and most of them needed somewhat tricky look-ahead to do properly. The change needed here seems to be pretty trivial, so I can show how to do it, but until we extend \MakeUppercase/\MakeLowercase to use the BCP-47 data, it's also going to need a change there ...
I'll post a suggested fix as soon as I can.
(BTW, this one doesn't seem to be in the CLDR?)
@josephwright Not even in BCP 47 as such. Its tag in babel is la-x-medieval. Don’t consider the CLDR a closed list.
@jbezos Don't worry, that's not the aim - it's more a guide to the rough number of tunings needed (i.e. a few dozen at most, so coverable by the team here). We already have a couple that are not in the CLDR, and I'm happy to add this one. As you've already got a tag, that's easy. I have a few minutes - expect a check-in to expl3 plus a suggestion here shortly.
Just in case: remember the x values are stored in the key extension.x.tag.bcp47 (retrievable with \localeinfo and \BCPdata).
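As a quick check, here is a minimal sketch that prints both the full tag and its private-use part for the current locale. It assumes that the medievallatin ini file stores these values under the keys named above, and it uses only \localeinfo, in the same starred form as the other snippets in this thread. The exact strings printed depend on the installed babel version.

```latex
\documentclass{article}
\usepackage[english]{babel}
\babelprovide[import]{medievallatin}
\begin{document}
\selectlanguage{medievallatin}
% Full BCP 47 tag of the current locale
Tag: \localeinfo*{tag.bcp47}\par
% Only the value stored for the private-use (x) extension
Private use: \localeinfo*{extension.x.tag.bcp47}
\end{document}
```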
The change to \MakeUppercase, etc., is pretty trivial. For example, for uppercase:
\documentclass{article}
\usepackage[turkish]{babel}
\ExplSyntaxOn
\cs_gset_protected:cpn { MakeUppercase~ }
{
\exp_args:Ne \text_uppercase:nn
{ \localeinfo*{tag.bcp47} }
}
\ExplSyntaxOff
\begin{document}
\MakeUppercase{ir}
\end{document}
(The same pattern applies to lowercase, etc.) This uses the current babel language to select the case-changing tuning, and if there is no tuning it simply gives the standard Unicode mapping, so it should be a safe change. (I'm planning much the same for the next kernel update, but likely with a bit more complexity to allow for overrides.)
Working on the Latin part now.
A fully working example, I hope:
\documentclass{article}
\usepackage[english]{babel}
\babelprovide[import]{medievallatin}
\ExplSyntaxOn
\cs_gset_protected:cpn { MakeUppercase~ }
{
\exp_args:Ne \text_uppercase:nn
{ \localeinfo*{tag.bcp47} }
}
\cs_gset_protected:cpn { MakeLowercase~ }
{
\exp_args:Ne \text_lowercase:nn
{ \localeinfo*{tag.bcp47} }
}
\cs_new:cpn { __text_change_case_lower_la-x-medieval:nnnN } #1#2#3#4
{
% Lowercasing: V becomes u, keeping the category code of the input token
\int_compare:nNnTF { `#4 } = { `V }
{
\__text_change_case_store:e
{
\char_generate:nn { `u } { \__text_char_catcode:N #4 }
}
\use:c { __text_change_case_char_next_ #2 :nn }
{#2} {#3}
}
% Any other character: hand over to the generic character handler
{ \__text_change_case_char:nnnN {#1} {#2} {#3} #4 }
}
\cs_new:cpn { __text_change_case_upper_la-x-medieval:nnnN } #1#2#3#4
{
% Uppercasing: u becomes V, again keeping the category code
\int_compare:nNnTF { `#4 } = { `u }
{
\__text_change_case_store:e
{
\char_generate:nn { `V } { \__text_char_catcode:N #4 }
}
\use:c { __text_change_case_char_next_ #2 :nn }
{#2} {#3}
}
{ \__text_change_case_char:nnnN {#1} {#2} {#3} #4 }
}
\ExplSyntaxOff
\begin{document}
\selectlanguage{medievallatin}
\MakeUppercase{lupus}
\MakeLowercase{LVPVS}
\end{document}
If this looks right, I can adjust the expl3 case changer later today so only the \MakeUppercase and \MakeLowercase changes are needed at the babel end (and that only for a little while).
@josephwright It works in the sense that the result is correct, but my main concern is that the mechanism provided by babel to localize upper- and lowercasing has stopped working altogether, and I would like to fix it.
@jbezos I see that and I'm thinking of longer-term solutions - I was trying to get a fix sorted for the immediate issue first.
I've been thinking a bit about how to manage the rather complex data needed to do Unicode-compliant case changing reliably with all engines. At the moment, the new case changer was really aimed at Unicode engines only, and 8-bit support is a bit patchy. More importantly for the current case, the data are not all in one place, and that makes it tricky to offer easy support for babel. I've been planning to look again at that anyway: we need 'full' UTF-8 support even for 8-bit engines in many situations, so overall it's best to go 'UTF-8 first' and code everything to handle the full range, even in pdfTeX. For case changing, that means having the 1:1 mapping data somewhere (I have plans), plus standardising how the more complex data are handled.
What I can do when I address that is make sure there is an interface babel can use, at least for the 1:n mappings (the look-ahead ones are perhaps less likely). I'll need to run it past the rest of the team, and the data storage still needs a bit of work. So I think I might need a week or two to address this properly: I hope that is OK. A LaTeX2e name would need to be agreed, so for the present it might be expl3-only: something like \char_case_mapping:nnn { <type> } { <input> } { <output> } taking two Unicode codepoints. (@FrankMittelbach, @u-fischer and @davidcarlisle might all have some views here, at least.)
(Perhaps <output> might be multiple codepoints, XXXX YYYY ZZZZ, in the style of UnicodeData.txt.)
\MakeUppercase{\today} is broken, too, but I think I’ve found a temporary hack (partially restoring the old code while preserving the new one, only when \SetCase is used by a language). Testing right now.
@jbezos Can you provide an example? \today is supposed to be expandable ...
@josephwright babel wraps \today to make sure the correct language is applied (remember \foreignlanguage doesn’t switch captions and dates, which means, for example, encodings can be messed up). So it’s partially protected internally. This means a little hack is necessary for the date to be uppercased if necessary. I’d say this is a purely babel issue.
@jbezos Indeed: I think the 'equivalent' mechanism I added for the case changer should work for this.
Please see this issue: https://github.com/latex3/babel/issues/193.
It appears that babel is redefining \MakeUppercase in a way that requires \label to be \protect'd.
Are there plans to make \MakeTitlecase support this too?
Compare, for example,
\documentclass{article}
\usepackage[english]{babel}
\babelprovide[import]{medievallatin}
\begin{document}
\selectlanguage{medievallatin}
\MakeUppercase{ups VPS}
\MakeLowercase{ups VPS}
\MakeTitlecase{ups VPS}
\end{document}
and
\documentclass[a4paper]{article}
\usepackage[T1]{fontenc}
\usepackage[turkish, shorthands=:!]{babel}
\begin{document}
\MakeUppercase{i\c{c}inde}
\MakeTitlecase{i\c{c}inde}
\end{document}
Yes, I have plans for the kernel. I will be discussing with Javier hopefully next week.
\MakeUppercase{\today} is broken again with the latest LaTeX 😣.
Just for reference, I copy here a MWE related to the bug with the dot in Turkish (https://github.com/plk/biblatex/issues/1244#issuecomment-1250020285), but I’m not sure it’s related to the changes in the LaTeX kernel (it works for me):
\documentclass[a4paper]{article}
\usepackage[T1]{fontenc}
\usepackage[turkish, shorthands=:!]{babel}
\begin{document}
\MakeUppercase{i\c{c}inde}
\MakeUppercase{\.{i}\c{c}\.{i}nde}
\end{document}
\MakeUppercase{\today}: what is broken? This works fine for me:
\documentclass{article}
\usepackage[ngerman]{babel}
\begin{document}
\MakeUppercase{\today}
\end{document}
Here is the test file I’m using right now:
\documentclass{article}
\usepackage[classiclatin, english, spanish, turkish, provide*=*]{babel}
\begin{document}
\foreignlanguage[date]{classiclatin}{%
\MakeUppercase{\today---iuíúv}
\MakeLowercase{\today---IUÍÚV}}
\foreignlanguage[date]{english}{%
\MakeUppercase{\today---iuíúv}
\MakeLowercase{\today---IUÍÚV}}
\foreignlanguage[date]{spanish}{%
\MakeUppercase{\today---iuíúv}
\MakeLowercase{\today---IUÍÚV}}
\MakeUppercase{\today---iuíúv}
\MakeLowercase{\today---IUÍÚV}
\end{document}
Hm, so the problem is the \today definition used when provide is used to load the language. But that is not restricted to casing; hyperref has no chance either:
\documentclass{article}
\usepackage[english, provide*=*]{babel}
\usepackage{hyperref}
\begin{document}
\section{\today}
\end{document}
gives in the bookmarks
OK, so we need a version of \today that is expandable, at least for 'string contexts'. I'll look at this.
Should we have a new issue for reworking \today?
guess so
Language-dependent macros must be expanded after language selectors (which are protected). Consider the following line with an expandable \today, assuming \usepackage[bulgarian,danish]{babel}:
\section{\foreignlanguage{bulgarian}{\today}}
The aux file contains something like:
\foreignlanguage {bulgarian}{22.~november 2022}
which is clearly wrong, even if no error is raised.
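The expansion-timing problem can be seen without babel at all. In this minimal sketch, \expdate and \protdate are hypothetical stand-ins for a language-dependent date: an expandable macro is written out already expanded, so its text is fixed at write time, while a \protected one is written as its name and expanded only when the file is read back, i.e. after any language switch has taken effect.

```latex
\documentclass{article}
% Hypothetical stand-ins for a language-dependent date
\newcommand*\expdate{22 November 2022}
\protected\def\protdate{\expdate}
\begin{document}
% Stream -1 writes to the log file only
\immediate\write-1{Expandable: \expdate}
\immediate\write-1{Protected: \protdate}
\end{document}
```

In the log, the first line carries the literal date string, while the second still contains the control sequence \protdate.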
The issue with hyperref must be fixed, of course (a version of \today that is expandable, as suggested, is an option), but that’s a separate issue, and the point here is that \MakeUppercase{\today} used to work and now it doesn’t.
@jbezos I'm working on it; the cause is non-obvious
Also broken (related to #196):
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[french]{babel}
\begin{document}
\MakeUppercase{Text::}
\end{document}
This now prints ‘TEXT::’ instead of ‘TEXT : :’.
Edit: And also:
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[esperanto]{babel}
\begin{document}
\MakeUppercase{^h}
\end{document}
The hat is now misplaced.
So, it seems there is a general problem with shorthands.
OK, I understand the \today issue. \bbl@cased uses \oe/\OE as a marker for uppercasing, so they have to be set equal by \MakeUppercase irrespective of the casing method used.
\documentclass{article}
\usepackage[classiclatin, english, spanish, turkish, provide*=*]{babel}
\begin{document}
\foreignlanguage[date]{classiclatin}{%
\let\oe\OE
\MakeUppercase{\today}\\
\MakeLowercase{\today}}
\foreignlanguage[date]{english}{%
\let\oe\OE
\MakeUppercase{\today}\\
\MakeLowercase{\today}}
\foreignlanguage[date]{spanish}{%
\let\oe\OE
\MakeUppercase{\today}\\
\MakeLowercase{\today}}
\MakeUppercase{\today}\\
\MakeLowercase{\today}
\end{document}
Unless there are objections, I'll fix this from firstaid for the present, and we can then discuss a longer-term fix elsewhere.
@josephwright Yes, please.
@jbezos It may be a few days, perhaps by Tuesday next week.
See https://github.com/latex3/latex2e/pull/970, which will fix here if accepted
@josephwright The patches in babel that firstaid was patching have been removed altogether, so firstaid can remove its patch, too. \today has just been tested and is working again 😌. Incorrectly rendered shorthands and \SetCase are still pending. The priority is shorthands.
@jbezos Could you provide an example of those issues?
@josephwright For the shorthands, see https://github.com/latex3/babel/issues/189#issuecomment-1325516509.
@josephwright As to \SetCase, an alternative way to customize upper- and lowercasing is fine. \SetCase relies on the traditional way of doing things, so I expect it won’t work at all with the new \Make---case commands, but at least it can raise an error/warning about what to do.
@josephwright For the shorthands, see #189 (comment).
OK, I see the issue and the source: I need to think about a solution. Basically, the standard babel definitions 'look' expandable, so the expl3 code turns active-: into 'other'-:. That makes sense if we are looking for 'text', but not if there is meant to be 'more stuff'.
@josephwright As to \SetCase, an alternative to customize upper and lower casing is fine. \SetCase relies on the traditional way of doing things, so I expect it won’t work at all with the new \Make---case, but at least it can raise an error/warning about what to do.
I have ideas about how to address this, but it would help to know whether this all needs to be tracked on a per-locale basis. (A code change is easy enough: I just have to provide a macro-based override for the stored case data.)
@josephwright You may be interested in https://github.com/latex3/babel/issues/196#issuecomment-1322459418 and my answer.
@jbezos Yes, but more generally we do need to find ways of getting 'text results' for the tagging project. In the linked Q, I think something like \ifincsname ...\else<some protected auxiliary>\fi might work. I'll look at this, perhaps tomorrow. However, I'm going to think about a solution for babel actives similar to the LaTeX 'robust' command one: I should be able to pick up the internal form of expansion and spot that this is 'sort-of protected'.
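A generic sketch of the \ifincsname idea, not babel's actual code: the hypothetical \mydate returns a plain-text form when it is expanded inside \csname ... \endcsname, and otherwise hands over to a \protected auxiliary (also a hypothetical name) that does the real typesetting. \ifincsname is available in pdfTeX, XeTeX and LuaTeX, but not in Knuth's TeX.

```latex
\documentclass{article}
\makeatletter
% Hypothetical auxiliary: the real typesetting branch, kept \protected so
% that it is not expanded in \edef, \write and similar contexts
\protected\def\my@date@typeset{\today}
% Hypothetical wrapper: inside \csname ... \endcsname a \protected macro
% would trigger "Missing \endcsname", so give a plain string instead
\newcommand*\mydate{%
  \ifincsname
    \the\year-\two@digits{\month}-\two@digits{\day}%
  \else
    \my@date@typeset
  \fi}
\makeatother
\begin{document}
% Ordinary context: the protected branch typesets the date
\mydate

% Name context: the date becomes part of a control-sequence name
\expandafter\def\csname entry@\mydate\endcsname{stored under a text key}
\csname entry@\mydate\endcsname
\end{document}
```

The same test could select a 'text' branch for other string contexts; how far that extends to bookmarks or tagging is left open in the discussion above.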
@josephwright Just in case: the following now fails, too. It should print ‘A :B:C’, but prints ‘A:B:C’.
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[french]{babel}
\begin{document}
\catcode`\:=12
\MakeUppercase{A\babelshorthand{:}B:C}
\end{document}
OK, whilst I think a 'long-term' fix is, as I've suggested, to use a \protected approach to actives, what I can do now is look for \active@prefix. If that is found during expansion, I can take an alternative path that doesn't try to expand material. A fix in that area is coming up shortly (hopefully today).
@jbezos I have checked in a fix for the issue with actives, so it will be in the next expl3 release. I'll do that shortly: I want to think a little more about the \SetCase aspect.
Related:
The original issue follows: