Closed gmilde closed 1 year ago
Note that \MakeUppercase[locale=el]{άδικος, κείμενο, ίριδα}
works. The reason why babel
works, AFAIU, is not due to a change in babel
, but due to an incomplete kernel change which is meanwhile using expl3
's \text_uppercase:nn
and friends in the legacy commands and currently only checks for babel
, not (yet) for polyglossia
.
See ltnews36 which states how:
The commands \MakeUppercase, \MakeLowercase
and \MakeTitlecase now automatically detect the
locale currently in use when babel is loaded. This
allows automatic adjustment of letter mappings where
appropriate. They also accept a leading optional
argument. This accepts a key–value list of control
settings. At present, there is one key available: locale,
which can also be accessed via the alias lang. This is
intended to allow local setting of the language, which
can be done using a BCP-47 descriptor. For example,
this could be used to force Turkish case changing in
otherwise English input
\MakeUppercase[lang = tr]{Ragıp Hulûsi Özdem}
yields RAGIP HULÛSİ ÖZDEM.
Now the question is whether it is planned to also detect the locale when polyglossia
is loaded. If not, we need to roll our own definitions à la:
% Roll a locale-aware \MakeUppercase version that handles diacritics correctly (#581)
% This employs the expl3 case changer
\ExplSyntaxOn
\cs_gset:Npn \MakeXPGGreekUppercase #1 { \text_uppercase:nn { el } {#1} }
\ExplSyntaxOff
% Save original \MakeUppercase definition
\let\xpg@save@MakeUppercase\MakeUppercase
\def\blockextras@greek{%
\let\MakeUppercase\MakeXPGGreekUppercase%
}
\def\noextras@greek{%
% restore original \MakeUppercase definition
\let\MakeUppercase\xpg@save@MakeUppercase%
}
Of course I would very much prefer if we could avoid all the necessary ad hoc definitions in our gloss files. @josephwright are there serious reasons for the neglection of polyglossia or is this planned for later anyway?
BTW @josephwright it looks like the following addition (ensuing the respective babel
test in the kernel) will do:
\@ifpackageloaded { polyglossia }
{
\@ifpackagelater { polyglossia } { 2020-01-29 }
{
\cs_gset_protected:Npn \@@text@case@aux@
{
\str_set:Nx \reserved@a
{ \csuse{bcp47id} }
}
}
{ }
}
{ }
(I'm using \csuse{bcp47id}
here since \languageid
is not expandable. We can also provide an expandable version, but it does not strike me necessary here).
@jspitz
It would be better if polyglossia would provide an expandable \localeinfo* { tag.bcp47 }
too, so that the kernel doesn't have to test for external packages. Having a common interface shared by both packages would also allow to remove the test for babel and define a dummy in the kernel.
I don't understand why we would need to emulate the babel syntax (which is newer than polyglossia's)
I don't understand why we would need to emulate the babel syntax (which is newer than polyglossia's)
well Joseph asked last year for a common interface for the BCP-data.
But Javier just reminded me that after this discussion babel added another interface. If polyglossia would implement that too, the kernel could use it in the next release.
\documentclass{article}
\usepackage[austrian]{babel}
\begin{document}
(\BCPdata{language}) (\BCPdata{script}) (\BCPdata{region})
\end{document}
Forget my request. I'll fix this in polyglossia
.
if you mean that you want to redefine \MakeUppercase: Apart from the fact that I think that patching kernel commands like this isn't a good idea, be aware that hyperref relies for the bookmarks on an expandable interface to the bcp tag too (currently it still uses \localeinfo, but I will switch soon to \BCPtag too).
Fixed downstream.
if you mean that you want to redefine \MakeUppercase: Apart from the fact that I think that patching kernel commands like this isn't a good idea, be aware that hyperref relies for the bookmarks on an expandable interface to the bcp tag too (currently it still uses \localeinfo, but I will switch soon to \BCPtag too).
\csuse{bcp47id}
is expandable and works well for the hyperref
case. But since you seem to be unwilling to support everything else than babel
syntax, I am going to patch into \MakeUppercaseUnsupportedInPdfStrings
and \MakeLowercaseUnsupportedInPdfStrings
as well.
Of course I would very much prefer if we could avoid all the necessary ad hoc definitions in our gloss files. @josephwright are there serious reasons for the neglection of polyglossia or is this planned for later anyway?
There's no intention to neglect polyglossia
, but to explain the current situation, the background as I see it.
The change of approach to case changing brought in the idea of locale to a kernel function for the first time. This raised the opportunity to have a fixed kernel definition and to then pick up the locale set by packages. However, as there is currently no generally-agreed kernel interface for the the locale data, we couldn't just 'make a change'.
I therefore looked at what babel
and polyglossia
offer in terms of interface for BCP-47 info: I was hoping for a common one that we could use in the kernel. Again, there isn't a current agreed common interface so I looked at what there is in both packages. As has been mentioned already, we ideally need something expandable, or at least returning a 'string' containing the BCP-47 info (or a proxy to it). In babel
, there is a documented interface that does that, but for polyglossia
there is not currently: I know \bcp47id
is available but that's an internal data store as far as I can make out.
Ideally, the kernel shouldn't be checking for any packages at all, so we did have quite a lively discussion about whether adding code that is dependent on babel
is 'acceptable'. We've gone with the current approach as a temporary measure as babel
has a documented interface for recovering BCP-47 data (including decomposition) (and so if 'temporary' sticks we are using something documented).
However, the aim remains to find a way toward an agreed single interface that both babel
and polyglossia
can use to
make the information available. From my point of view, it would be ideal if the two could agree on a common data structure too, such that loading both would mean that setting the locale from either would apply to both, but I suspect at present that is too much of an ask.
With a common interface, the kernel can then avoid having any package dependence at all: we set up the interface in the kernel, babel
and polyglossia
can then re-define it to provide the data in an appropriate way (or if we can agree a common data store, they can populate the data store and the kernel interface could be left alone).
In terms of a common interface, I think many things are possible. One could for example imagine documenting \bcp47id
and arranging for babel
to populate it. Or one could take babel
's \BCPdata{<aspect>}
approach, or ... I guess I would weakly favour the babel
method as it offers decomposition of the data into different aspects, which I suspect will become more useful over time.
What we really wanted to avoid is a set of 'if package loaded' tests in the kernel. The aim of the current temporary situation was to move forward somewhat whilst not simply putting in a set of patches that would be forgotten and bite all of us later. The current thread is in that sense what I was anticipating: coming back to the interface questions and trying to get to a position everyone can work with.
\csuse{bcp47id}
is expandable and works well for thehyperref
case. But since you seem to be unwilling to support everything else thanbabel
syntax,
I'm willing to support whatever syntax, but as Joseph wrote: we need one syntax that in the long term is not dependant on package tests. The need to query the locale will appear also in other places, e.g. in tagging structures where one sometimes have to set the language too, and it won't do if the kernel or hyperref has to plant everywhere tests about special package code.
I am going to patch into \MakeUppercaseUnsupportedInPdfStrings and \MakeLowercaseUnsupportedInPdfStrings as well.
Well I can't prevent you to go this way, but I doubt that these commands will stay forever in hyperref.
@josephwright
I guess I would weakly favour the
babel
method as it offers decomposition of the data into different aspects, which I suspect will become more useful over time.
Yes, I agree with this.
I completely agree with "a common interface would be ideal". However, devising such a common interface requires, in my opinion, that all involved parties sit down and talk about how this should look like. At least in the cases where I have been involved, this has not been the case. The babel maintainer did not show any interest in corporation but simply enforced his own approach (without even considering talking back with us). This is not how collaboration works, and my motivation to embark on this ship is consequently low.
The babel maintainer did not show any interest in corporation but simply enforced his own approach (without even considering talking back with us)
\BCPdata
was a suggestion from Joseph, and is not a "babel approach". babel only implemented it after the discussion by mail in may in which you participated too. If you don't like the idea, please suggest something else for the collaboration with the kernel and packages like hyperref.
I have added a very basic (expandable) \BCPdata
command. For the time being, this returns the full tag no matter what argument you pass to it. I suppose this will suffice for the current uses with casing. So instead of checking for a particular babel
version, you probably can simply check whether this command is defined at document begin.
We can add subtags later if required. This requires a bit more work, as polyglossia
currently does only store full tags.
@josephwright @u-fischer @jspitz
babel defines the following 4 fields whose names are (I think) obvious, because they are used in the BCP 47: language
, script
, region
, and variant
(eg, \BCPdata{script}
). For the 3rd the CLDR seems to prefer territory
, and polyglossia uses ‘variant’ with another meaning. Of course, I’m willing to create aliases if necessary.
Extensions are tentatively named extension.t
and extension.u
. The private use area is named extension.x
which is not quite correct. Since these names are documented as tentative in babel, they may be changed.
babel defines the following 4 fields whose names are (I think) obvious, because they are used in the BCP 47:
language
,script
,region
, andvariant
(eg,\BCPdata{script}
). For the 3rd the CLDR seems to preferterritory
, and polyglossia uses ‘variant’ with another meaning. Of course, I’m willing to create aliases if necessary.
I'll go with those as well for polyglossia (in the medium term), and probably extension.x
which is used by latin
. We don't have u
and t
extensions yet.
@jbezos I am wondering whether it would make sense to add
\BCPdata{main.language}
, as sometimes it is useful to know the difference\BCPdata
output the BCP-47 tag as it is recorded in the gloss/ini file (\localeinfo*{tag.bcp47}
in babel
, \languageid{bcp47}
in polyglossia
), e.g. \BCPdata{tag}
or simply with empty argumentI have added a very basic (expandable)
\BCPdata
command. For the time being, this returns the full tag no matter what argument you pass to it. I suppose this will suffice for the current uses with casing. So instead of checking for a particularbabel
version, you probably can simply check whether this command is defined at document begin.We can add subtags later if required. This requires a bit more work, as
polyglossia
currently does only store full tags.
Many thanks. I think we can work with this. Probably at some stage we might want to have \providecommand
in the packages and \newcommand
in the kernel (so no \ifdefined
test is needed), but that is likely a refinement for a future 'pass'.
I will look to adjust the kernel code for the next (2023-06-01) release: does this fit with everyone's thinking?
Many thanks. I think we can work with this. Probably at some stage we might want to have
\providecommand
in the packages and\newcommand
in the kernel (so no\ifdefined
test is needed), but that is likely a refinement for a future 'pass'.
If I got it right, then the kernel can already now make \newcommand\BCPdata[1]{...}
with some suitable default definition (and document what the output should be if packages overwrite the definition).
Many thanks. I think we can work with this. Probably at some stage we might want to have
\providecommand
in the packages and\newcommand
in the kernel (so no\ifdefined
test is needed), but that is likely a refinement for a future 'pass'.If I got it right, then the kernel can already now make
\newcommand\BCPdata[1]{...}
with some suitable default definition (and document what the output should be if packages overwrite the definition).
@jbezos, @jspitz That work for both of you?
- a way to access these info also for the main language (as opposed to the current language), e.g.
\BCPdata{main.language}
, as sometimes it is useful to know the difference- a standardized way to let
\BCPdata
output the BCP-47 tag as it is recorded in the gloss/ini file (\localeinfo*{tag.bcp47}
inbabel
,\languageid{bcp47}
inpolyglossia
), e.g.\BCPdata{tag}
or simply with empty argument
Why not, but \BCPdata
is more a LaTeX command, so I wonder what @josephwright and @u-fischer think. Since it’s not documented (at least in babel
I didn’t documented it because it was essentially tentative and ‘unfinished’), we can still make some changes, or at least leave room for future changes.
If I got it right, then the kernel can already now make
\newcommand\BCPdata[1]{...}
with some suitable default definition (and document what the output should be if packages overwrite the definition).@jbezos, @jspitz That work for both of you?
That’s mostly fine, but I thinking of something like this: we load Polytonic Greek, whose tag is el-polyton
. I want to keep this tag for whatever reason, to correctly identify which Greek variant is in force (in the BCP 47 sense, and in this case in the polyglossia
sense, too, iir). But I also want to activate the iota
rules by default, always, without explicitly passing el-x-iota
. I’m not sure how this can be done.
That’s mostly fine, but I thinking of something like this: we load Polytonic Greek, whose tag is
el-polyton
. I want to keep this tag for whatever reason, to correctly identify which Greek variant is in force (in the BCP 47 sense, and in this case in thepolyglossia
sense, too, iir). But I also want to activate theiota
rules by default, always, without explicitly passingel-x-iota
. I’m not sure how this can be done.
Wouldn't that be el-polyton-x-iota
? My reading of the x
stuff is it can be used to add 'whatever' to the end of the tag. I suspect I may need to do more work if we want multiple x
tag combos, but I think el-polyton-x-iota
should already work: we do decomposition into <lang>-<filler>-x-<extension>
so it should 'just work'.
If it ‘just works’, great (actually, didn’t test).
If it ‘just works’, great (actually, didn’t test).
I did: this does work :)
- a way to access these info also for the main language (as opposed to the current language), e.g.
\BCPdata{main.language}
, as sometimes it is useful to know the difference- a standardized way to let
\BCPdata
output the BCP-47 tag as it is recorded in the gloss/ini file (\localeinfo*{tag.bcp47}
inbabel
,\languageid{bcp47}
inpolyglossia
), e.g.\BCPdata{tag}
or simply with empty argumentWhy not, but
\BCPdata
is more a LaTeX command, so I wonder what @josephwright and @u-fischer think. Since it’s not documented (at least inbabel
I didn’t documented it because it was essentially tentative and ‘unfinished’), we can still make some changes, or at least leave room for future changes.
Should we open a kernel issue to hammer this out now we have broad agreement? The plan would be I think for the kernel to define the command and document the expected semantics, but itself do nothing, then leave it to polyglossia
and babel
to implement whatever parts of the documented API make sense.
If it ‘just works’, great (actually, didn’t test).
I did: this does work :)
But wouldn't you get in \MakeUppercase only el
from \BCPdata{language}?
If it ‘just works’, great (actually, didn’t test).
I did: this does work :)
But wouldn't you get in \MakeUppercase only
el
from \BCPdata{language}?
Er, yes: what I meant is that el-polyton-x-iota
is read by the case code as el
+ x-iota
and the polyton
part is ignored.
@josephwright Making tests. I’m still not sure what to do wrt to the last part of my question:
But I also want to activate the iota rules by default, always, without explicitly passing el-x-iota.
Will the casing macros take the private area into account if the corresponding field is defined?
(btw, I wonder if this discussion should take place in the LaTeX repository.)
But I also want to activate the iota rules by default, always, without explicitly passing el-x-iota.
Will the casing macros take the private area into account if the corresponding field is defined?
I wondered if an indirection would make sense. So that \MakeUppercase
uses say \BCPdata{casing}
, and you could then define this e.g. as a combination of \BCPdata{language}-\BCPdata{whatever)
?
@u-fischer 👍 It’s my preferred option, indeed, which imo will make things easier for other packages (eg, xgreek
) and for users who don’t like babel
or polyglossia
. And document what the casing macros expect (which, in fact, is already done).
(btw, I wonder if this discussion should take place in the LaTeX repository.)
I think that might be best: we are really talking about something separate from the thread here
Many thanks. I think we can work with this. Probably at some stage we might want to have
\providecommand
in the packages and\newcommand
in the kernel (so no\ifdefined
test is needed), but that is likely a refinement for a future 'pass'.If I got it right, then the kernel can already now make
\newcommand\BCPdata[1]{...}
with some suitable default definition (and document what the output should be if packages overwrite the definition).@jbezos, @jspitz That work for both of you?
Works for me.
- a way to access these info also for the main language (as opposed to the current language), e.g.
\BCPdata{main.language}
, as sometimes it is useful to know the difference- a standardized way to let
\BCPdata
output the BCP-47 tag as it is recorded in the gloss/ini file (\localeinfo*{tag.bcp47}
inbabel
,\languageid{bcp47}
inpolyglossia
), e.g.\BCPdata{tag}
or simply with empty argumentWhy not, but
\BCPdata
is more a LaTeX command, so I wonder what @josephwright and @u-fischer think. Since it’s not documented (at least inbabel
I didn’t documented it because it was essentially tentative and ‘unfinished’), we can still make some changes, or at least leave room for future changes.
I think it will be good if at least babel and polyglossia use the same arguments here (and then we both document it).
As there is a kernel change in sight (and hopefully also hyperref
), I've removed the patches. Let's get this right from the start. @josephwright I define \BCPdata
in polyglossia
via \DeclareExpandableDocumentCommand
. This works with your plans, doesn't it?
I've opened https://github.com/latex3/latex2e/issues/1035 as defining the interface really does need to be done at the kernel level.
I suggest closing here at this point as the issue is being handled elsewhere
I'll close as soon as I have implemented \BCPdata
with all keywords (which will take me some time).
Everything is done on polyglossia side now. Some subtags might need adjustment, but in general all data is in, and \BCPdata{casing}
should always return the appropriate tag.
Closing this as "upstream" bug (for the record, this will be fixed with the next LaTeX kernel release, 2023/06).
I have just committed poylglossia 1.61 to CTAN. This includes \BCPdata
.
@u-fischer with the latest hyperref release, the following should work right?
% !TeX TS-program = xelatex
\documentclass{article}
\usepackage{fontspec}
\setmainfont{DejaVu Serif}
\usepackage{polyglossia}
\setmainlanguage[variant=poly]{greek}
\usepackage[bookmarks]{hyperref}
\begin{document}
\section{\MakeUppercase{άδικος, κείμενο, ίριδα}}
άδικος, κείμενο, ίριδα → \MakeUppercase{άδικος, κείμενο, ίριδα}
\end{document}
still get wrong casing in the bookmarks.
Same with
\usepackage[greek.polutoniko]{babel}
No, sorry, babel
works (after an additional xelatex
pass). So is there something wrong with the return value of polyglossia
?
@jspitz your \BCPdata
is not expandable.
\edef\blub{\BCPdata{casing}}\show\blub
gives
> \blub=macro:
->\tl_set:Nn casing{casing}\str_if_in:nnTF {casing}{main.}{\seq_set_split:Nnx \s__seq {.}casing\seq_get_right:NN \s__seq casing\str_set_eq:NN greekgreek}{\str_set_eq:NN greekgreek}\cs_set_nopar:cpx {tmpcasing}{}el.
In babel I get
> \blub=macro:
->el.
hyperref needs expandable commands in the bookmarks (and the kernel needs an expandable \BCPdata
too).
Any advise how to make it expandable is welcome.
I have reworked \BCPdata
at 8a5a2fd8c7e7041a4. It is now expandable. A hotfix release (1.61b) will go to CTAN later today.
Greek letters drop diacritics (eccept dialytika and sub-iota) in UPPERCASE. This is not cared for by the Unicode standard (except with a reference implementation for a Greek-specific upcasing algorithm https://icu.unicode.org/design/case/greek-upper). Polyglossia uses the
\uccode
re-definitions from "xgreek.sty" to achieve this effect. However, since the new implementation in the 2022/06 LaTeX release, these re-definitions are ignored by\MakeUppercase
.Somehow a "locale setting" by the Babel core activates Greek upcasing rules while by default the common upcasing rules from Unicode are used for Greek literals. Unfortunately, there is no documented API that I know of.