u-fischer closed this issue 2 years ago.
The fact that `german` is not (present-day) German while `ngerman` is seems to me an anomaly. I know there are ‘historical reasons’, but I wonder whether preserving these conflicting names is the best option. IIRC, German is the only case where the babel names overlap with those in the CLDR (I mean, the same name for different things), and I would like to respect the latter, which are a kind of standard. Besides `german`, there is `swissgerman`, which is `de-CH` in babel but `gsw` in the CLDR, because it’s a different language/dialect. Maybe a switch for these cases would do the job (something like `names=cldr`), but this wouldn’t be very elegant and could bring even more confusion.
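A minimal sketch of the overlap described above. It assumes a recent babel that has the `provide=*` package option. The same option name `german` leads to two different sets of spelling rules, depending on whether babel takes the classic ldf route or the ini route. The comments describe the behaviour as reported in this thread and were not checked in a test run.

```latex
\documentclass{article}
% Classic route: german.ldf is loaded, with the pre-1996 spelling rules
% and the \l@german hyphenation patterns (de-1901).
\usepackage[german]{babel}
% Ini route (use instead of the line above, not together with it):
% \usepackage[german, provide=*]{babel}
% No ldf file is loaded here, and babel-german.tex selects babel-de.ini,
% i.e. present-day German as named in the CLDR.
\begin{document}
Dies ist ein Test.
\end{document}
```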
@jspitz Any thoughts?
Personally, I would have no problem if `german` and `ngerman` both referred to present-day German and the older spelling were referred to as, say, `german-1901`.
But the behaviour of the ldf and ini files should be consistent. It is not good if `german.ldf` loads hyphenation patterns for the old spelling while the ini file claims it is the new one.
I think for backwards compatibility reasons, `german` should link to `de_DE-1901`, and `ngerman` to `de_DE`.
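The mapping proposed above can be written out explicitly with `\babelprovide`. This is a sketch under a few assumptions: babel 3.77 or later, which ships the `babel-de-1901.ini` file mentioned later in this thread, and an `import=` key that accepts the ini tag directly. Only the locale data source is stated in the comments. Which hyphenation patterns end up being used is not claimed here.

```latex
\documentclass{article}
\usepackage{babel}
% "german" -> old spelling: locale data taken from babel-de-1901.ini.
\babelprovide[import=de-1901]{german}
% "ngerman" -> present spelling: locale data taken from babel-de.ini.
% This is declared as the main language.
\babelprovide[import=de, main]{ngerman}
\begin{document}
Haupttext in neuer Rechtschreibung: dass.

\foreignlanguage{german}{Zitat in alter Rechtschreibung: daß.}
\end{document}
```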
Thinking aloud: an alternate set of ini files (let’s call them, say, `de-x-babel`) which takes precedence if the corresponding ldf files have already been loaded.
I don't think it would be a good idea if `german` sometimes meant the old and sometimes the new spelling. That would make the situation even more complex.
@jspitz
> I think for backwards compatibility reasons, `german` should link to `de_DE-1901`, and `ngerman` to `de_DE`.
IMHO, backward compatibility was lost when polyglossia decided to use `german` as the main language name. Since then, `german` can mean either of the two spelling variants.
I also think that many new users (who are too young to know about the spelling discussion) find it confusing to have to use `ngerman` as the babel option instead of the natural `german`. We quite often see this in examples posted with questions. I would suggest really considering breaking compatibility here and cleaning up the situation.
I don't think polyglossia breaks backwards compatibility. Polyglossia names are not babel names. Backwards compatibility is broken when old documents suddenly produce unexpected output. I suppose many thousands of documents out there that use babel's `german` would be affected (most of my own documents would in fact break, as I often use babel's `german` when quoting older German texts).
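The following sketch shows the kind of document that would be affected, using the classic ldf names. The main text uses `ngerman`, and quotations from older sources switch to `german` so that they follow the pre-1996 rules. If `german` were redefined to mean the 1996 rules, such a quotation would change its output without any change to the source.

```latex
\documentclass{article}
% Classic ldf names: german = pre-1996 rules, ngerman = 1996 rules.
% The last option given becomes the main language.
\usepackage[german, ngerman]{babel}
\begin{document}
Der Haupttext folgt der neuen Rechtschreibung.

% Quotation from an older text, typeset with the rules of "german".
\begin{otherlanguage}{german}
Ein Zitat aus einem älteren Text, das der alten Rechtschreibung folgt.
\end{otherlanguage}
\end{document}
```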
If you want to make a really sensible change, you should keep the old babel names as aliases in the background and officially switch to the less ambiguous BCP-47 identifiers for selecting language varieties.
(Names such as `german`, and even more so `austrian`, are highly ambiguous anyway. The mere fact that `german` is linked to the German standard variety rather than the Swiss or Austrian one is debatable.)
Many languages have evolved over time, sometimes with significant changes, and have kept their original names: French, Spanish, and Russian are examples. For me, the most confusing point here is that `german` isn’t actually the option to use for German.
BTW, I’ve found another case of overlapping names: `serbian` in `babel` is `sr-Latn` instead of `sr(-Cyrl)`. The ‘real’ Serbian is `serbianc` (I'll try to locate the author to see how this can be changed).
As to BCP 47, the tags weren’t devised for user interfaces but as unique identifiers at a lower level, and here we are dealing with the name the user selects. IMO, this isn’t the way to go. In fact, I think it doesn’t solve the problem at all, because the name in the IANA registry and in the CLDR for `de` is still German.
Not quite the same. `de` == German means all varieties of German. This includes `de-1901`, `de-AT`, `de-CH`, `de-Latf-1901`, etc. Babel's `ngerman` is only a subset of `de`, namely `de-DE-1996` (just as babel's `german` is a subset, namely `de-DE-1901`).
> (names such as `german` -- and even more so `austrian` -- are highly ambiguous anyway; the fact alone that `german` is linked to the German standard variety and not Swiss or Austrian is disputable)

The language name is ambiguous, yes. But as LaTeX can't use all variants at the same time, one has to decide which variant/spelling `german` means at the user level and how to name or select the other ones.
Polyglossia chose `variant=german` and `spelling=new` as the defaults for `german`, i.e. `de-DE-1996`. So did the babel ini files. The question is whether this can be unified again with babel's `german`.
I don't think it can be done without breaking backwards compatibility, and I think that outweighs having identical language names (which would be identical only on the surface anyway).
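For comparison, here is a sketch of how polyglossia handles this. It uses a single name, `german`, and selects the variety and spelling through options. This assumes XeLaTeX or LuaLaTeX and polyglossia's `variant` and `spelling` options for German.

```latex
% Compile with XeLaTeX or LuaLaTeX.
\documentclass{article}
\usepackage{polyglossia}
% Polyglossia's defaults for german, written out: variant=german,
% spelling=new (i.e. de-DE-1996).
\setmainlanguage[variant=german, spelling=new]{german}
\begin{document}
Haupttext in neuer Rechtschreibung: dass.

% Same language name, with the pre-1996 rules selected through an option.
\textgerman[spelling=old]{Zitat in alter Rechtschreibung: daß.}
\end{document}
```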
I think it is fine that `babel` assumes `de` is `de-DE-1996` (`polyglossia` does this as well). What is wrong is that `de` then does not set `region.tag.bcp47 = DE` and `variant.tag.bcp47 = 1996`.
In other words, if the language is set via a tag, selecting `ngerman` when "de" is given is OK (as Ulrike says, some variant has to be selected). BUT: this variant should then identify itself precisely when its BCP47 tag is queried. In that case, `de` is underspecified, and the region and variant tags need to be reported as well.
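A minimal way to query the tag in question, assuming a babel version that provides `\localeinfo` with a `tag.bcp47` field read from the ini file. According to this discussion, `babel-de.ini` reports only the language subtag, without a `DE` region or a `1996` variant. The comment in the code states that expectation and has not been verified here.

```latex
\documentclass{article}
\usepackage{babel}
% Load present-day German from babel-de.ini under the name ngerman.
\babelprovide[import=de, main]{ngerman}
\begin{document}
% Per the thread, this reports plain "de" rather than "de-DE-1996".
BCP 47 tag: \localeinfo{tag.bcp47}
\end{document}
```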
My suggestion for babel would be to set up ini files for `de-DE`. Then you can point `babel-de.ini` to them, and things become much clearer.
This is the way the CLDR works. It’s a widely used standard, and I see no real reason to break its rules. The locale `de` is strictly the same as `de-DE`, with one exception: the latter sets the region to `DE`, while the former doesn’t. That `de` is considered, in principle, the equivalent of `de-Latn-DE` is confirmed here:
https://unicode-org.github.io/cldr-staging/charts/38/supplemental/likely_subtags.html
This criterion is applied to all languages in the CLDR. The goal must be to remove incompatibilities and inconsistencies, not to add new ones (especially when the new inconsistency only serves to “fix” another one).
Anyway, the ‘likely’ tag is also available in `babel` (for `de` it is, of course, `de-Latn-DE`).
Added a section on language naming in https://latex3.github.io/babel/news/whats-new-in-babel-3.75.html.
`babel-german.tex` now points to `babel-de-1901.ini` if the hyphenation patterns for `\l@german` are `de-1901` and the ldf file has been loaded. There is a similar trick for `swissgerman`. Although this introduces some inconsistencies, I think this hack is the most transparent solution for users, since it requires no action on their part: https://latex3.github.io/babel/news/whats-new-in-babel-3.77.html#german-and-ini-files
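A sketch of the case the 3.77 change targets. It assumes babel 3.77 or later, an installation where `\l@german` holds the de-1901 patterns, and that `\babelprovide[import]` may be applied to a language already loaded from its ldf file. The ldf file is loaded first, and ini data are then imported for the same name. At that point `babel-german.tex` is expected to pick `babel-de-1901.ini`.

```latex
\documentclass{article}
% Step 1: classic ldf route; german.ldf is loaded (pre-1996 rules).
\usepackage[german]{babel}
% Step 2: import ini data for the same language. Because german.ldf has
% been loaded and \l@german is de-1901, babel-german.tex is expected to
% point to babel-de-1901.ini here instead of babel-de.ini.
\babelprovide[import]{german}
\begin{document}
Ini tag in use: \localeinfo{tag.ini}
\end{document}
```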
`german` is the name of the German language with the pre-1996 spelling rules. So in the following example I expected it to import `babel-de-1901.ini`, but instead I got only `babel-de.ini`.