latex3 / babel

The babel system for LaTeX, LuaLaTeX and XeLaTeX
LaTeX Project Public License v1.3c

german doesn't import de-1901 as locale #179

Closed u-fischer closed 2 years ago

u-fischer commented 2 years ago

german is the name of the German language with the pre-1996 spelling rules. So in the following example I expected it to import babel-de-1901.ini, but instead I got only babel-de.ini.

\documentclass{article}
\usepackage[german]{babel}

\BabelEnsureInfo
%\babelprovide[main]{german-traditional} % this works

\begin{document}

\getlocaleproperty*\mytest{\languagename}{identification/tag.bcp47}

\show\mytest %expected de-1901, got de
\end{document}

jbezos commented 2 years ago

The fact that german is not (present) German, while ngerman is, seems to me an anomaly. I know there are ‘historical reasons’, but I wonder if preserving these conflicting names is the best option. IIRC, German is the only case where the babel names overlap with those in the CLDR (I mean, the same name for different things), and I would like to respect the latter, which are kind of standard. Besides german, there is swissgerman, which is de-CH in babel, but gsw in the CLDR, because it’s a different language/dialect. Maybe a switch for these cases will do the job (something like names=cldr), but this wouldn’t be very elegant and could bring even more confusion.

jbezos commented 2 years ago

@jspitz Any thoughts?

u-fischer commented 2 years ago

Personally, I would have no problem if german and ngerman both referred to present German, and the older spelling were referred to as, say, german-1901.

But the behaviour of ldf and ini should be consistent. It is not good if the german.ldf loads hyphenation patterns for the old spelling while the ini claims it is new.

jspitz commented 2 years ago

I think for backwards compatibility reasons, german should link to de_DE-1901, and ngerman to de_DE.

jbezos commented 2 years ago

Thinking aloud — an alternate set of ini files (let’s call them, say, de-x-babel) which takes precedence if the corresponding ldf files have been already loaded.

u-fischer commented 2 years ago

> Thinking aloud — an alternate set of ini files (let’s call them, say, de-x-babel) which takes precedence if the corresponding ldf files have been already loaded.

I don't think it would be a good idea if german sometimes meant the old and sometimes the new spelling; that would make the situation even more complex.

@jspitz

> I think for backwards compatibility reasons, german should link to de_DE-1901, and ngerman to de_DE.

Imho backward compatibility got lost when polyglossia decided to use german as main language name. Since then "german" can mean the one or the other spelling variant.

I also think that for many new users (who are too young to know about the spelling discussion) it is confusing to have to use ngerman as the option for babel instead of the natural german. We quite often see examples of this in questions. I would suggest seriously considering breaking compatibility here and cleaning up the situation.

jspitz commented 2 years ago

I don't think polyglossia breaks backwards compatibility. Polyglossia names are not babel names. A break of backwards compatibility is when old documents suddenly produce unexpected output. I suppose many thousands of documents out there that use babel's german would be affected (most of my own documents would in fact break, as I often use babel's german when quoting older German texts).

If you want to do a really sensible change, you should keep the old babel names as aliases in the background, and officially switch to the less ambiguous BCP-47 identifiers to select language varieties.

jspitz commented 2 years ago

(names such as german -- and even more so austrian -- are highly ambiguous anyway; the fact alone that german is linked to the German standard variety and not Swiss or Austrian is disputable)

jbezos commented 2 years ago

Many languages have evolved over time, even with significant changes, and the original name has been preserved: French, Spanish, and Russian are examples. For me, the most confusing point here is that german isn’t actually the option to be used for German.

BTW, I’ve found another case of overlapping names: serbian in babel is sr-Latn instead of sr(-Cyrl). The ‘real’ Serbian is serbianc (I'll try to locate the author to see how this can be changed).

As to BCP47, they weren’t devised for user interfaces, but as unique identifiers at a lower level. And we are dealing with the name to be selected by the user. IMO, this isn’t the way to go (and in fact I think it doesn’t solve the problem at all, because after all the name in the IANA registry and in the CLDR for de is still German).

jspitz commented 2 years ago

> As to BCP47, they weren’t devised for user interfaces, but as unique identifiers at a lower level. And we are dealing with the name to be selected by the user. IMO, this isn’t the way to go (and in fact I think it doesn’t solve the problem at all, because after all the name in the IANA registry and in the CLDR for de is still German).

Not quite the same. de == German means all varieties of German. This includes de-1901, de-AT, de-CH, de-Latf-1901 etc. Babel's ngerman is only a subset of de, namely de-DE-1996 (as babel's german is a subset, namely de-DE-1901).

The language name is ambiguous,

u-fischer commented 2 years ago

> (names such as german -- and even more so austrian -- are highly ambiguous anyway; the fact alone that german is linked to the German standard variety and not Swiss or Austrian is disputable)

Yes. But as LaTeX can't use all of them at the same time, one has to choose which variant/spelling is meant by german at the user level, and how to name or select the other ones.

Polyglossia chose variant=german and spelling=new as the default for german, i.e. de-DE-1996, and so did the babel ini files. The question is whether one can unify that again with babel-german.

jspitz commented 2 years ago

I think one can't without breaking backwards compatibility. I think this outweighs having identical language names (which is only identity on the surface anyway).

jspitz commented 2 years ago

I think it is fine that babel assumes de is de-DE-1996 (polyglossia does this as well). What is wrong is that de, then, does not set region.tag.bcp47 = DE and variant.tag.bcp47 = 1996.

In other words, if the language is set via tag, setting ngerman if "de" is input is OK (as Ulrike says, some variant has to be selected). BUT: this variant should then identify itself precisely if the BCP47 tag is queried. In that case, de is underspecified; the region and variant tags need to be reported as well.

My suggestion for babel would be to set up ini files for de-DE. Then you can direct babel-de.ini to that, and things become much more clear.
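
To make the underspecification concrete, the individual subtags can be queried one by one. This is only a sketch: it reuses \getlocaleproperty* and \BabelEnsureInfo from the example at the top of this issue, and it assumes the loaded ini file exposes a language.tag.bcp47 key alongside the region.tag.bcp47 and variant.tag.bcp47 keys named above. The macro names \mylang, \myregion and \myvariant are made up for illustration. \meaning is used so that a missing key is visible in the log.

```latex
\documentclass{article}
\usepackage[german]{babel}

\BabelEnsureInfo

\begin{document}

% Starred form: a key that is absent from the ini file should not raise an error.
\getlocaleproperty*\mylang{\languagename}{identification/language.tag.bcp47}
\getlocaleproperty*\myregion{\languagename}{identification/region.tag.bcp47}
\getlocaleproperty*\myvariant{\languagename}{identification/variant.tag.bcp47}

% Write what each query returned to the log and terminal.
\typeout{language: \meaning\mylang}
\typeout{region: \meaning\myregion}
\typeout{variant: \meaning\myvariant}

\end{document}
```

If the region and variant lines come back without a value, the locale is identifying itself only as de, which is the situation described above.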

jbezos commented 2 years ago

This is the way the CLDR works. It’s a standard widely used and I see no real reason to break its rules. The locale de is strictly the same as de-DE, with an exception: the latter sets the region to DE, while the former doesn’t. The fact de is considered the equivalent, in principle, of de-Latn-DE is confirmed here:

https://unicode-org.github.io/cldr-staging/charts/38/supplemental/likely_subtags.html

This criterion is applied to all languages in the CLDR and the goal must be the removal of incompatibilities and inconsistencies, and not the addition of new ones (especially if the inconsistency only serves to “fix” another inconsistency).

Anyway, in babel the ‘likely’ tag is also available (for de it is, of course, de-Latn-DE).
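
A rough way to compare the plain tag with the likely one is sketched below. The key name identification/tag.bcp47.likely is an assumption about how the ini files store the likely tag and should be checked against the ini file actually installed; \plaintag and \likelytag are hypothetical macro names.

```latex
\documentclass{article}
\usepackage[ngerman]{babel}

\BabelEnsureInfo

\begin{document}

% The tag as stored in the ini file.
\getlocaleproperty*\plaintag{\languagename}{identification/tag.bcp47}
% ASSUMPTION: the likely tag is stored under this key name.
\getlocaleproperty*\likelytag{\languagename}{identification/tag.bcp47.likely}

\typeout{tag: \meaning\plaintag}
\typeout{likely tag: \meaning\likelytag}

\end{document}
```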

jbezos commented 2 years ago

Added a section on language naming in https://latex3.github.io/babel/news/whats-new-in-babel-3.75.html.

jbezos commented 2 years ago

babel-german.tex now points to babel-de-1901.ini if the hyphenation patterns for \l@german are de-1901 and the ldf file has been loaded. There is a similar trick for swissgerman. Although now there are some inconsistencies, I think this hack is the most transparent solution for users, requiring no actions from them: https://latex3.github.io/babel/news/whats-new-in-babel-3.77.html#german-and-ini-files
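
One way to check the new behaviour on a given installation is to load both ldf files and compare the tags they report. This is a sketch only: the result depends on the babel version and on which hyphenation patterns are assigned to \l@german, so no expected output is given. \tagold and \tagnew are hypothetical macro names.

```latex
\documentclass{article}
% Both ldf files are loaded; german is the main language (last option).
\usepackage[ngerman,german]{babel}

\BabelEnsureInfo

\begin{document}

% Query each language by name rather than through \languagename.
\getlocaleproperty*\tagold{german}{identification/tag.bcp47}
\getlocaleproperty*\tagnew{ngerman}{identification/tag.bcp47}

\typeout{german: \meaning\tagold}
\typeout{ngerman: \meaning\tagnew}

\end{document}
```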