silnrsi / langtags

Manage a set of language tag equivalence sets

az-Latn-AZ incorrect Autonym #4

Closed jasonleenaylor closed 4 months ago

jasonleenaylor commented 4 years ago

The Ethnologue lists 'Azərbaycan dili, Azərbaycanca' as the correct autonyms, but the localname for this record is 'Azərbaycan'.

mhosken commented 4 years ago

The code for Azerbaijani is 'az'. The corresponding ISO 639-3 code for 'az' is 'aze'. The Ethnologue has no autonyms for aze (or language names, come to that), since it treats aze as a macrolanguage. This is also reflected in the use of azj (Northern Azerbaijani) as being equivalent. I am wary of taking the autonym of a sublanguage as the autonym for the macrolanguage. In this case there is an autonym available from the CLDR, which is what is taken.

The waters are further muddied by both Southern Azerbaijani and Northern Azerbaijani having the same autonyms, although the Southern one is in Arabic script. So it may be safe to take it.

I think my difficulty is that currently only one autonym is allowed in langtags.json, but I have multiple sources. Even autonymns.csv has more than one for both azj and azb. What should I do with them? Do I need to change the autonym field in langtags.json into a list? I realise that's not an option, but do we want another field that lists the extra autonyms? Would that help?

jasonleenaylor commented 4 years ago

Someday we'll find a simple problem to fix, I'm sure.

So before I get into the other parts of the discussion, where in the world is 'Azərbaycan' coming from? It isn't a valid autonym for az-AZ in any data that I can see.

It would be good for langtags.json to have all the available autonyms, but it isn't critical. It should be fine to have them in a separate field. Then the matter of picking which one goes in 'localname' becomes relevant.
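
To make the "separate field" idea concrete, here is a rough Python sketch of merging autonyms from several sources into one preferred name plus a list of extras. The helper `merge_autonyms`, the `tag` key, the use of `localnames` as the extra field, and the source order (CLDR first) are all assumptions for illustration, not how langtags is actually built. The sample values are the ones quoted in this thread.

```python
def merge_autonyms(*sources):
    """Combine autonym lists, keeping first-seen order and dropping duplicates."""
    seen = set()
    merged = []
    for names in sources:
        for name in names:
            name = name.strip()
            if name and name not in seen:
                seen.add(name)
                merged.append(name)
    return merged


# Values quoted in this thread; the source order here is an assumption.
cldr = ["Azərbaycan"]
ethnologue = ["Azərbaycan dili", "Azərbaycanca"]

names = merge_autonyms(cldr, ethnologue)

record = {"tag": "az-Latn-AZ"}       # hypothetical minimal record
record["localname"] = names[0]       # the single preferred autonym
record["localnames"] = names[1:]     # assumed field holding the extras
```

Whichever source is passed first wins the 'localname' slot, so the ordering of sources is exactly the policy question raised above.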

jasonleenaylor commented 4 years ago

Interesting: older versions of the CLDR data had Azərbaycanca, whereas the latest version has Azərbaycan. Ken Keyes, who works there, said that 'Azərbaycanca' or 'Azərbaycan dili' are both acceptable.

mhosken commented 4 years ago

The core bug here is that the Ethnologue names were not getting into langtags.json. This has been fixed. Note that it is possible, and common, for a tagset to have no localname but still have a localnames field.
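
For consumers of the data, that last point means a lookup should not assume 'localname' is present. A minimal sketch of a fallback, assuming each tagset is a plain dict and that 'localnames' is a list of strings (the function name `display_autonym` and the sample records are hypothetical):

```python
def display_autonym(tagset):
    """Return localname if present, else the first entry of localnames, else None."""
    localname = tagset.get("localname")
    if localname:
        return localname
    localnames = tagset.get("localnames") or []
    return localnames[0] if localnames else None


# Hypothetical records for illustration only.
with_both = {"localname": "Azərbaycan", "localnames": ["Azərbaycan dili"]}
only_list = {"localnames": ["Azərbaycan dili", "Azərbaycanca"]}
neither = {}
```

Taking the first entry of 'localnames' is an arbitrary choice here; a caller with a better preference order would substitute its own.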