python-babel / babel

The official repository for Babel, the Python Internationalization Library
http://babel.pocoo.org/
BSD 3-Clause "New" or "Revised" License

Support for Chinese region subtags #543

Open jenstroeger opened 6 years ago

jenstroeger commented 6 years ago

I’m using Babel 2.5.0.

I noticed that the support for Chinese language subtags is incomplete:

>>> import babel
>>> babel.__version__
'2.5.0'
>>> babel.Locale('zh')
Locale('zh')
>>> babel.Locale('zh', 'CN')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/V…/lib/python3.5/site-packages/babel/core.py", line 168, in __init__
    raise UnknownLocaleError(identifier)
babel.core.UnknownLocaleError: unknown locale 'zh_CN'

The exception is raised for the 'CN' (mainland China) and 'TW' (Taiwan) regions, as well as for some of the 28 IANA subtags such as 'cdo' and 'cjy'.

However, Traditional and Simplified Chinese are supported:

>>> babel.Locale('zh', 'Hans')
Locale('zh', territory='Hans')
>>> babel.Locale('zh', 'Hant')
Locale('zh', territory='Hant')

Will such support be added anytime soon?

akx commented 6 years ago

Hi!

This behavior is more or less expected, though admittedly poorly documented. You should use the babel.Locale.parse() function to acquire zh_CN; you'll actually get zh_Hans_CN. Likewise, parsing zh_TW works and yields zh_Hant_TW:

>>> import babel
>>> babel.Locale.parse('zh_CN')
Locale('zh', territory='CN', script='Hans')
>>> babel.Locale.parse('zh_TW')
Locale('zh', territory='TW', script='Hant')

The second example you have there is a little anomalous – it definitely should be Locale('zh', script='Hant') (or Hans), but I suspect this is due to the way the locale data files are named. Like before, though, Locale.parse() to the rescue:

>>> babel.Locale.parse('zh_Hant')
Locale('zh', script='Hant')
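
A minimal sketch of the Locale.parse() approach described above, assuming Babel is installed. The helper name parse_locale_or_none is hypothetical and not part of Babel; it treats both UnknownLocaleError (no locale data) and ValueError (malformed identifier) as "no such locale":

```python
from babel import Locale, UnknownLocaleError


def parse_locale_or_none(identifier):
    """Return a Locale for `identifier`, or None if Babel cannot resolve it.

    Hypothetical helper for illustration; not part of Babel's API.
    """
    try:
        # parse() fills in likely subtags, which is why 'zh_CN'
        # comes back with the 'Hans' script attached.
        return Locale.parse(identifier)
    except (UnknownLocaleError, ValueError):
        # UnknownLocaleError: no matching locale data.
        # ValueError: the identifier is not well formed.
        return None


for identifier in ("zh_CN", "zh_TW", "zh_Hant"):
    print(identifier, "->", parse_locale_or_none(identifier))
```

Returning None instead of raising is only one possible design; code that must distinguish a malformed identifier from missing locale data would catch the two exceptions separately.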

jenstroeger commented 6 years ago

I came across another issue: the ISO 639-1 code for Norwegian is no, but that’s not supported whereas nn (Norwegian Nynorsk) and nb (Norwegian Bokmål) are:

>>> babel.Locale('no')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Volumes/Develp/talaera/server/lib/python3.5/site-packages/babel/core.py", line 168, in __init__
    raise UnknownLocaleError(identifier)
babel.core.UnknownLocaleError: unknown locale 'no'
>>> babel.Locale('nn')
Locale('nn')
>>> babel.Locale('nb')
Locale('nb')

None of the ISO 639-2 codes (nor, nno, nob) work either, whereas fil (Filipino) does work even though it has no ISO 639-1 code.

akx commented 6 years ago

This, I feel, is also as expected. Babel is concerned with written languages; no is a macrolanguage encompassing the Nynorsk and Bokmål written forms. It's an app-political question whether no would or should map to nn or nb. Statistically speaking, according to Wikipedia, the scales tip in favor of Bokmål.

This mapping is something that could be automagically determined by the planned locale loader system for Babel 3.0, whenever the development for that starts.
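
Until such a loader exists, the mapping can be applied in application code before the identifier reaches Babel. A minimal sketch, assuming the application prefers Bokmål; the MACROLANGUAGE_DEFAULTS table and the resolve_macrolanguage helper are hypothetical names, not part of Babel:

```python
# Hypothetical application-level policy: which written form a macrolanguage
# code should resolve to. Mapping 'no' to 'nb' is an assumption; an app
# aimed at Nynorsk readers would choose 'nn' instead.
MACROLANGUAGE_DEFAULTS = {"no": "nb"}


def resolve_macrolanguage(identifier, sep="_", defaults=MACROLANGUAGE_DEFAULTS):
    """Rewrite only the language subtag of `identifier` using `defaults`."""
    language, *rest = identifier.split(sep)
    language = defaults.get(language.lower(), language)
    return sep.join([language, *rest])


print(resolve_macrolanguage("no"))     # nb
print(resolve_macrolanguage("no_NO"))  # nb_NO
print(resolve_macrolanguage("nn_NO"))  # nn_NO (unchanged)
```

The rewritten identifier can then be handed to babel.Locale.parse() as usual, which keeps the nn-versus-nb decision explicit in the application.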

nim-odoo commented 6 years ago

@akx Hello,

Is there a reason why nb is not in LOCALE_ALIASES?

>>> babel.Locale('nb')
Locale('nb')
>>> babel.core.LOCALE_ALIASES['nb']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'nb'

>>> babel.Locale('no')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/babel/core.py", line 170, in __init__
    raise UnknownLocaleError(identifier)
babel.core.UnknownLocaleError: unknown locale 'no'
>>> babel.core.LOCALE_ALIASES['no']
'nb_NO'

That seems to be inconsistent, and I would have expected nb to be in LOCALE_ALIASES.
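
For context, the table is not consulted by the Locale constructor (hence the error above); it is the default value of the aliases argument of babel.negotiate_locale(). A small sketch of where the 'no' entry takes effect, assuming Babel is installed and using a made-up list of available locales:

```python
from babel import negotiate_locale
from babel.core import LOCALE_ALIASES

# Made-up example data: the locales an application ships translations for.
available = ["nb_NO", "en_US"]

# With the default alias table, the bare code 'no' is expanded to 'nb_NO'.
with_aliases = negotiate_locale(["no"], available)

# With aliasing disabled, 'no' matches nothing in `available`.
without_aliases = negotiate_locale(["no"], available, aliases={})

print(LOCALE_ALIASES["no"], with_aliases, without_aliases)
```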

Thanks

cc @mart-e