python / cpython

`locale.windows_locale`: Incorrect Windows locale for Cambodian #123853

Open seanbudd opened 1 month ago

seanbudd commented 1 month ago

Bug report

Bug description:

According to the Windows spec, the locale name for Cambodian (LCID 0x0453, decimal 1107) should be "km-KH".

Sources:

Currently, locale.windows_locale[1107] is "kh_KH", which is incorrect: https://github.com/python/cpython/blob/3.12/Lib/locale.py#L1596

This mistake may come from an older version of the specification. However, according to the Microsoft reference, Windows has used the current mapping since the earliest recorded version of the spec, dated 8/8/2013.
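A minimal reproduction of the lookup described above. It assumes only that `locale.windows_locale` (undocumented) is the module-level dict referenced in the link. The expected value is the Windows name "km-KH" rewritten with an underscore to match the table's `language_REGION` style. The variable names are illustrative.

```python
import locale

KHMER_LCID = 0x0453                   # the same identifier as decimal 1107
EXPECTED = "km-KH".replace("-", "_")  # Windows name in the table's underscore style

# windows_locale is an undocumented dict mapping integer LCIDs to locale names.
actual = locale.windows_locale.get(KHMER_LCID)
print(f"windows_locale[{KHMER_LCID}] = {actual!r}, expected {EXPECTED!r}")
if actual != EXPECTED:
    print("mismatch: table entry differs from the Windows locale name")
```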

If this issue is accepted I am happy to make a small PR to adjust this value.

CPython versions tested on:

3.11, 3.12, 3.13, CPython main branch

Operating systems tested on:

Windows

rruuaanng commented 1 month ago

I think it's OK, and I have modified it.

seanbudd commented 1 month ago

@rruuaanng - could you please elaborate further. What is ok? What have you modified?

rruuaanng commented 1 month ago

And you should also add the OS-windows label.

seanbudd commented 1 month ago

@rruuaanng - I can't label issues, that's only for people with triage/write permissions

rruuaanng commented 1 month ago

> @rruuaanng - could you please elaborate further. What is ok? What have you modified?

I changed kh_KH to km_KH

seanbudd commented 1 month ago

@rruuaanng - you manually changed it locally by patching it? That's not a solution for CPython. We should adhere to the standards set by Microsoft. This issue is about fixing it in CPython, not in your own project.

rruuaanng commented 1 month ago

> @rruuaanng - you manually changed it locally by patching it? That's not a solution for CPython. We should adhere to the standards set by Microsoft. This issue is about fixing it in CPython, not in your own project.

I checked the link you gave me (https://ss64.com/locale.html) and fixed the issues you mentioned. Everything should be good to go now.

seanbudd commented 1 month ago

@rruuaanng - how did you fix it? How did you test it?

Eclips4 commented 1 month ago

@rruuaanng If this issue is valid (this should be confirmed by locale experts, cc @malemburg), we need to send a PR with a fix; applying changes locally affects only your local git repository.

rruuaanng commented 1 month ago

> @rruuaanng - how did you fix it? How did you test it?

I checked the results of the windows_locale.get(int(code, 0)) statement to see if it gets the corrected value now.

The windows_locale dictionary isn't used anywhere else, so I think fixing the source of the error should be enough.
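The check described above can be sketched as follows. `int(code, 0)` infers the base from the prefix, so the string "0x0453" parses as 1107. This sketch patches a copy of the dict instead of the stdlib file so that it stays self-contained; the corrected entry is hypothetical. A lookup like this exercises only the table. It does not show whether anything in `locale` actually consults that table.

```python
import locale

code = "0x0453"              # hex string form of a Windows LCID
lcid = int(code, 0)          # base 0: the "0x" prefix selects hexadecimal

# Work on a copy so the standard library module is left untouched.
patched = dict(locale.windows_locale)
patched[lcid] = "km_KH"      # hypothetical corrected entry

print(locale.windows_locale.get(lcid), "->", patched.get(lcid))
```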

rruuaanng commented 1 month ago

> @rruuaanng - could you please elaborate further. What is ok? What have you modified?

I've submitted my PR. I hope you can point out any mistakes I made.

hugovk commented 1 month ago

@rruuaanng Please note the issue author offered to submit a PR if this is accepted. It's good manners to let them have a first go.

rruuaanng commented 1 month ago

> @rruuaanng Please note the issue author offered to submit a PR if this is accepted. It's good manners to let them have a first go.

Okay. I’ve canceled the PR. Thanks for the heads-up

vstinner commented 1 month ago

I changed my locale to Khmer. Python gives me:

>>> locale.getdefaultlocale()
('km_KH', 'cp1252')

The windows_locale dictionary is not used (so changing it would have no effect), since the underlying _locale function returns a string and not a code starting with 0x:

>>> import _locale; _locale._getdefaultlocale()
('km_KH', 'cp1252')

So the string 'km_KH' comes directly from Windows GetLocaleInfoA() function.
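This point can be illustrated with a simplified version of the Windows branch of `getdefaultlocale()`. In `Lib/locale.py`, the table is consulted only when `_locale._getdefaultlocale()` returns a code that starts with "0x". The function below is a hypothetical stand-alone mirror of that check, not the stdlib code itself.

```python
import sys

def map_default_code(code, table, platform=sys.platform):
    """Hypothetical, simplified mirror of the LCID check in getdefaultlocale()."""
    # Only a Windows language identifier such as "0x0453" triggers the lookup.
    if platform == "win32" and code and code[:2] == "0x":
        return table.get(int(code, 0))
    # A ready-made name such as "km_KH" (from GetLocaleInfoA) is passed through.
    return code

table = {0x0453: "kh_KH"}
print(map_default_code("km_KH", table, "win32"))   # -> km_KH (table not consulted)
print(map_default_code("0x0453", table, "win32"))  # -> kh_KH (table consulted)
```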


The windows_locale dictionary is only used by the locale.getdefaultlocale() function and this function is deprecated: it will be removed in Python 3.15. Why do you use locale.getdefaultlocale() instead of locale.setlocale(locale.LC_CTYPE, '') or locale.getlocale()? You can also use locale.getencoding() to get the locale encoding.

>>> locale.setlocale(locale.LC_CTYPE, '')
'Khmer_Cambodia.1252'

seanbudd commented 1 month ago

@vstinner - I'm not sure I understand the point you are trying to make in regards to deprecation. We use locale.windows_locale directly, is that deprecated too (or never officially supported)?

vstinner commented 1 month ago

> We use locale.windows_locale directly

Ah. This dictionary is not documented. How do you use it? Do you have an example?

serhiy-storchaka commented 1 month ago

I spent half a day today updating windows_locale to the latest official data, only to find that it is not being used. :frowning: :man_facepalming:

rruuaanng commented 1 month ago

> > We use locale.windows_locale directly
>
> Ah. This dictionary is not documented. How do you use it? Do you have an example?

This means that no changes to the code are required, right?

rruuaanng commented 1 month ago

Yes! I found this in the getdefaultlocale function:

    import warnings
    warnings._deprecated(
        "locale.getdefaultlocale",
        "{name!r} is deprecated and slated for removal in Python {remove}. "
        "Use setlocale(), getencoding() and getlocale() instead.",
        remove=(3, 15))

seanbudd commented 1 month ago

@vstinner

> Ah. This dictionary is not documented. How do you use it? Do you have an example?

Does this mean it's not part of the supported API? Will it be removed when getdefaultlocale is removed? We use the dictionary to convert Windows LCIDs to language strings. For example, we need to determine the language code for SAPI5 synthesizers from their language attribute. We also use it when getting language information through the UIA accessibility API for the text attribute UIA_CultureAttributeId. There are several other similar cases in the Windows API where we need to convert LCIDs to language codes. We could always create and maintain this dictionary ourselves, but it's not ideal. Alternatively, there is also LCIDToLocaleName, so dropping this dictionary is not a showstopper.
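A sketch of that conversion. It assumes that the Windows function LCIDToLocaleName (kernel32) is preferred when available, with the undocumented `locale.windows_locale` table as the fallback elsewhere. `lcid_to_language_tag` and the underscore-to-hyphen rewrite are illustrative choices, not an existing API. The buffer size of 85 is `LOCALE_NAME_MAX_LENGTH` from the Windows SDK.

```python
import ctypes
import locale
import sys

LOCALE_NAME_MAX_LENGTH = 85  # buffer size constant from winnls.h

def lcid_to_language_tag(lcid):
    """Hypothetical helper: map a Windows LCID to a tag such as "en-US"."""
    if sys.platform == "win32":
        buf = ctypes.create_unicode_buffer(LOCALE_NAME_MAX_LENGTH)
        # int LCIDToLocaleName(LCID Locale, LPWSTR lpName, int cchName, DWORD dwFlags)
        # returns the number of characters written, or 0 on failure.
        if ctypes.windll.kernel32.LCIDToLocaleName(lcid, buf, len(buf), 0):
            return buf.value
    # Fallback: the undocumented table, rewritten from "en_US" to "en-US".
    name = locale.windows_locale.get(lcid)
    return name.replace("_", "-") if name else None

print(lcid_to_language_tag(0x0409))
```

On non-Windows platforms only the table is used, so the result is only as current as that table.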

vstinner commented 1 month ago

> Does this mean it's not part of the supported API?

It means that you're in a gray area: maybe it's supported, maybe not :-)

> Will it be removed when getdefaultlocale is removed?

Good question. I didn't know that windows_locale was used directly. Maybe it should go through a regular PEP 387 deprecation first if we want to remove it.

@serhiy-storchaka: Maybe it's worth it to update windows_locale, since apparently, it's being used.

serhiy-storchaka commented 1 month ago

There is a problem: the names of some Windows locales are incompatible with the gettext format. For example, "sr-Latn-RS" on Windows corresponds to "sr_RS@latin" on Linux. locale.setlocale() raises an error for "sr_RS@latin" on Windows, but if you set the "sr-Latn-RS" locale, locale.getlocale() raises an error because it is unable to parse it. So which should we use? Another example is "ca-ES-valencia" on Windows and "ca_ES@valencia" on Linux.

The current table ignores modifiers, and _locale._getdefaultlocale() ignores them too, but this is wrong.
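To make the mismatch concrete, here is a minimal sketch that converts the Windows-style names mentioned above into the gettext/glibc style. It relies on a hand-written script-to-modifier table. Only "Latn" to "latin" comes from the examples, and any other mapping would need checking. `windows_to_posix` is a hypothetical helper, not a `locale` API.

```python
# Assumption: only the script subtag seen in the examples is mapped.
SCRIPT_MODIFIERS = {"Latn": "latin"}

def windows_to_posix(name):
    """Hypothetical: turn a Windows name like "sr-Latn-RS" into "sr_RS@latin"."""
    language, *subtags = name.split("-")
    region = modifier = None
    for sub in subtags:
        if len(sub) == 4 and sub.isalpha():
            # Script subtag (e.g. "Latn"); unmapped scripts are dropped.
            modifier = SCRIPT_MODIFIERS.get(sub, modifier)
        elif (len(sub) == 2 and sub.isalpha()) or (len(sub) == 3 and sub.isdigit()):
            region = sub.upper()     # region subtag (e.g. "RS")
        else:
            modifier = sub.lower()   # variant subtag (e.g. "valencia")
    result = language
    if region:
        result += "_" + region
    if modifier:
        result += "@" + modifier
    return result

for win_name in ("sr-Latn-RS", "ca-ES-valencia", "km-KH"):
    print(win_name, "->", windows_to_posix(win_name))
```

This handles only the two cases named above. It does not settle which form `getlocale()` should return on Windows.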