python / cpython

The Python programming language
https://www.python.org

codecs.open doesn't support encoding='locale' in Python 3.11 #120406

Open Amethiel opened 3 months ago

Amethiel commented 3 months ago

Bug report

Bug description:

Since 3.10, io.text_encoding() may return 'locale' when encoding is None, and the built-in open() accepts that value:

>>> open('/dev/null',encoding=io.text_encoding(None))
<_io.TextIOWrapper name='/dev/null' mode='r' encoding='UTF-8'>

but codecs.open raised an error when I did the same:

>>> codecs.open('/dev/null',encoding=io.text_encoding(None))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<frozen codecs>", line 910, in open
LookupError: unknown encoding: locale

>>> codecs.open('/dev/null',encoding='locale')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<frozen codecs>", line 910, in open
LookupError: unknown encoding: locale

codecs.open does handle None correctly, though:

>>> codecs.open('/dev/null',encoding=None)
<_io.TextIOWrapper name='/dev/null' mode='r' encoding='UTF-8'>
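
Until codecs.open accepts 'locale', one possible workaround is to resolve the placeholder to a concrete codec name before calling it. The sketch below is only an illustration: resolve_encoding and codecs_open_locale are hypothetical helper names, not part of the standard library, and locale.getpreferredencoding(False) is an assumed stand-in for whatever open() picks internally, so the two may not agree in every configuration.

```python
import codecs
import locale


def resolve_encoding(encoding):
    """Hypothetical helper: map None or 'locale' to a concrete codec name."""
    if encoding is None or encoding == "locale":
        # Passing False reads the locale settings without calling setlocale().
        return locale.getpreferredencoding(False)
    return encoding


def codecs_open_locale(filename, mode="r", encoding=None, errors="strict"):
    """Hypothetical wrapper: codecs.open that tolerates encoding='locale'."""
    return codecs.open(filename, mode,
                       encoding=resolve_encoding(encoding), errors=errors)
```

Note that this wrapper always passes a concrete encoding, so unlike codecs.open(..., encoding=None) it returns a codecs.StreamReaderWriter rather than a plain text file object.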

My Python version is: Python 3.11.9 (tags/v3.11.9:de54cf5, Apr 2 2024, 10:12:12) [MSC v.1938 64 bit (AMD64)] on win32

CPython versions tested on:

3.11

Operating systems tested on:

Linux, Windows

Zheaoli commented 3 months ago

Confirmed on the main branch. I'm not sure we need open and codecs.open to treat the encoding parameter the same way. CC @Eclips4

serhiy-storchaka commented 3 months ago

3.11 is already in security-only mode. Bug fixes can only be applied for 3.12+.

serhiy-storchaka commented 3 months ago

io.text_encoding() was added in 3.10, and the behavior was the same as in 3.11.

So this is rather a new feature request.

Zheaoli commented 3 months ago

@serhiy-storchaka May I take on this feature request if we confirm it is necessary?

serhiy-storchaka commented 3 months ago

@malemburg, what do you think about supporting encoding='locale' in codecs streams?

malemburg commented 3 months ago

I don't think such meta codecs are a good idea, since their meaning depends on the local configuration of the app running the code and can thus behave in undefined ways. The "mbcs" codec we have is similar and has issues, since you don't know what the actual encoding will be. The codec is needed because the Windows C API uses this meta encoding, but it's not ideal.

In the above use case, it's better to use a fixed default encoding such as "utf-8".

You can see that the io sub-system also doesn't use "locale" as encoding, but instead seems to go out to the os.environ to figure out what the locale settings are and then uses whatever this returns for the actual encoding ("utf-8" in case of the example).

PS: In the early days of Python 2 we experimented with using a locale based default encoding and quickly dropped that idea. Explicit is better than implicit...