pypa / pip

The Python package installer
https://pip.pypa.io/
MIT License
9.5k stars 3.02k forks source link

PIP doesn't read setup.cfg in UTF-8, which causes UnicodeDecodeError #8931

Closed fireattack closed 3 years ago

fireattack commented 4 years ago

Environment

Description

When installing anything within a path that has a setup.cfg, pip will read it but fails at decoding if the file contains non-ASCII characters. The cfg file is proper UTF-8 but PIP doesn't open it as so. It tries to use "GBK" (my locale) and fails.

Related, but not the same issue: #8717 (this is caused by pyvenv.cfg)

Expected behavior

It should read setup.cfg in UTF-8.

How to Reproduce

  1. open a path.
  2. Create a setup.cfg file with
[metadata]
name = test
version = 0.0.1
packages =
    test
description = '計算ツール'
  1. Run pip install requests

Output

D:\test>pip install requests
ERROR: Exception:
Traceback (most recent call last):
  File "c:\program files\python3\lib\site-packages\pip\_internal\cli\base_command.py", line 228, in _main
    status = self.run(options, args)
  File "c:\program files\python3\lib\site-packages\pip\_internal\cli\req_command.py", line 182, in wrapper
    return func(self, options, args)
  File "c:\program files\python3\lib\site-packages\pip\_internal\commands\install.py", line 245, in run
    options.use_user_site = decide_user_install(
  File "c:\program files\python3\lib\site-packages\pip\_internal\commands\install.py", line 664, in decide_user_install
    if site_packages_writable(root=root_path, isolated=isolated_mode):
  File "c:\program files\python3\lib\site-packages\pip\_internal\commands\install.py", line 609, in site_packages_writab
le
    get_lib_location_guesses(root=root, isolated=isolated))
  File "c:\program files\python3\lib\site-packages\pip\_internal\commands\install.py", line 600, in get_lib_location_gue
sses
    scheme = distutils_scheme('', user=user, home=home, root=root,
  File "c:\program files\python3\lib\site-packages\pip\_internal\locations.py", line 109, in distutils_scheme
    d.parse_config_files()
  File "c:\program files\python3\lib\distutils\dist.py", line 406, in parse_config_files
    parser.read(filename)
  File "c:\program files\python3\lib\configparser.py", line 697, in read
    self._read(fp, filename)
  File "c:\program files\python3\lib\configparser.py", line 1017, in _read
    for lineno, line in enumerate(fp, start=1):
UnicodeDecodeError: 'gbk' codec can't decode byte 0xab in position 88: illegal multibyte sequence
uranusjr commented 4 years ago

distutils is a built-in module, you’ll need to file the issue against CPython instead. https://bugs.python.org/

fireattack commented 4 years ago

Thanks for the quick reply.

Interestingly, setuptools itself doesn't fail with such cfg file. Are they doing something different? I was also wondering why we read setup.cfg to begin with for a global installation.

fireattack commented 4 years ago

OK, it looks to me setuptools work arounds this issue on their own (https://github.com/pypa/setuptools/pull/1180), is there any chance we do the same? I feel like distutils may have to be that way for backward compatibility reasons.

I knew this probably is too much to ask, so feel free to do what you think is the best.

uranusjr commented 4 years ago

Man, they basically re-implemented the whole thing. I’m not sure pip should be responsible to maintaining this implementation.

Maybe we should just catch the parsing error and carry on pretending the file does not exist instead. pip does not actually need any of the data you mentioned above; it just reads distutils configuration. The usage is honestly quite niche, and I doubt many would even notice the difference.

fireattack commented 4 years ago

Oh yeah, totally agreed with you. If we don't need anything ciritial from cfg file (or, at least, in case it isn't, like global installation shown here), it's reasonable to just ignore the error.

jaraco commented 4 years ago

Today this issue bit me too in jaraco/configparser#58.

uranusjr commented 4 years ago

Thinking about this more, this needs to be reimplemented eventually when pip migrates off distutils anyway, so we may as well do it right now. @jaraco Would it be a good idea if we extract setuptools’s config-parsing implementation into a standalone library that pip can vendor?

jaraco commented 4 years ago

In the case of distutils_scheme, there's currently an effort underway to deprecate distutils in which the distutils-schemes are being ported to the sysconfig module.

It's hard for me to say, but I'm a little reluctant to say that we should break out just the "parse config files" behavior into a separate library. On the other hand, that would eliminate one aspect of the dependency on distutils and limit the divergence from setuptools, so perhaps that makes sense.

pradyunsg commented 3 years ago

Is this still a thing in pip 20.3?

fireattack commented 3 years ago

Just tested, still broken here.

(v) D:\temp>python -m pip install pip -U
Collecting pip
  Using cached pip-20.3-py2.py3-none-any.whl (1.5 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 20.1.1
    Uninstalling pip-20.1.1:
      Successfully uninstalled pip-20.1.1
Successfully installed pip-20.3

(v) D:\temp>pip install requests
Collecting requests
  Using cached requests-2.25.0-py2.py3-none-any.whl (61 kB)
Collecting certifi>=2017.4.17
  Using cached certifi-2020.11.8-py2.py3-none-any.whl (155 kB)
Collecting chardet<4,>=3.0.2
  Using cached chardet-3.0.4-py2.py3-none-any.whl (133 kB)
Collecting idna<3,>=2.5
  Using cached idna-2.10-py2.py3-none-any.whl (58 kB)
Collecting urllib3<1.27,>=1.21.1
  Using cached urllib3-1.26.2-py2.py3-none-any.whl (136 kB)
Installing collected packages: urllib3, idna, chardet, certifi, requests
ERROR: Exception:
Traceback (most recent call last):
  File "d:\temp\v\lib\site-packages\pip\_internal\cli\base_command.py", line 210, in _main
    status = self.run(options, args)
  File "d:\temp\v\lib\site-packages\pip\_internal\cli\req_command.py", line 180, in wrapper
    return func(self, options, args)
  File "d:\temp\v\lib\site-packages\pip\_internal\commands\install.py", line 392, in run
    installed = install_given_reqs(
  File "d:\temp\v\lib\site-packages\pip\_internal\req\__init__.py", line 82, in install_given_reqs
    requirement.install(
  File "d:\temp\v\lib\site-packages\pip\_internal\req\req_install.py", line 781, in install
    scheme = get_scheme(
  File "d:\temp\v\lib\site-packages\pip\_internal\locations.py", line 184, in get_scheme
    scheme = distutils_scheme(
  File "d:\temp\v\lib\site-packages\pip\_internal\locations.py", line 108, in distutils_scheme
    d.parse_config_files()
  File "C:\Program Files\Python3\lib\distutils\dist.py", line 406, in parse_config_files
    parser.read(filename)
  File "C:\Program Files\Python3\lib\configparser.py", line 697, in read
    self._read(fp, filename)
  File "C:\Program Files\Python3\lib\configparser.py", line 1017, in _read
    for lineno, line in enumerate(fp, start=1):
UnicodeDecodeError: 'gbk' codec can't decode byte 0xab in position 93: illegal multibyte sequence

(v) D:\temp>

(Having the same setup.cfg above in cwd)

henryiii commented 3 years ago

Since distutils is deprecated and slated for removal in 3.12, maybe this is a good idea now?

uranusjr commented 3 years ago

Yeah. The code path will eventually be removed altogether and replaced by an equivalent based on sysconfig. The migration has started in #9626, but the road ahead is still long.

uranusjr commented 3 years ago

Also the new implementation does not currently contain the cfg parsing logic at all, and if no-one complains maybe we can just kill that feature entirely.

henryiii commented 3 years ago

What was it used for?

uranusjr commented 3 years ago

It can be used to control certain distutils (and by extension setuptools) behaviours, like what directory to store intermediate objects during build, compiler options, etc. Very low-level, implementation-dependant stuffs, and some even in direct conflict with other pip options, so hopefully nobody is using them, but I’ve seen pip users use worse “features”.

jaraco commented 3 years ago

Why not a quick and dirty fix to suppress the error when it occurs and while at it, deprecate the code path, so if it is parsed and it contains settings that affect pip, raise warnings that it's going away, thereby detecting users who might be relying on it?

jaraco commented 3 years ago

I'm also okay with just removing the behavior and seeing who cries as long as there's a rapid response to bring the behavior back or an escape hatch to re-enable the behavior.

uranusjr commented 3 years ago

The logic is buried deep in distutils and triggered when pip calls distutils to know where to install stuff into, distutils does not offer a way to disable the behaviour. So pip can’t just suppress the error since it needs those values to function, and the only way out of this (from what I can tell) is to remove reliance on those distutils values, which is what #9626 is going for.

layday commented 3 years ago

I was under the impression that it was this line that was raising:

https://github.com/pypa/pip/blob/e7d6e54dff0ef93197624c51cbde264dc390c511/src/pip/_internal/locations/_distutils.py#L35

Would it not be possible to wrap it in a try-except?

uranusjr commented 3 years ago

We can, but the location behaviour would change in subtle ways when that fails 😟

But now that I think of it, maybe that’d still be a good enough stopgap before we purge distutils? Nothing would change if the parsing succeeds (which is most of the time), and the behaviour will “mostly work” when it fails, which might be good enough.

layday commented 3 years ago

What does "mostly work" mean? If I have, say:

[install]
install_scripts = foo

in my setup.cfg and the parsing fails then the scripts would not be where I expect. (Although this is already rather fragile because pip does not respect distutils configuration when uninstalling packages.)

uranusjr commented 3 years ago

That would not fail though. The parsing would only fail if the file contains content not decodable with the platform encoding.

distutils also reads multiple config locations, including system-wide distutils.cfg, and for values like home, prefix, etc. which are honoured on uninstallation IIRC. Those are likely the more “popular” existing usages (although likely still not very wide-spread at this point, at least I hope not).

layday commented 3 years ago

That would not fail though. The parsing would only fail if the file contains content not decodable with the platform encoding.

Yeah, I understand that.