pypa / pip

The Python package installer
https://pip.pypa.io/
MIT License

Wrong version detection with setuptools_scm, `pip install .` and hg-git #10635

Open paugier opened 3 years ago

paugier commented 3 years ago

Description

I cloned a package (https://github.com/exabl/snek5000) with Mercurial and hg-git, and I am trying to install it from the clone with pip install -e .

This package uses setuptools_scm to detect its version. setuptools_scm now supports using Mercurial as a Git client and provides the right version when using the commands python setup.py develop and pip install -e . --no-build-isolation.

However, if I just use pip install -e ., the package is correctly installed but the detected version is completely wrong.

Expected behavior

No response

pip version

pip 21.3.1

Python version

CPython 3.9

OS

Linux

How to Reproduce

With Mercurial set up to work with hg-git:

hg clone git@github.com:exabl/snek5000.git
cd snek5000
pip install -e .

Output

No response

paugier commented 3 years ago

I forgot to ask explicitly: I'd like to investigate what happens, but I don't know how to do it since it works fine with commands for which I have a bit of control (python setup.py develop and pip install -e . --no-build-isolation).

How can I try to understand/fix this issue?

paugier commented 3 years ago

I used

[build-system]
requires = ["setuptools>=49.5.0", "wheel",
"setuptools_scm[toml] @ file://localhost//home/pierre/Dev/setuptools_scm#egg=setuptools_scm",
"setuptools_scm_git_archive"]

to try to understand what happens during the isolated build in setuptools_scm.

setuptools_scm calls Mercurial during the build. Strangely, from the isolated build, mercurial can't see its extension hg-git. Mercurial is installed in its own (conda) environment and should not be influenced by the isolated build.

It seems to be related to the fact that hg-git was installed in the Mercurial environment with pip install -e .. If I reinstall hg-git with pip install hg-git -U (of course, in the Mercurial environment), Mercurial can correctly see hg-git from the isolated build.

Changing the behavior of an independent application is indeed a very strange behavior of an isolated build.

uranusjr commented 3 years ago

This sounds like an issue with hg-git or setuptools. The main difference between installing things with and without -e is that the executable generated by the latter is (currently) done by setuptools, not pip.

paugier commented 3 years ago

Actually, there is no need to add the -e to get the problem: even when I install snek5000 with pip install ., I get the bug (i.e., during the isolated build, Mercurial cannot import an extension installed in its environment with pip install -e .).

I'm going to try to provide a cleaner way to reproduce.

I don't think the problem can be due to anything in hg-git. hg-git is just a simple python package and Mercurial just detects hg-git by trying to import it. It seems that Mercurial gets an ImportError when the command hg is run from an isolated build.

RonnyPfannschmidt commented 3 years ago

The system Mercurial might need isolation from the virtualenv used by the build isolation. This may require additional tooling in setuptools_scm.

paugier commented 3 years ago

Here is a simple way to reproduce something similar with Ubuntu 20.04: https://github.com/paugier/reproducer-bug-isolated-build-hg/blob/main/.github/workflows/ci.yml

I used the system Mercurial /usr/bin/hg and install hg-git with pip2 (since /usr/bin/hg still uses Python 2.7 in Ubuntu 20.04). pip2 automatically installs hg-git in --user mode. It is a very standard way to install Mercurial extensions (see https://foss.heptapod.net/mercurial/hg-git/-/tree/branch/0.10.x and https://foss.heptapod.net/mercurial/evolve).

We see here that the isolated build breaks the application Mercurial, which cannot import its extensions.

pfmoore commented 3 years ago

That's because (see here) pip runs the build in an environment where $PYTHONNOUSERSITE is set to 1, to protect the build backend environment from being "polluted" by packages installed outside of the build environment.

I'm not sure what the correct solution is here. IMO, a system tool like /usr/bin/hg should not be affected by the settings of Python environment variables like PYTHONNOUSERSITE, as those are intended to allow the user to control the behaviour of the Python interpreter (which is how pip is using them). I'd argue that the executable wrapper for Mercurial should set the Python environment variables to a "known state", rather than letting the parent process' values leak through. But that's easy for me to say, knowing what pip is trying to do, and there are almost certainly other considerations on the Mercurial side of things.

I don't think this is a pip issue as such (you can get the same problem just by manually setting PYTHONNOUSERSITE in your environment) but maybe we need a standard (essentially an add-on to PEP 517) clarifying a bit further what the environment in which a PEP 517 build backend should be run must look like. Agreeing such a standard would thrash out details like this in a way that all tools can rely on, rather than having it be a pip-specific implementation detail.
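The effect described above can be reproduced without pip. The following is a minimal sketch using only the standard library: it starts a child interpreter with and without PYTHONNOUSERSITE in its environment and reports the child's sys.flags.no_user_site. The helper name child_no_user_site is hypothetical and chosen only for illustration.

```python
import os
import subprocess
import sys


def child_no_user_site(set_variable):
    """Run a child interpreter and return its sys.flags.no_user_site value."""
    # Start from the current environment, minus any inherited setting.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONNOUSERSITE"}
    if set_variable:
        env["PYTHONNOUSERSITE"] = "1"
    out = subprocess.run(
        [sys.executable, "-c", "import sys; print(int(sys.flags.no_user_site))"],
        env=env,
        check=True,
        capture_output=True,
        text=True,
    )
    return int(out.stdout.strip())


if __name__ == "__main__":
    print("without PYTHONNOUSERSITE:", child_no_user_site(False))
    print("with PYTHONNOUSERSITE=1: ", child_no_user_site(True))
```

Any Python program started as a child process inherits the variable in the same way, which is why a Python-based hg launched from the build can lose access to extensions installed in the user site.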

RonnyPfannschmidt commented 3 years ago

Mercurial uses a custom script for its binary that is all Python.

My basic impression is that any program that's not explicitly isolating its executable from the surrounding context is affected.

I wonder if it's a reasonable workaround to hide the NOUSERSITE env var in setuptools_scm, either via opt-in or via opt-out.

paugier commented 3 years ago

Such a workaround in setuptools_scm would quickly fix the issue for users of hg-git. If we wait for a fix in Mercurial, this bug will be there for years, even for not so old distributions.

Note that we would also need for hg calls to remove from PYTHONPATH the path looking like /tmp/pip-build-env-veix2dp_/site. It contains a file sitecustomize.py which, IMHO, does not make sense for Mercurial.

In principle, it's a bit strange to also isolate at the level of applications called during the build. For example, nothing is done to really isolate Git or compilers. If the Python API of Mercurial were used during the build, isolation would clearly be good, but here it's used as an application. Passing to hg the environment variables used to isolate the build environment is actually weird.
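As a rough illustration of the workaround discussed here, the sketch below builds a cleaned copy of the environment for calling hg. It is not setuptools_scm's actual code: the function name env_for_hg is hypothetical, and it assumes that the isolated build can be recognized by a PYTHONPATH entry containing "pip-build-env-", as in the path quoted above.

```python
import os


def env_for_hg(environ=None):
    """Return a copy of the environment without pip's build-isolation settings.

    Assumption: entries added by the isolated build contain "pip-build-env-",
    like the /tmp/pip-build-env-.../site path mentioned in this thread.
    """
    env = dict(os.environ if environ is None else environ)
    # Let hg see packages installed in the user site again.
    env.pop("PYTHONNOUSERSITE", None)
    # Keep only PYTHONPATH entries that do not belong to the build environment.
    entries = env.get("PYTHONPATH", "").split(os.pathsep)
    kept = [p for p in entries if p and "pip-build-env-" not in p]
    if kept:
        env["PYTHONPATH"] = os.pathsep.join(kept)
    else:
        env.pop("PYTHONPATH", None)
    return env
```

The returned mapping could then be passed as the env argument of subprocess.run when invoking hg, leaving the build process's own environment untouched.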

pfmoore commented 3 years ago

IMO, if an application is delivered as a standalone utility, you shouldn't be able to tell what language it's written in. So ignoring general Python environment variables should be the norm. And certainly, the application shouldn't stop working just because the user sets language-specific variables. Having said that, I accept this is not general practice. And I understand that the practicalities mean that it may be necessary to work around things at a higher level.

But if we want to make this robust (by which I mean, something that works in all tools - build as well as pip, for example) we need to agree on a set of expectations for how tools set up the build environment, and that needs standardising.

RonnyPfannschmidt commented 3 years ago

It's a common problem that python applications typically do not protect against the environment

The only distro that I'm aware of that manages it is nixos, and that one doesn't share build isolation behaviour with any of the other well known systems

Build tools will have to be unnecessarily smart about this.

pfmoore commented 3 years ago

[Offtopic] I wonder whether pip's entry point wrappers should explicitly unset all Python-specific environment variables before running the Python interpreter? It would probably break too many applications...

RonnyPfannschmidt commented 3 years ago

Would need a new entrypoint and a pep

uranusjr commented 3 years ago

Would switching to use venv (which does not require us setting PYTHONNOUSERSITE) fix this? See discussion in #6264 as well.

paugier commented 3 years ago

> Would switching to use venv (which does not require us setting PYTHONNOUSERSITE) fix this? See discussion in #6264 as well.

Yes, it seems that using venv would fix this issue. venv sets neither PYTHONPATH nor PYTHONNOUSERSITE.

$ time python -m venv tmp_venv_call_hg --without-pip 
real    0m0.330s
user    0m0.260s
sys     0m0.078s
$ . tmp_venv_call_hg/bin/activate
$ python -c "from subprocess import run; run('hg version -v --config extensions.hggit='.split())"
Mercurial Distributed SCM (version 5.6.1)
(see https://mercurial-scm.org for more information)

[...]

Enabled extensions:

  hggit       external  0.10.2 (dulwich 0.20.25)
  [...]

paugier commented 3 years ago

Let me try to summarize. I think this issue is now quite well understood.

It is related to the custom isolation used by pip and to the sensitivity of (at least some) Mercurial installations to environment variables like PYTHONPATH and PYTHONNOUSERSITE.

The solutions could be:

  1. Improve Mercurial so that it ignores Python-specific environment variables. I don't know at which level this should be done; I guess it depends on the installation method. Even if the next version of Mercurial is improved, the issue will remain for most users, since people tend to use quite old versions of hg.

  2. Implement the Mercurial isolation at the setuptools_scm level, i.e. remove PYTHONNOUSERSITE and clean PYTHONPATH in the environment used to call hg. It's technically very simple (I can even submit a PR) and would also work with other installation tools when the package uses setuptools_scm.

  3. Use internally in pip a proper virtual environment created with venv --without-pip (no need to use/change PYTHONNOUSERSITE and PYTHONPATH). It would fix other similar problems, in particular for other Python applications used during the build. It would not fix the problem for other install tools, unless there is also a PEP on "expectations for how tools set up the build environment".

uranusjr commented 3 years ago

One thing I want to be sure of is whether a proper virtual environment indeed correctly ignores user-site packages without PYTHONNOUSERSITE (this is the reason why we need to set that flag right now). I’m about 99.9% certain it does, but someone should make sure.

After that, we can wait on pypa/build#361 and transplant that to pip to make everything work properly.
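One way to check this programmatically is sketched below, assuming only the standard library. It creates a throwaway environment with venv (without pip), runs that environment's interpreter with every PYTHON* variable removed, and reports site.ENABLE_USER_SITE. The names _RecordingBuilder and venv_enables_user_site are hypothetical helpers for this illustration.

```python
import os
import subprocess
import tempfile
import venv


class _RecordingBuilder(venv.EnvBuilder):
    """EnvBuilder that remembers the path of the interpreter it created."""

    def post_setup(self, context):
        self.env_exe = context.env_exe


def venv_enables_user_site():
    """Create a throwaway venv and report site.ENABLE_USER_SITE inside it."""
    # Drop every PYTHON* variable so the result reflects the venv alone.
    env = {
        k: v for k, v in os.environ.items()
        if not k.upper().startswith("PYTHON")
    }
    with tempfile.TemporaryDirectory() as tmp:
        builder = _RecordingBuilder(with_pip=False)
        builder.create(os.path.join(tmp, "env"))
        out = subprocess.run(
            [builder.env_exe, "-c",
             "import site; print(bool(site.ENABLE_USER_SITE))"],
            env=env,
            check=True,
            capture_output=True,
            text=True,
        )
    return out.stdout.strip() == "True"


if __name__ == "__main__":
    print("user site enabled in venv:", venv_enables_user_site())
```

This only covers a venv created with the default settings (no system site-packages); it says nothing about environments created by other tools.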

pfmoore commented 3 years ago

❯ py -m pip list -v
Package    Version      Location                                                                  Installer
---------- ------------ ------------------------------------------------------------------------- ---------
pip        21.3.1       c:\users\pfm\appdata\local\programs\python\python39\lib\site-packages     pip
setuptools 58.5.3       c:\users\pfm\appdata\local\programs\python\python39\lib\site-packages     pip
Spans      1.1.1        c:\users\pfm\appdata\roaming\python\python39\site-packages                pip
tzdata     2021.2.post0 c:\users\pfm\appdata\local\programs\python\python39\lib\site-packages     pip
wheel      0.37.0       c:\users\pfm\appdata\local\programs\python\python39\lib\site-packages     pip
PS 11:57 00:01.480 C:\Work\Support
❯ py -m venv xx
PS 11:58 00:03.809 C:\Work\Support
❯ .\xx\Scripts\pip.exe list -v
Package    Version Location                             Installer
---------- ------- ------------------------------------ ---------
pip        20.2.3  c:\work\support\xx\lib\site-packages pip
setuptools 49.2.1  c:\work\support\xx\lib\site-packages pip
WARNING: You are using pip version 20.2.3; however, version 21.3.1 is available.
You should consider upgrading via the 'c:\work\support\xx\scripts\python.exe -m pip install --upgrade pip' command.
PS 12:02 00:01.631 C:\Work\Support
❯ dir env:PYTHON*
PS 12:02 00:00.005 C:\Work\Support
❯

Is that a sufficient check? Note that Spans is installed in the user site-packages of the system environment, and it does not show up in the venv's list.

uranusjr commented 3 years ago

Yeah looks right to me!