pypa / pip

The Python package installer
https://pip.pypa.io/
MIT License

Providing pip configuration in `sys.base_prefix` #9752

Open pelson opened 3 years ago

pelson commented 3 years ago

pip version

main

Python version

all

OS

all

Additional information

When a pip config is installed in an installation prefix $PREFIX/pip.conf, $PREFIX/bin/python -m pip correctly picks up the config. When one makes a virtual environment with $PREFIX/bin/python -m venv ./my-venv then ./my-venv/bin/pip does not pick up the config.

It is questionable if this is a bug or a feature request, but essentially, I believe that pip should be looking in sys.base_prefix as well as sys.prefix for a config file.

Description

No response

Expected behavior

No response

How to Reproduce

Starting with a non-virtual environment (e.g. a conda environment):

$ touch ./env/pip.conf
$ pip config debug
env_var:
env:
global:
  /etc/xdg/xdg-ubuntu/pip/pip.conf, exists: False
  /etc/xdg/pip/pip.conf, exists: False
  /etc/pip.conf, exists: False
site:
  /media/important/github/pypa/pip/env/pip.conf, exists: True
user:
  /home/pelson/.pip/pip.conf, exists: False
  /home/pelson/.config/pip/pip.conf, exists: False

$ python -m venv ./venv

## BEWARE THAT YOUR VENV HAS THE BUNDLED PIP, SO INSTALL A NEWER PIP FOR DEBUGGING
$ ./venv/bin/pip install -e /path/to/pip/repo

$ ./venv/bin/pip config debug
env_var:
env:
global:
  /etc/xdg/xdg-ubuntu/pip/pip.conf, exists: False
  /etc/xdg/pip/pip.conf, exists: False
  /etc/pip.conf, exists: False
site:
  /media/important/github/pypa/pip/venv/pip.conf, exists: False
user:
  /home/pelson/.pip/pip.conf, exists: False
  /home/pelson/.config/pip/pip.conf, exists: False

What I want to see for a venv is that the base_prefix is searched as well as the prefix.
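To make the request concrete, here is a minimal sketch of a diagnostic that lists both candidate locations. It is not pip's code: the helper name `site_config_candidates` is hypothetical, and the file name `pip.conf` assumes a Unix-like system.

```python
import os
import sys


def site_config_candidates(prefix=None, base_prefix=None, filename="pip.conf"):
    """Return (label, path, exists) for the env prefix and, in a venv, the base prefix.

    Hypothetical helper for illustration only; it does not reflect how pip
    builds its own search path.
    """
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix

    candidates = [("prefix", os.path.join(prefix, filename))]
    # In a virtual environment sys.prefix differs from sys.base_prefix.
    if os.path.normpath(prefix) != os.path.normpath(base_prefix):
        candidates.append(("base_prefix", os.path.join(base_prefix, filename)))
    return [(label, path, os.path.isfile(path)) for label, path in candidates]


if __name__ == "__main__":
    for label, path, exists in site_config_candidates():
        print(f"{label}: {path}, exists: {exists}")
```

Run with a venv's interpreter, this prints two lines (the venv's own location and the base installation's); run with the base interpreter, it prints one.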

Output

No response

pradyunsg commented 3 years ago

Hi! Thanks for filing this issue and PR!

Could you please elaborate on why you want this? What use case does this help with?

uranusjr commented 3 years ago

I think this is by design? A virtual environment is supposed to isolate things from the global environment, so configuration from sys.base_prefix should not be picked up.

pelson commented 3 years ago

Could you please elaborate on why you want this? What use case does this help with?

Sure, I'll try with my specific case, and then hopefully we can draw parallels elsewhere.

I am responsible for a Python distribution deployed to a controlled network. There are other Python distributions on the controlled network (both accessible from the same machine) for which I am not responsible, and which need specific pip configurations. I too have specific pip configuration (e.g. index-url), which I therefore set in the pip.conf of the distribution. The intent is that users (developers) create virtual environments from this distribution in order to extend it as needed; pip is a fundamental part of that workflow. I therefore want those users to have the appropriate pip configuration within their virtual environments (out of the box).

To re-iterate, it is not acceptable to have the config be global (e.g. in /etc/pip.conf) nor user-level (e.g. in ~/.pip/pip.conf) as they need to also be able to use another Python distribution with different pip configurations (i.e. the pip config needs to be isolated to the environment).

Environment variables defining the config aren't acceptable because the two Python distributions wouldn't be able to inter-operate (you can only have one PIP_INDEX_URL).

To me the idea that a virtual environment should inherit the pip configuration from the base environment is natural. After all, a virtual environment inherits the Python and its standard library from the base environment - if you change the config of how Python is built such config propagates/leaks to the virtual environment.

pfmoore commented 3 years ago

IMO, the only sort of environment that should inherit is a --system-site-packages one, as that's explicitly not isolated from the system environment. Certainly changing the default behaviour to inherit would be a significant backward incompatibility, and would likely cause problems for people who rely on global settings not leaking into a virtualenv (e.g., testing).

pelson commented 3 years ago

Thank you for your input so far.

would likely cause problems for people who rely on global settings not leaking into a virtualenv (e.g., testing)

We should try to be more precise with the terminology here. global settings do leak into a virtualenv:

$ cat /etc/pip.conf 
[global]
no-cache-dir = true

$ cat ./venv/pip.conf 
[global]
no-dependencies = yes

$ pip config list
global.no-cache-dir='true'
global.no-dependencies='yes'

What I'm suggesting is, just like Python config (e.g. compile flags) leaks into a virtual environment, so too should the pip config. i.e. site configuration should leak into a virtual env.
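The blending shown in the transcript above can be approximated with the standard library. This is only a sketch of layered INI merging using `configparser`, not pip's actual loader; the function name `blend` and the inline file contents are assumptions made for illustration.

```python
import configparser


def blend(*sources):
    """Merge INI texts in order; later sources override earlier ones per key."""
    parser = configparser.RawConfigParser()
    for text in sources:
        # Each call adds to what was already read, so sections accumulate.
        parser.read_string(text)
    return {
        f"{section}.{key}": value
        for section in parser.sections()
        for key, value in parser.items(section)
    }


# Stand-ins for /etc/pip.conf and ./venv/pip.conf from the transcript.
global_conf = "[global]\nno-cache-dir = true\n"
site_conf = "[global]\nno-dependencies = yes\n"

merged = blend(global_conf, site_conf)
for name, value in merged.items():
    print(f"{name}={value!r}")
```

Keys that appear in only one file survive the merge, which is why both settings show up in the combined listing; a key present in both would take the value from the later source.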

To turn this discussion around a little, perhaps I could ask you how you would control a virtual environment's pip configuration if global, user, and environment settings are not acceptable (because (a) a pip config doesn't apply to all Python installations on a machine, (b) it isn't a user configuration to define how a Python installation should interact with the distribution)?

I'm trying to not be hypothetical here, so my concrete scenario:

The problem occurs fairly quickly as users take a virtual environment from one of the distributions and they immediately have no pip configuration. Even worse, they try to use a PEP 517 tool such as build to create a wheel (python -m build ./) using the correctly (pip) configured Python distribution, but because build isolation creates a new virtual environment, the pip inside it isn't configured and you get an error.

piotr-dobrogost commented 3 years ago

@pelson I have to admit the scenario you brought up is interesting. :)

To turn this discussion around a little, perhaps I could ask you how you would control a virtual environment's pip configuration if global, user, and environment settings are not acceptable (because (a) a pip config doesn't apply to all Python installations on a machine, (b) it isn't a user configuration to define how a Python installation should interact with the distribution)?

After taking a quick look at docs of venv it looks like you could extend EnvBuilder class and define post_setup() method.

From https://docs.python.org/3/library/venv.html#venv.EnvBuilder.post_setup

post_setup(context) A placeholder method which can be overridden in third party implementations to pre-install packages in the virtual environment or perform other post-creation steps. (emphasis mine)

The post-creation step in your case could be creation of appropriate pip configuration file in the newly created virtual environment.
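A minimal sketch of that suggestion, assuming a Unix-like system where the per-environment file is named `pip.conf`. The class name `ConfiguredEnvBuilder` and its `pip_conf_text` parameter are hypothetical; `venv.EnvBuilder`, `post_setup(context)` and `context.env_dir` are the standard library pieces being relied on.

```python
import os
import venv


class ConfiguredEnvBuilder(venv.EnvBuilder):
    """venv builder that writes a pip.conf into each environment it creates."""

    def __init__(self, *args, pip_conf_text="", **kwargs):
        super().__init__(*args, **kwargs)
        self.pip_conf_text = pip_conf_text

    def post_setup(self, context):
        # context.env_dir is the root of the new environment, i.e. the
        # directory that becomes sys.prefix for its interpreter.
        conf_path = os.path.join(context.env_dir, "pip.conf")
        with open(conf_path, "w", encoding="utf-8") as fh:
            fh.write(self.pip_conf_text)
```

Calling `ConfiguredEnvBuilder(with_pip=True, pip_conf_text="[global]\nno-cache-dir = true\n").create("./my-venv")` would then produce an environment with the file already in place. This only helps when the environment is created through the custom builder, not through a plain `python -m venv`.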

gaborbernat commented 3 years ago

After taking a quick look at docs of venv it looks like you could extend EnvBuilder class and define post_setup() method.

That's my recommendation for using build too, to extend virtualenv via its plugins: https://github.com/pypa/build/issues/270#issuecomment-812003023.

pelson commented 3 years ago

Thanks for the suggestion of extending EnvBuilder. Indeed this does work to automatically configure a venv with pip.conf, and I've been running with such a tweak for quite some time. Unfortunately the python -m venv and python -m build invocations aren't something that you can override in this way (without monkeypatching during Python startup :speak_no_evil:).

Perhaps I could have been a bit clearer here: I'm not looking for a workaround per se (I have one, though I do appreciate ideas of other potential workaround solutions), I'm looking to address what I believe is a genuine shortfall (i.e. unexpected behaviour) in the way pip behaves with virtual environments.

That's my recommendation for using build too, to extend virtualenv via its plugins

I'm not using virtualenv, I'm using venv, the standard library way to make virtual environments. So too does build (https://github.com/pypa/build/blob/0.3.1/src/build/env.py#L188), no? There is no equivalent plugin system for venv.

the only sort of environment that should inherit is a --system-site-packages one

I disagree: I am not proposing that you inherit a single extra package with this change, merely that a locally (site) scoped pip.conf from a parent environment should be inherited in the child (virtual) one. I think treating configuration inheritance as the same thing as package inheritance (system site packages) would be overloading the --system-site-packages flag.

I'm trying to figure out real-world use cases where you wouldn't want to inherit the pip.conf from the parent environment. The one provided by @pfmoore "isolation testing" is a reasonable one, but I don't believe it should necessarily trump the one in which a user wants to create a virtual environment on a Python distribution which has been correctly pip configured and expect that their virtual env pip to "just work™".

In my PR (#9753) I implemented the easier approach of blending together the pip.conf in sys.base_prefix and sys.prefix (just like all of the other config search path items). It would totally solve my use case if, instead of doing a blend, it simply didn't look in sys.base_prefix if sys.prefix / pip.conf existed. This would solve the "isolated testing" use case, as you would simply touch sys.prefix / pip.conf to avoid sys.base_prefix / pip.conf being considered. This may be the best of both worlds - you can easily enough have isolated testing, but still get working behaviour out of the box.

uranusjr commented 3 years ago

build uses virtualenv if it’s available, and falls back to venv when it’s not.

pfmoore commented 3 years ago

"isolation testing" is a reasonable one, but I don't believe it should necessarily trump the one in which a user wants to create a virtual environment on a Python distribution which has been correctly pip configured and expect that their virtual env pip to "just work™".

Also, isolation is important from the point of view of bug triage - "can you reproduce your issue in an empty virtualenv" is much easier than "can you reproduce your issue if you create a new virtualenv, hunt down and disable any config files you may have created in the past and forgotten about, ..."

But the real problem is that both are valid requirements. As a result, we're not discussing what the behaviour should be, but rather which behaviour should be default, and which should be opt-in.

Backward compatibility is a significant weight in favour of ignoring the site config by default.

IMO you've made some good arguments that your use case is worth considering, but your arguments are not strong enough to switch the default. If you want to continue arguing for your behaviour being the default, I suggest you focus on how to address the backward compatibility issue.

layday commented 3 years ago

I don't understand how #9753 is going to help with build inheriting a parent venv's configuration. When you create a venv from another venv, sys.base_prefix does not point to the parent venv's prefix; it always points to the installed Python's prefix. pip does not look inside sys.base_prefix I assume because (a) they would not want to encourage users to drop pip configuration directly in e.g. /usr and (b) because, in most cases, global configuration serves the same purpose.

pelson commented 3 years ago

Also, isolation is important from the point of view of bug triage - "can you reproduce your issue in an empty virtualenv" is much easier than "can you reproduce your issue if you create a new virtualenv, hunt down and disable any config files you may have created in the past and forgotten about, ..."

I'm only semi-convinced about this one. You still have to deal with people's environment variables, global config and user config. In reality you'd tell them to run pip config debug and, if it existed, perhaps tell users to set PIP_DISABLE_CONFIG=1 for a truly clean config-free reproducer.

IMO you've made some good arguments that your use case is worth considering, but your arguments are not strong enough to switch the default.

Thank you for your insight and pointing out where I should focus the discussion.

But the real problem is that both are valid requirements. As a result, we're not discussing what the behaviour should be, but rather which behaviour should be default, and which should be opt-in.

If you want to continue arguing for your behaviour being the default, I suggest you focus on how to address the backward compatibility issue.

My proposal to read sys.base_prefix / pip.conf IFF sys.prefix / pip.conf doesn't exist is the easiest way to enable both behaviours conveniently, but it is indeed backwards incompatible.

The problem is that the two use cases are in conflict - so we need to be able to configure which behaviour we want. But in order to configure this using non-global config, we need to read the base_prefix's config...

Backward compatibility is a significant weight in favour of ignoring the site config by default.

The previous statement I made is the strongest I have in favour of changing the default. If we change the default, it is easy to then make an isolated virtual environment (touch sys.prefix / pip.conf) if you need it. The existing pip config debug is clear and continues to apply. If we don't change the default then it requires some more complexity in pip to decide if it should consider sys.base_prefix / pip.conf.

So let's take a look at what that might actually entail if we don't change the default:

We need some non-global, non-venv and non-envvar means to tell pip to blend sys.base_prefix / pip.conf with sys.prefix / pip.conf. So we could perhaps look at sys.base_prefix / pip.conf and look for a config item called, say use-base-pip-conf: true (default to false). If it is set, we include the rest of sys.base_prefix / pip.conf when building the pip configuration.

It isn't the nicest behaviour because we need to pre-read the config to figure if we want to read the rest of the config. pip config debug needs to become more nuanced ("the base config file exists, but we don't use it because it is not enabled"). This is the cost of keeping the "isolated by default" behaviour though. I can imagine this feature leading to a bit of confusion, if I'm honest.
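The pre-read step described above could look roughly like the following. This is a sketch under stated assumptions, not a proposal for pip's internals: the function name `site_config_files` is hypothetical, the key `use-base-pip-conf` is the illustrative name from the comment above, and placing it in the `[global]` section is an assumption.

```python
import configparser
import os


def site_config_files(prefix, base_prefix, filename="pip.conf"):
    """Return the site-level config files to load, lowest priority first.

    The base file is included only if it opts in by setting the
    (hypothetical) key use-base-pip-conf = true in its [global] section.
    """
    env_conf = os.path.join(prefix, filename)
    base_conf = os.path.join(base_prefix, filename)
    files = []

    in_venv = os.path.normpath(prefix) != os.path.normpath(base_prefix)
    if in_venv and os.path.isfile(base_conf):
        # Pre-read: parse the base file only to look for the opt-in flag.
        probe = configparser.RawConfigParser()
        probe.read(base_conf)
        if probe.getboolean("global", "use-base-pip-conf", fallback=False):
            files.append(base_conf)

    if os.path.isfile(env_conf):
        files.append(env_conf)
    return files
```

The extra parse is the cost being discussed: the base file has to be opened before it is known whether its contents count, and any debug output would need to report an "exists but not enabled" state.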

Do you have different ideas about how we might be able to satisfy the two use cases without changing the default? Fundamentally I'm struggling because the "isolation" behaviour is a subset of the "non-isolation" one (i.e. you can configure the "non-isolated by default" one to be isolated, but you can't configure the "isolated by default" one to be non-isolated).

layday commented 3 years ago

If we can step away from discussing the implementation for a minute, what's not been made clear through the course of this conversation is that this change would only be applicable to Conda environments, because Conda apparently mangles the value of sys.base_prefix of a Python installed in a Conda environment. This means that you can keep a development environment-local pip configuration in the "global" Python prefix, a peculiarity only found in Conda environments.

piotr-dobrogost commented 3 years ago

@layday I'm not familiar with Conda but the problem seems perfectly valid in the realm of standard Python. If I understand correctly the original issue was "Isolated venv does not inherit PIP index" (https://github.com/pypa/build/issues/270). This has nothing to do with Conda. Actually I am surprised this issue has not come up way earlier. Just think of all the corporate environments where one is required to use a specific pip index. It is natural to set this pip index as part of the base Python installation (often on a machine shared by many users) and hope it would propagate to virtual environments based on the base installation. I am in favour of reevaluating what the isolation of a virtual environment is supposed to mean. I guess the original intent was to isolate the virtual environment from packages installed in the base installation. We should consider whether isolation from the base installation's configuration was something of an accident, doing more harm than good.

layday commented 3 years ago

I'd question whether this is a valid configuration vector outside of a Conda environment. The intention with loading configuration from {sys.prefix}/pip.conf was to allow configuring pip in a virtual environment (and extended to support Conda environments in #6268), i.e. from a development-local config file. Corporate entities can configure pip globally instead of on a Python prefix basis - sys.base_prefix can be shared by, but can also vary (!) across, multiple Python installations on the same system. What this means in practice is that you can ensure neither that a pip config file located in sys.base_prefix applies to only one specific Python installation nor that it is global.

pelson commented 3 years ago

what's not been made clear through the course of this conversation is that this change would only be applicable to Conda environments

I don't think this conversation has anything to do with conda. Here is a clean build of Python:

$ wget https://www.python.org/ftp/python/3.9.4/Python-3.9.4.tgz
$ tar xzf Python-3.9.4.tgz 
$ ./Python-3.9.4/configure --prefix $(pwd)/py39
$ make
$ make install

With this:

$ ./py39/bin/python3 -m venv ./my-venv

$ ./py39/bin/python3
>>> import sys
>>> sys.base_prefix
'/home/pelson/Downloads/cpython_clean/py39'
>>> sys.prefix
'/home/pelson/Downloads/cpython_clean/py39'

$ ./my-venv/bin/python
>>> import sys
>>> sys.base_prefix
'/home/pelson/Downloads/cpython_clean/py39'
>>> sys.prefix
'/home/pelson/Downloads/cpython_clean/my-venv'

And the pip config:

$ touch ./py39/pip.conf
$ ./py39/bin/pip3 config debug
env_var:
env:
global:
  /etc/xdg/xdg-ubuntu/pip/pip.conf, exists: False
  /etc/xdg/pip/pip.conf, exists: False
  /etc/pip.conf, exists: True
    global.no-cache-dir: true
site:
  /home/pelson/Downloads/cpython_clean/py39/pip.conf, exists: True
user:
  /home/pelson/.pip/pip.conf, exists: False
  /home/pelson/.config/pip/pip.conf, exists: False

$ ./my-venv/bin/pip config debug
env_var:
env:
global:
  /etc/xdg/xdg-ubuntu/pip/pip.conf, exists: False
  /etc/xdg/pip/pip.conf, exists: False
  /etc/pip.conf, exists: True
    global.no-cache-dir: true
site:
  /home/pelson/Downloads/cpython_clean/my-venv/pip.conf, exists: False
user:
  /home/pelson/.pip/pip.conf, exists: False
  /home/pelson/.config/pip/pip.conf, exists: False

To "step away from the implementation", as you say, I want the pip config for the base environment to apply to the virtual environment one. sys.base_prefix is the documented way to identify "base environment" (https://docs.python.org/3/library/sys.html#sys.base_prefix).

Perhaps I missed something in what you were saying though.


To keep the conversation on track, the 3 proposals that have so far been discussed:

  1. pip always looks at sys.base_prefix/pip.conf and blends it together with sys.prefix/pip.conf if they exist
  2. pip looks at sys.base_prefix if-and-only-if sys.prefix/pip.conf doesn't exist
  3. pip looks at sys.base_prefix/pip.conf to determine if it should consider the rest of sys.base_prefix/pip.conf to be blended with sys.prefix/pip.conf. By default this would be false so the current default behaviour would remain.

My preference is option 2 as it represents a far simpler implementation and is far less confusing to explain, and therefore less likely to generate support requests to pip. Unfortunately this is a breaking change as pointed out by @pfmoore. To get back to the old behaviour, one just needs to touch sys.prefix/pip.conf (or remove the sys.base_prefix/pip.conf!).
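For comparison, the first two proposals can be sketched as two small path-selection functions. Both names are hypothetical, and the ordering in the blend (base file first, so the environment's own file takes precedence) is an assumption rather than something settled in this thread.

```python
import os


def blend_strategy(prefix, base_prefix, filename="pip.conf"):
    """Option 1: load every file that exists, base first so the env file wins."""
    paths = [os.path.join(base_prefix, filename), os.path.join(prefix, filename)]
    # dict.fromkeys drops the duplicate when prefix == base_prefix.
    return [p for p in dict.fromkeys(paths) if os.path.isfile(p)]


def fallback_strategy(prefix, base_prefix, filename="pip.conf"):
    """Option 2: use the env file if present, otherwise fall back to the base file."""
    for root in (prefix, base_prefix):
        path = os.path.join(root, filename)
        if os.path.isfile(path):
            return [path]
    return []
```

Under the fallback variant, creating an empty `pip.conf` in the virtual environment is enough to stop the base file from being read, which is the "touch" escape hatch described above.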

As far as I can see right now there is no other way for us to use non-global, non-user, non-environmental means to allow a venv to be configured out of the box - I have an open question to hear of other viable implementation options. To re-iterate, for the use case it is essential that a venv can be configured out of the box (e.g. isolated wheel building). It is completely acceptable that the base environment needs to be configured to do this (i.e. it isn't the default behaviour), but it is not OK for it to be a post-venv-creation step, as this would preclude the use of tools that create venvs themselves.

layday commented 3 years ago

It is relevant to Conda because only with Conda will you have a different base prefix for every development environment. Without this precondition there is no use case for base prefix dependent configuration as explained above.

piotr-dobrogost commented 3 years ago

It is relevant to Conda because only with Conda will you have a different base prefix for every development environment.

No one is talking about different base prefix for every environment. The point is to have all virtual environments based on one, specific base Python installation to inherit pip configuration from this one, specific base Python installation. Also no one besides you mentioned Conda in any way.

Corporate entities can configure pip globally instead of on a Python prefix basis - sys.base_prefix can be shared by, but can also vary (!) across, multiple Python installations on the same system.

That's exactly what OP stated:

There are other Python distributions on the controlled network (both accessible from the same machine) for which I am not responsible, and which need specific pip configurations.

As to:

What this means in practice is that you can ensure neither that a pip config file located in sys.base_prefix applies to only one specific Python installation nor that it is global.

Exactly the opposite is true. sys.base_prefix is unique per Python installation thus taking its pip's configuration into consideration in virtual environments allows all such environments to share common configuration tailored to the needs of this specific base Python installation.

pfmoore commented 3 years ago

To keep the conversation on track, the 3 proposals that have so far been discussed

... and to provide context, pip's current behaviour

This boils down to, pip looks at the currently active environment, the site configuration, and the user configuration. This is also simple to explain.

Do you have any examples of other software that looks up configuration from both sys.prefix and sys.base_prefix? I'm not aware of any.

🤷 I guess what it boils down to is that for me, the change is more difficult to describe than the current behaviour, it's not backward compatible, and it doesn't seem useful enough to justify the maintenance cost (without trying to be dismissive, your situation is clearly a fairly rare special case).

layday commented 3 years ago

Conda is mentioned in the bug report. I’ve explained why inheriting from a base prefix is problematic in other contexts.

I’m going to drop out of this conversation now since we’re going round in circles and people are starting to forget their manners.

piotr-dobrogost commented 3 years ago

your situation is clearly a fairly rare special case

I think we might miss a chance to make an important improvement for many use cases if we keep treating this as a rare special case. For one, there's nothing particularly special in having more than one base Python installation on one machine (especially in a multi-user environment). The other thing is that I think we should see this issue as one surfacing a rather common problem: not having a way to transparently pre-configure virtual environments so that their users can easily obtain a working environment without following additional, tedious configuration steps which could otherwise be automated and hidden. I guess people got used to how things are in this regard and, instead of looking to improve the overall situation as this issue tries to, are just creating and following the instructions necessary to make a newly created virtual environment actually useful for anything.

pfmoore commented 3 years ago

I think we might miss a chance to make an important improvement for many use cases if we keep treating this as a rare special case.

Can you give examples of other use cases?

For one, there's nothing particularly special in having more than one base Python installation on one machine (especially in a multi-user environment).

But the special case here isn't just about having more than one base Python installation. It's about having multiple installations which must have unique pip configurations. And about wanting virtual environments to inherit that configuration, which is demonstrably rare, because it's not pip's current behaviour (if everyone wanted inheritance, why has no-one raised this before?)

The other thing is that I think we should see this issue as one surfacing a rather common problem: not having a way to transparently pre-configure virtual environments so that their users can easily obtain a working environment without following additional, tedious configuration steps which could otherwise be automated and hidden.

Agreed, but that's an issue with virtual environments, not with pip. Surely you're just as likely to want to pre-configure your virtual environment for other tools as well? And as has already been noted, virtualenv has this facility with its plugin feature, so this is only a problem for the stdlib venv.

I guess people got used to how things are in this regard

Possibly. Or possibly (most) people don't have a problem with the current situation. We currently have no way of knowing. Which is why I'd like to see more evidence that this is a general problem before changing pip's default behaviour.

pelson commented 3 years ago

This boils down to, pip looks at the currently active environment, the site configuration, and the user configuration. This is also simple to explain.

It might sound simple, but even the term site is ambiguous here (without even considering the literal meaning of "site"). You could mean the site module (std lib) which comes from the sys.base_prefix, or the site-packages directory which comes from the sys.prefix, or if the venv is created with --system-site-packages you could mean both sys.prefix and sys.base_prefix.

Correction: I've just re-read your statement, and now understand that the thing you call "site configuration" is what pip calls "global" configuration, and what you called "currently active environment" is actually the "site" config (in pip config terms).

My point here is not to nit-pick your words, but that "it's simple to explain" isn't particularly true (there is a huge amount of detail in the global and user configuration), and it certainly isn't significantly worse if we have to include the following statement in the documentation: "for site config pip reads sys.prefix / pip.conf if it exists, and if not will fall back to sys.base_prefix / pip.conf if that exists, otherwise no site config will be used".


As far as I can see what remains of the discussion comes down to a judgement between the relative merits of one default behaviour vs another. As a reminder the major competing use cases:

vs

(note: I've genuinely tried to represent the use cases fairly, but given what I've written, I seem to have found it difficult. Please feel free to add or refine the use cases.)

I don't think I can provide any more information to help on this judgement without risking going around in circles, and we're already quite a long way into this conversation. If it is decided not to allow venvs to inherit the base config out-of-the box then I would consider it unfortunate for organisations such as universities and national labs, but ultimately perhaps they have the resources to work around it (by hacking pip in some way, or by enforcing virtualenv instead of venv and using its plugin system). Of course, if it is decided that my proposal is indeed a reasonable next step then I would be more than happy to polish the implementation in #9753.

If there are viable proposals for not changing the default, but making the behaviour configurable (non global, non user, non environmental) I'd also be happy to hear them.

Thanks to all for your time so far :+1:

pelson commented 3 years ago

A user of a multi-tenant machine (e.g. a supercomputer or an organisation's internal cluster), with multiple Python distributions requiring distinct pip configs

I realise now that I've not even included my own setup here. In my case I provide a Python distribution on a shared network mounted disk (NFS in this case) such that a diverse set of (Linux) machines and users can have the same Python distribution. In my scenario I literally cannot control the global configuration (I don't get mounted at /etc) and there are hundreds of users who I don't control and can't enforce user configuration.

pfmoore commented 3 years ago

I've just re-read your statement, and now understand that the thing you call "site configuration" is what pip calls "global" configuration

I just checked the code here, and the output of pip config debug, to confirm I'm not confusing things here. Global and site configuration are different and site configuration is held in sys.prefix. It looks like site config isn't mentioned in the documentation, which is possibly why we're having such a hard time understanding each other, but site configuration is a real thing.

It's still true that site configuration isn't inherited by virtual environments (because they have a different sys.prefix) but that's deliberate, as we've said.
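For anyone following along, a small sketch of why the site file is not inherited: inside a virtual environment `sys.prefix` points at the environment while `sys.base_prefix` still points at the installation it was created from. The function names below are illustrative only, not part of pip.

```python
import sys
from pathlib import Path

# pip names its config file pip.ini on Windows and pip.conf elsewhere.
CONFIG_NAME = "pip.ini" if sys.platform == "win32" else "pip.conf"


def in_virtual_environment() -> bool:
    """True when the interpreter runs inside a venv/virtualenv."""
    return sys.prefix != sys.base_prefix


def site_config_path() -> Path:
    """Location of the site config file, which lives under sys.prefix."""
    return Path(sys.prefix) / CONFIG_NAME


def base_config_path() -> Path:
    """Where a base-installation config would live (not read by pip today)."""
    return Path(sys.base_prefix) / CONFIG_NAME
```

Outside a virtual environment the two paths are identical; inside one they differ, which is exactly the case this issue is about.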

pradyunsg commented 3 years ago

I'll encourage folks to use the vocabulary established in https://pip--9474.org.readthedocs.build/en/9474/explanations/configuration/ if you're talking about pip configuration files to avoid confusion here. (that's from a documentation rewrite for pip I'm doing)

pfmoore commented 3 years ago

It might be worth clarifying in those docs that the site location is {sys.prefix}/pip.ini, and applies to all environments, not just virtualenvs. The existing docs have the same problem.

pelson commented 2 years ago

I appreciate all of the effort that has gone into this issue so far. Unfortunately I think it has stalled a bit, possibly as a result of a switch of focus mid-discussion.

The key messages in https://github.com/pypa/pip/issues/9752#issuecomment-817510909 and https://github.com/pypa/pip/issues/9752#issuecomment-819025200 are:

pfmoore commented 2 years ago

I'm not sure it's stalled, so much as run its course.

My position is that this is absolutely a feature request, not a bug. And personally, I don't see a sufficiently good case having been made for the feature, so I am -1 on including it.

If someone wants to push for it, I think they'll need to do 2 things:

  1. Produce a PR, including documentation, so that we have concrete evidence for how much additional complexity this will add to pip, both in terms of code and in terms of describing pip's behaviour.
  2. Successfully argue the case that the benefits are sufficient to justify the additional complexity.

However, I would caution anyone investing time in the above: it's by no means a foregone conclusion that if you do the work, it will get accepted. At the moment, the only evidence we have is from one user, the OP, whose workflow would benefit from this (there have been other people, including me, expressing interest in the idea, but as far as I can tell that's all been theoretical, without actual use cases). We'd need the feature to be much more broadly useful (either in terms of examples of other people who would use this, or other reported issues that could be handled with this feature) if it's to be justified.

pelson commented 2 years ago

At the moment, the only evidence we have is from one user, the OP

I currently represent 400+ scientific and engineering Python users, and have quite extensive experience of deploying Python to large scientific organisations who regularly have network isolated (internet-free) machines with internal package indexes (HPCs, critical op infrastructure, etc.). I'm not saying I'm correct or my opinion is worth more as a result, but I'm genuinely doing this in the interest of a significant user population who you would otherwise never see as they are behind an organisational firewall.

  1. Produce a PR, including documentation

The PR I produced offered concrete evidence (https://github.com/pypa/pip/pull/9753). It is stale, but remains a useful metric of the additional complexity. Essentially, it doesn't introduce any new functions as there is already behaviour to pick from the first found config file from a list of files.

I probably need to update the docs to be consistent with the terminology refined by @pradyunsg, which I can do if there is a hint of my proposal being accepted.

  2. Successfully argue the case that the benefits are sufficient to justify the additional complexity.

I personally think the complexity is low (please feel free to leave comments on the complexity in the PR if you disagree).

The benefits are clearly harder to be convincing about (I've tried, and either I'm not being clear enough, or you've understood the use case and don't think it is a useful one). I have tried quite hard to explain the use case in this issue, but will reiterate the example as succinctly as possible:

_As a maintainer of a network mounted Python distribution for hundreds of scientific users in an isolated network environment, I want my users to be able to create virtual environments on their own machines (via venv, virtualenv, or some other method) which inherit the pip configuration from the base environment, so that my users can correctly pip install subsequent packages from the pre-configured package index into their virtual environment. It should not rely on env-vars (unreliable), and must not rely on a file existing in a user's homespace or on their machine's local dist (e.g. in /etc/)._

If this use-case is understood and is not going to be supported, I suggest we close the issue. Otherwise, I will be happy to revive the simple PR in order to see it over the line. Furthermore, I will be happy to make further contributions to improve docs around the config in particular, if that is welcome/desired. I have a long-term interest in the feature being maintained, and would happily be pinged in order to address/resolve any future complications/requests around the config topic.

pfmoore commented 2 years ago

I currently represent 400+ scientific and engineering Python users, and have quite extensive experience of deploying Python to large scientific organisations who regularly have network isolated (internet-free) machines with internal package indexes (HPCs, critical op infrastructure, etc.). I'm not saying I'm correct or my opinion is worth more as a result, but I'm genuinely doing this in the interest of a significant user population who you would otherwise never see as they are behind an organisational firewall.

I really appreciate the fact that you've taken the time on this issue to represent that user base. As you say, it's extremely hard for us to find out about such use cases and the difficulties they face, and the input is really important to us. Please don't take the pushback you're receiving on this issue as a lack of interest in your use case. In particular, one reason I'm personally so involved in this issue is because I want to make sure we don't forget about these "hidden" use cases (I don't otherwise have much opinion on pip's config mechanisms 😉).

One question I asked above, which hasn't yet been answered:

Do you have any examples of other software that looks up configuration from both sys.prefix and sys.base_prefix? I'm not aware of any.

This is a key point - pip shouldn't be breaking new ground here, the problem our config system is addressing isn't unique to pip, so we shouldn't be implementing unique solutions. So I would expect you'd have similar problems with other software, and if you don't, I'd want pip to do things similarly to those tools, for consistency across the ecosystem if nothing else.

The PR I produced offered concrete evidence (https://github.com/pypa/pip/pull/9753).

I apologise. I had somehow missed that you had provided this PR. It does indeed look OK to me, in terms of clearly explaining what you're proposing and demonstrating that the change is relatively straightforward at a technical level.

I personally think the complexity is low (please feel free to leave comments on the complexity in the PR if you disagree).

The amount of discussion and differing opinions here demonstrates (to me, at least) that this isn't the case. It may be simple in implementation terms, and maybe even in terms of explaining the technical implications ("you can now pick up config from sys.base_prefix as well") but in user terms, the fact that we're having this amount of discussion suggests that calling the complexity "low" is missing a key point.

Regarding the chance of moving this forward, I still think it's mostly down to you. There have been a number of key questions/issues raised here. Maybe you feel that you have addressed them, but I think the lack of progress is essentially because the rest of the participants wouldn't agree...

You've said that the plugin mechanism for virtualenv solves the issue for that case, your problem is really only with the stdlib venv. That in itself limits the benefit of this change. And I'm not clear why, given that you control the Python installation, you don't patch the stdlib venv to fix the problem there. So what about this issue necessitates that it gets solved in pip? Is it just that you think the solution "fits better" in pip? If so, you haven't really made your point on that yet.

We currently have the following config locations in pip:

You're proposing changing "site" to mean something very different (per-virtualenv plus inherited config from the base environment, I think). Could you explain how you'd re-word the description of "site" (without making it significantly longer than the other two!) to say what you intend it to now mean?

In my view, what you're after sounds more like you want to change "global". If so, that's something we inherit from platformdirs (where it's confusingly called "site" 🙁), so you should be proposing that change with them. I don't think we'd want to add a special case just for pip here.

If you want to add an extra level, then why? You've never (as far as I can see) said what precisely you want to put in this "shared base environment config". I can't think of anything that I'd want to set that would depend on which Python installation I used to create my virtual environment. If I have both your shared environment on my PC, plus a locally installed Python that's the same version, why would it matter which one I used to create a virtualenv? That's actually the biggest fundamental question here - what is the actual problem you're trying to solve here? Not the solution or approach you'd like to take, but what (in end-user terms) is the underlying problem?

I hope this helps. Sorry I can't give you a simple "yes, this seems like a good addition" or "no, sorry, we're not going to accept this", but I really don't think you've given us enough information about your requirement to let us make such a decision yet. I know you've tried, but hopefully the above helps you understand where there are still gaps in our understanding.

pelson commented 2 years ago

Maybe you feel that you have addressed them, but I think the lack of progress is essentially because the rest of the participants wouldn't agree...

Thank you for being explicit. This is also my feeling, and I appreciate having a concrete list of those things to focus on.

Unfortunately addressing those things is going to result in a long message - so apologies in advance. I will keep it as short as I can whilst addressing all that you have put to me.

You've said that the plugin mechanism for virtualenv solves the issue for that case, your problem is really only with the stdlib venv

In fact, right now I support venv (by patching it), but not virtualenv. Whilst virtualenv has hook support, I don't think it is any easier to do by default than venv - the functionality to change the default seeder without explicit opt-in doesn't (yet) exist.

This isn't really about the tool though - I ideally want to support any tool that can produce virtual environments and which follows the standard sys.base_prefix and sys.prefix concepts. I'm advocating that pip knows how to handle virtual environments, not that the tools know how to make virtual environments with pip appropriately configured. The difference is subtle, but I would like to be able to create a virtual environment without pip, run python -m ensurepip in the venv and have a suitably configured pip as per the base, for example.
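As a rough illustration of that workflow (a sketch only, using the standard library `venv` and `ensurepip` modules; the helper names are mine):

```python
import os
import subprocess
import venv
from pathlib import Path


def create_env_without_pip(env_dir: Path) -> Path:
    """Create a venv that has no pip and return its Python executable."""
    # Mirror the `python -m venv` default of using symlinks off Windows.
    builder = venv.EnvBuilder(with_pip=False, symlinks=(os.name != "nt"))
    builder.create(env_dir)
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def bootstrap_pip(env_python: Path) -> None:
    """Install the bundled pip into the environment via ensurepip."""
    subprocess.run([str(env_python), "-m", "ensurepip"], check=True)
```

With the behaviour I'm advocating, the pip installed by `bootstrap_pip(create_env_without_pip(Path("./my-venv")))` would pick up the base configuration without the environment-creation step having to know anything about pip.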

Is it just that you think the solution "fits better" in pip? If so, you haven't really made your point on that yet.

To be explicit: Yes exactly. That, and the fact that the tool used to create the virtual env isn't the important part - there are multiple valid ways to produce a virtual environment (a standard library concept), and it shouldn't be necessary to patch all tools which create them in order to get a working pip.

Do you have any examples of other software that looks up configuration from both sys.prefix and sys.base_prefix? I'm not aware of any.

pip shouldn't be breaking new ground here, the problem our config system is addressing isn't unique to pip, so we shouldn't be implementing unique solutions

I empathise with this point, but I disagree - IMO pip is special in this regard. pip is a fundamental part of an isolated virtual environment's subsequent population. There is no other library or tool which fundamentally needs to be appropriately configured immediately upon venv creation in order to allow tools to subsequently extend the newly created venv. To put this another way: for every other tool it is possible to manually configure a tool before running it (even if not ideal), but pip is so fundamental that tools (such as build, pre-commit) bake into their design the idea of "create venv + install packages" as a single step - there is no opportunity for users to configure pip manually in-between.

Finding something that does config like this is going to be tough as a result. Perhaps the most similar behaviour, whilst not using virtual environments precisely, is found in conda. Their config search path follows:

But finding other projects that read their config from sys.prefix, never mind sys.base_prefix, is challenging. Often it is based on $XDG_CONFIG_HOME, or /etc.

You're proposing changing "site" to mean something very different

I accept this directly. Both platformdirs and site.py (std-lib) have similar naming issues, IMO. The term "site" to mean virtual environment doesn't make much sense to me, since you tend to provide a "site-wide" (base) Python installation, rather than a site-wide virtual environment - but I am happy to fall into line on the terminology :+1:.

Perhaps, given the conflation in many places, it is best to move away from the term "site" altogether. Python environment would be good enough to convey that it is coming from sys.prefix. I leave that to another discussion (on another issue) though. For now site is good enough to mean current Python environment.

In my view, what you're after sounds more like you want to change "global".

I'm explicitly not after this - I do not want to influence the pip configuration of all Python invocations on a machine. I only want to influence those that use the Python installation I provide on a network drive - two reasons: the Python invocations aren't all happening on the same machine, and I don't want to touch what happens to other Python distributions running on that machine (platform managed Python, or other Python installations that may exist).

This latter point you've probed specifically about, so will talk about later on.

If you want to add an extra level, then why?

Given the naming confusion around site, I agree that I am fundamentally requesting a new level.

I think the motivation is coming through in my points above, but I believe that venv specific config should be relatively rare. I will try to highlight what I see as the advantages of each:

The new level that I would put in between Global and Site:

(FWIW, I would have put Environment variable last, preceded by User, as that would have been more consistent with PYTHONPATH, and would maximise the utility of the benefits I highlighted - that is a different discussion though :wink:)

If I have both your shared environment on my PC, plus a locally installed Python that's the same version, why would it matter which one I used to create a virtualenv?

My shared environment doesn't support your arbitrary locally installed Python version. If you have problems with your Python version with my pip.conf, I am going to tell you to use my shared Python environment to see if you can reproduce it there. Concretely, my shared environment patches Python in certain ways, including overriding ensurepip - this ensures that when a new venv is created a reasonably up-to-date pip is installed out of the box. This was essential at a certain point in order to pick up the site config (which I think was added around pip 9), but now is a convenience and ensures that people are using the improved resolver and benefiting from all of the other improvements that come with modern pip.

I think your question is essentially asking: What is the use case for having multiple pip.conf files on the same machine for the same Python version?

Hypothetically I could imagine wanting to differentiate the package index to use. For example, one could timestamp the index being used based on the Python distribution release date, in order to provide reproducible and snapshotted pip install calls - I don't do this, but it is something that my sys-admins and security team have requested, and a strategy that is employed for Linux distributions with RPMs.

Perhaps a list of reasons that I can't use the existing config locations would be useful:

but what (in end-user terms) is the underlying problem?

Simply put, as a user:

If you are partially convinced, I can revive the PR to see what it actually entails to implement the extra level being discussed. If not, I think I've probably over-spent on getting a return on investment already :smile:. Either way, I appreciate that this is also very time consuming for you, and want to say thank you for taking the time to engage.

pfmoore commented 2 years ago

If you are partially convinced

I'm not convinced that your proposed solution is the right one, but I do see that you have a problem that isn't easily solved by existing pip functionality. But equally, I don't want to let you think that convincing me is all that's involved here. I'd like to see you get some feedback from other pip developers if we're to do anything with this. And assuming the other devs are receptive, I'd be wanting to explore options that don't have the problems that this approach has (which have been discussed at length above).

I hope that helps - I'll let you decide whether you want to pursue this further.

I might follow up with another response addressing some of your particular points. But please treat that as simply exploring the problem a little further - there's no need for you to respond to it.

pradyunsg commented 2 years ago

I've retitled this issue, to better reflect what is being asked for.

As useful context/background, I wrote pip config and wrote/refactored most of the configuration logic a few years ago (stating this to establish I'm reasonably familiar with it, and not to imply that I somehow "own" this piece within pip or have any sort of authority over this beyond what my fellow maintainers do).

I think the rationale provided here for adding a new level is reasonable, and I'm fine with adding an additional config file at base_prefix.

I will say this: this is a weird organisational situation you're in, where you can't/don't want to influence the global configuration (i.e. it's fine to have pip reach out to whatever is configured there, or the default PyPI) and you don't trust/want your users to configure their tools to work correctly (i.e. by documenting/educating about a standard config file for the use case). I'm not sure what pieces are specific choices by you vs being forced upon you. :)

Coming back: I don't think an additional config file layer is particularly that expensive for us in terms of the code + maintenance, and it feels like the right level of abstraction to solve this particular class of organisational/usage-pattern problems at -- the main concern I have is around (a) documenting this clearly+completely, and (b) a clean migration to this. Since it's effectively a new feature, (b) is less of a concern. (a) is something that will end up needing to be a part of the PR adding this. I'll guess that this is something that's mostly useful for organisational users, where there's some shared configuration for Python development that only applies to a subset of users (likely the package index?).

I'd be happy to review and (assuming no other maintainers are strongly opposed) merge a PR for this.

To err on the side of overcommunicating.... I'll explicitly state that I'm not promising that I'd take a look at such a PR quickly though -- trying to juggle quite a few things at the moment. I also haven't been looking at this thread closely due to limited bandwidth on my end. I've quickly skimmed through the initial bits of this thread and the most recent comments (sorry, there's way too much text here and I have limited mental energy right now) and that's the context I had when I wrote this. Sorry if I've missed some nuances/points that have been brought up already. 🙈

pradyunsg commented 2 years ago

I'll tentatively mark this as awaiting PR, to reflect that we'd review a PR for this and that's the next step forward here. I'll also note that I likely won't have the bandwidth to actually write this PR but likely will for reviewing one.

To my fellow maintainers, please feel free to remove that label if you think I applied that too eagerly. :)

pelson commented 1 year ago

We can close this out thanks to #11487 having been merged :tada:. Should be in pip 23.1 when it is released.

Thanks to all who contributed to the discussion. It was a long thread, over a long time span, but I believe we came out of it with a very worthwhile improvement, particularly for the case where a common base environment is supplied (e.g. by sys-admins) and where virtual environments are expected to be correctly configured from the base.

I personally will be happy to delete some ugly workarounds as a result of this :smile:

pfmoore commented 1 year ago

This change caused issues in the 23.1 release (see #11982). As a result it is being reverted and a new PR will need to be submitted to re-implement the functionality, with a new design.

The design here was explicitly noted (above) as being backward incompatible, and indeed it was this incompatibility that caused the issues. Any replacement PR will need to either be fully backward compatible, or will need to have a suitable transition plan, including a deprecation period for the existing behaviour and a plan for what we will do if further compatibility issues occur.

Furthermore, this design breaks the isolation between virtual environments and their "base" environment - again, something that was noted in the discussion but which wasn't really addressed as a potential issue. Any replacement proposal needs to avoid doing this somehow. IMO, pip should not break the "virtual environments are isolated from their base environment" principle.

On a personal note, as a result of the issues, I'm now even less convinced that this is a feature that pip should accept. I understand there's a use case here, but there are other ways of addressing it. I'm very aware that "outsiders explaining how you can solve your problem" is generally more annoying than useful, but I would think that the following are possibilities to consider:

  1. A wrapper for creating virtual environments that populates the environment's pip.ini appropriately. The venv module is explicitly designed for this sort of extension. Yes, I know that one constraint is that the site can't control how users create environments, but "do it this way or we can't support you" is often a reasonable compromise.
  2. A global "Python startup" hook (implemented using one of the core Python mechanisms for such a thing) that checks the environment for a pip.ini and if there isn't one present, creates it. Again, this is using supported methods for customising your installation and therefore should not be a problem.
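To give a feel for the first option, here is a minimal sketch of such a wrapper built on `venv.EnvBuilder` and its `post_setup` hook. The class and helper names are made up for illustration, and copying the file from `sys.base_prefix` is an assumption about where the shared config lives.

```python
import shutil
import sys
import venv
from pathlib import Path

# pip names its config file pip.ini on Windows and pip.conf elsewhere.
CONFIG_NAME = "pip.ini" if sys.platform == "win32" else "pip.conf"


def seed_pip_config(env_dir: Path, base_dir: Path) -> bool:
    """Copy base_dir's pip config into env_dir unless the env already has one.

    Returns True only when a file was actually copied.
    """
    source = base_dir / CONFIG_NAME
    target = env_dir / CONFIG_NAME
    if not source.is_file() or target.exists():
        return False
    shutil.copyfile(source, target)
    return True


class ConfigSeedingEnvBuilder(venv.EnvBuilder):
    """Hypothetical builder that seeds new environments with the base config."""

    def post_setup(self, context):
        # context.env_dir is the directory of the environment just created.
        seed_pip_config(Path(context.env_dir), Path(sys.base_prefix))
```

Something like `ConfigSeedingEnvBuilder(with_pip=True).create("./my-venv")` would then produce an environment whose own site config is a copy of the base one, so pip itself needs no changes. The second option could reuse `seed_pip_config` from a startup hook instead.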

To be clear, my reason for preferring a solution using existing mechanisms is that if you take that route, you don't have to consider all of the other ways in which people can use (or abuse!) the existing mechanisms, and so you can implement a simpler, dedicated solution that doesn't have to take an "extreme defensiveness" position - which is what a new feature in pip has to take.

pelson commented 1 year ago

Firstly, thanks to @pfmoore for handling the fallout from this. I appreciate that it has caused grief and a lot of expended effort, so thank you.

my reason for preferring a solution using existing mechanisms is that if you take that route, you don't have to consider all of the other ways in which people can use (or abuse!) the existing mechanisms

To re-iterate what we already discussed (in the admittedly very long thread above), I do have local solutions and am proposing this into pip because I believe it is a valuable requirement for providers of Python distributions who aren't shipping as a system Python.

The truth is: this situation is unfortunate and was not practically foreseeable. I believe the change that was introduced was reasonable given all that we knew, and to be entirely frank, I consider the fundamental problem here to be that the Windows Store builds are configuring pip in such a way (I would be happy to elaborate if it is considered a contentious statement). That being said, we now know more than we did before, and that may influence our design going forwards.

In practice, the workaround to this issue is quite simple (you documented a number of them yourself - simply setting an environment variable, for example, addresses the problem). Would suitable and specific documentation of this breaking change, plus a transition guide not be considered sufficient in this case? ("if you wish to use Windows Store Python with pip and venv, you must configure pip in your virtual environment, else it has been pre-configured to install into --user").

Any replacement PR will need to either be fully backward compatible, or will need to have a suitable transition plan, including a deprecation period for the existing behaviour

The deprecation period would presumably warn if a base pip config file is detected, and proceed to ignore it until a future release of pip? As well as a future flag to acknowledge the ignore behaviour, and enable the new. This is something I could buy-in to, and would be willing to proceed with, if it is agreed that this is the desired approach.
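Roughly, I have something like the following in mind. This is only a sketch under my own assumptions: `site_config_files` and the `use_base_config` opt-in are hypothetical names, and I've assumed that files later in the returned list override earlier ones.

```python
import sys
import warnings
from pathlib import Path
from typing import List

# pip names its config file pip.ini on Windows and pip.conf elsewhere.
CONFIG_NAME = "pip.ini" if sys.platform == "win32" else "pip.conf"


def site_config_files(
    prefix: str, base_prefix: str, use_base_config: bool = False
) -> List[Path]:
    """Hypothetical: list the environment-level config files to load, in order."""
    env_file = Path(prefix) / CONFIG_NAME
    base_file = Path(base_prefix) / CONFIG_NAME
    # Outside a virtual environment, or with no base file, nothing changes.
    if Path(prefix) == Path(base_prefix) or not base_file.is_file():
        return [env_file]
    if use_base_config:
        # Opted in to the new behaviour: base first, environment overrides it.
        return [base_file, env_file]
    # Transition period: keep ignoring the base file, but say so.
    warnings.warn(
        f"{base_file} exists but is ignored inside virtual environments; "
        "a future release may start reading it.",
        FutureWarning,
        stacklevel=2,
    )
    return [env_file]
```

The opt-in would let people acknowledge the warning and get the new behaviour early, while everyone else keeps today's behaviour until the transition period ends.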

pfmoore commented 1 year ago

The deprecation period would presumably warn if a base pip config file is detected, and proceed to ignore it until a future release of pip? As well as a future flag to acknowledge the ignore behaviour, and enable the new. This is something I could buy-in to, and would be willing to proceed with, if it is agreed that this is the desired approach.

No, that's not sufficient (IMO). While I agree that the Windows Store distribution's approach is problematic, it is a real-world situation we have to deal with. As such, the problem is that the end user has no way of addressing any such deprecation warning - they can't remove the "base" config file, nor can they be expected to switch Python distributions.

If you can persuade the Windows Store distribution to change their behaviour (see https://github.com/python/cpython/issues/103646) then when they have done so and there's a reasonable expectation that the majority of their users have upgraded to a version that no longer has the config file, then we could try again with this approach. But that will be a long time happening, and even then I still object to the idea of base environment configuration "leaking" into derived virtual environments that don't have --system-site-packages set.

I think what's needed here is a different approach that solves your use case without using an existing config location. And it should probably also block setting configuration values (like --user) that don't behave the same in "system" and virtual environments. Or something. Remember that I'm the person who has been consistently uncomfortable with this feature - I never approved it, I simply stopped objecting in the face of other pip maintainers being willing to support it. I'm definitely not the person you want to be advising on how it should be designed. I'm the person who will pick apart any design you propose 😉

dalebrydon commented 1 year ago

I think I tend to agree with @pfmoore, that this is a bit of a confusing change to me. I admit I never fully grasped why changes to other tools aren't the preferred method. Wouldn't something like a --copy-pip-config option in venv or virtualenv work too?

The big thing I don't like with the current design of pip config is that putting things in the Python install is treated as site config when not working in a virtualenv. To me things in the Python install are clearly global config but sort of get stuck as project-level config because we use sys.prefix for this. Note that even the comments in pip source refer to the site configuration as virtualenv config, so I think this is an accidental side effect more than intentional design. The base thing only extends this problem, which is why I would prefer some other solution.