pypa / pip

The Python package installer
https://pip.pypa.io/
MIT License

Be able to use different proxies for different index urls #8232

Open stefanoborini opened 4 years ago

stefanoborini commented 4 years ago

What's the problem this feature will solve?

Possibly this is already known, but I could not find an easy solution. Currently pip supports proxies, but as far as I understand it relies on a single proxy and uses it for every connection. The following scenario would therefore introduce problems:

With a browser this is not a problem, because the automatic configuration uses a proxy.pac script whose logic maps each requested URL to the proxy server to use (or decides not to use one). pip, however, cannot take advantage of this. In the situation above you can either connect to the local PyPI but not the external one (without a proxy setting), or to the external one but not the local one (with a proxy setting).
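For readers unfamiliar with PAC files: a proxy.pac script defines a JavaScript function FindProxyForURL(url, host) that returns either "DIRECT" or a string such as "PROXY host:port". The sketch below is a rough Python analogue of that per-URL decision, only to illustrate the kind of routing pip cannot do today. The domain suffix and proxy address are hypothetical placeholders, not values from this issue.

```python
from urllib.parse import urlsplit

# Hypothetical values for illustration only.
INTERNAL_SUFFIXES = (".corp.example.com",)
CORPORATE_PROXY = "proxy.corp.example.com:8080"


def find_proxy_for_url(url: str) -> str:
    """Rough Python analogue of a PAC file's FindProxyForURL(url, host)."""
    host = (urlsplit(url).hostname or "").lower()
    # Internal hosts are reached directly, like dnsDomainIs(host, ".corp.example.com").
    if any(host.endswith(suffix) for suffix in INTERNAL_SUFFIXES):
        return "DIRECT"
    # Everything else (e.g. pypi.org) goes through the corporate proxy.
    return "PROXY " + CORPORATE_PROXY


if __name__ == "__main__":
    for url in ("https://pypi.corp.example.com/simple/", "https://pypi.org/simple/"):
        print(url, "->", find_proxy_for_url(url))
```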

Describe the solution you'd like

pip already supports a --proxy option. The following solutions could be implemented:

  1. extend the grammar of this option to support multiple entries that are mapped, in order, to the index and the extra indexes.
  2. treat --proxy as the default and extend --index-url or --extra-index-url so that both the URL and an optional proxy can be passed in some form
  3. add specific proxy options --index-proxy-url and --extra-index-proxy-url. Given that it's possible to specify multiple --extra-index-url entries, proper mapping must be done.
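Option 3 hinges on the "proper mapping" between repeated --extra-index-url values and their proxies. Below is a minimal sketch of one possible positional pairing, assuming the hypothetical --index-proxy-url and --extra-index-proxy-url options proposed above (neither exists in pip). An empty or missing proxy entry stands for a direct connection, and an index page is matched to its index by the longest URL prefix.

```python
from __future__ import annotations

from itertools import zip_longest
from typing import Optional


def pair_index_proxies(index_url: str,
                       index_proxy: Optional[str],
                       extra_index_urls: list[str],
                       extra_index_proxies: list[str]) -> list[tuple[str, Optional[str]]]:
    """Pair every index URL with its proxy, in command-line order.

    Hypothetical rule: the N-th --extra-index-proxy-url belongs to the N-th
    --extra-index-url; an empty or missing entry means "connect directly".
    """
    if len(extra_index_proxies) > len(extra_index_urls):
        raise ValueError("more --extra-index-proxy-url values than --extra-index-url values")
    pairs = [(index_url, index_proxy or None)]
    # zip_longest pads missing proxies with None (direct connection).
    for url, proxy in zip_longest(extra_index_urls, extra_index_proxies):
        pairs.append((url, proxy or None))
    return pairs


def proxy_for_page(page_url: str, pairs: list[tuple[str, Optional[str]]]) -> Optional[str]:
    """Return the proxy for an index page, using the longest matching index URL prefix."""
    matches = [(len(index), proxy) for index, proxy in pairs if page_url.startswith(index)]
    if not matches:
        raise LookupError(f"{page_url} does not belong to any configured index")
    return max(matches, key=lambda match: match[0])[1]
```

One design caveat: package files are not always served from the index host (PyPI, for instance, serves files from files.pythonhosted.org), so a real implementation would probably have to remember which index each download link came from rather than match the download URL itself.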

Alternative Solutions

Unsure. It might be that there's a smarter solution that does not involve pip.

Additional context

I am unsure how prevalent the proxy setup described above is out there. I am sure, however, that I am experiencing it at the moment.

cjc7373 commented 4 years ago

Here's a workaround that doesn't involve pip. There are tools such as Clash that can do exactly what you want. You can check out its repository for more information, especially the section on rules.

sethraymond commented 4 years ago

I have this exact same need with my work setup. Installing third-party tools isn't really a viable solution for us, so having this capability baked-in would be fantastic!

uranusjr commented 4 years ago

To be honest, I don’t really see the need of a pip feature here, or even a third-party tool. This can be easily automated with a script, each call installing different packages from different indexes, setting the appropriate proxy for each call.
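A minimal sketch of the kind of wrapper script described above, assuming two hypothetical groups of packages: one from an internal index reached directly, one from PyPI reached through a proxy. It builds (and, if asked, runs) one pip install per group using pip's existing --index-url and --proxy options. It only works when each group's dependencies can be resolved from that group's own index.

```python
import subprocess
import sys
from typing import List, Optional

# Hypothetical (index URL, proxy or None, packages) groups for illustration.
GROUPS = [
    ("https://pypi.corp.example.com/simple/", None, ["internal-pkg"]),
    ("https://pypi.org/simple/", "proxy.corp.example.com:8080", ["requests"]),
]


def pip_install_command(index_url: str, proxy: Optional[str], packages: List[str]) -> List[str]:
    """Build one `pip install` invocation for a single index/proxy pair."""
    cmd = [sys.executable, "-m", "pip", "install", "--index-url", index_url]
    if proxy:
        # pip's --proxy expects [user:passwd@]proxy.server:port.
        cmd += ["--proxy", proxy]
    # Without --proxy, pip may still pick up proxy settings from the environment.
    return cmd + list(packages)


def main(dry_run: bool = True) -> None:
    for index_url, proxy, packages in GROUPS:
        cmd = pip_install_command(index_url, proxy, packages)
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main()
```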

sethraymond commented 4 years ago

Writing a custom script isn't a particularly elegant solution, IMO. Rather, I think supporting the --proxy option in requirements files would be an intuitive solution. Let the proxy be scoped to just the file it's introduced in. For example:

requirements_proxy.txt - this file points to the official PyPI server, and needs the proxy

--proxy <myproxy>
<package1>
<package2>

requirements_no_proxy.txt - this file's index is behind the company firewall, so we don't use the proxy

--index-url <local_pypi>
<package3>
<package4>

requirements.txt

-r requirements_no_proxy.txt
-r requirements_proxy.txt

Now, we simply run pip install -r requirements.txt.
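The scoping rule in this proposal can be made concrete with a small sketch. It is hypothetical: it models the proposed semantics, not how pip currently reads requirements files. File contents are kept in a dict instead of on disk, the proxy and index values stand in for the placeholders above, and there is no cycle detection for -r includes.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

SCOPED_OPTIONS = ("--proxy", "--index-url")

# The example files above, with hypothetical values filled in for the placeholders.
FILES = {
    "requirements_proxy.txt": "--proxy proxy.corp.example.com:8080\npackage1\npackage2\n",
    "requirements_no_proxy.txt": "--index-url https://pypi.corp.example.com/simple/\npackage3\npackage4\n",
    "requirements.txt": "-r requirements_no_proxy.txt\n-r requirements_proxy.txt\n",
}


@dataclass
class Requirement:
    name: str
    proxy: Optional[str]       # None: no proxy for this file
    index_url: Optional[str]   # None: pip's default index


def collect(files: dict[str, str], path: str) -> list[Requirement]:
    """Read `path` from `files`, scoping --proxy/--index-url to that file only."""
    lines = [line.strip() for line in files[path].splitlines()]
    lines = [line for line in lines if line and not line.startswith("#")]

    # First pass: options apply to the whole file, wherever they appear in it.
    options: dict[str, str] = {}
    for line in lines:
        option, _, value = line.partition(" ")
        if option in SCOPED_OPTIONS:
            options[option] = value.strip()

    # Second pass: attach this file's options to its own requirements only.
    requirements: list[Requirement] = []
    for line in lines:
        option, _, value = line.partition(" ")
        if option == "-r":
            # Included files get their own, independent scope.
            requirements.extend(collect(files, value.strip()))
        elif option not in SCOPED_OPTIONS:
            requirements.append(Requirement(line, options.get("--proxy"), options.get("--index-url")))
    return requirements


if __name__ == "__main__":
    for req in collect(FILES, "requirements.txt"):
        print(req)
```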

stefanoborini commented 4 years ago

To be honest, I don’t really see the need of a pip feature here, or even a third-party tool. This can be easily automated with a script, each call installing different packages from different indexes, setting the appropriate proxy for each call.

You can't do that if you rely on pip resolving dependencies across indexes and the indexes need different proxies. This is typically the case when one of your internal dependencies, A, deployed on your internal corporate PyPI, depends on a package B that is on PyPI. The invocation that retrieves A will also try to retrieve B, but PyPI can't be reached because the proxy settings are configured to reach your internal server, and you can't switch the proxy setting midway through the resolve-and-retrieve cycle.

pfmoore commented 4 years ago

The stated scenario can be handled using the https_proxy and no_proxy environment variables, I believe (put the internal locations in no_proxy). That should be sufficient - it may be a little clumsy, but honestly, that's a consequence of the company proxying policy. And you can probably write something that parses the pac file to automate some of this process.

I consider this to be too much of a niche case to be worth supporting in pip. If anything, I would rather see pip simplify its current network handling options, maybe delegating the complexity of handling proxies, etc, to a library (either requests, or a wrapper around requests) that can be configured via a library-defined config file. Then pip can simply reuse this. All of this sort of network configuration could then be handled by that library.

Adding yet more complexity to pip's network support simply makes it harder for the wider ecosystem to grow, because "only pip supports my network setup" becomes a blocker for the development of other tools.
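The suggestion of parsing the PAC file to fill in these variables could start as something as naive as the sketch below. It assumes a very simple, made-up PAC file in which every dnsDomainIs(host, "...") check leads to "DIRECT" and the fallback is a single "PROXY host:port". Real PAC files are arbitrary JavaScript, so a pair of regular expressions is only a starting point, not a general parser.

```python
from __future__ import annotations

import re

# Made-up PAC file: internal domains go DIRECT, everything else via the proxy.
SAMPLE_PAC = """
function FindProxyForURL(url, host) {
    if (dnsDomainIs(host, ".corp.example.com") || dnsDomainIs(host, ".lab.example.com"))
        return "DIRECT";
    return "PROXY proxy.corp.example.com:8080";
}
"""

DIRECT_DOMAIN = re.compile(r'dnsDomainIs\(\s*host\s*,\s*"([^"]+)"\s*\)')
PROXY_TARGET = re.compile(r'"PROXY\s+([^";\s]+)')


def pac_to_env(pac_source: str) -> dict[str, str]:
    """Naively derive https_proxy/no_proxy values from a trivial PAC file."""
    env: dict[str, str] = {}
    # Assumes every dnsDomainIs() test in the file leads to "DIRECT".
    domains = list(dict.fromkeys(DIRECT_DOMAIN.findall(pac_source)))  # dedupe, keep order
    if domains:
        env["no_proxy"] = ",".join(domains)
    proxy = PROXY_TARGET.search(pac_source)
    if proxy:
        # Proxy URLs conventionally use the http:// scheme, even for https_proxy.
        env["https_proxy"] = "http://" + proxy.group(1)
    return env


if __name__ == "__main__":
    for key, value in pac_to_env(SAMPLE_PAC).items():
        print(f"export {key}={value}")
```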

stefanoborini commented 4 years ago

The stated scenario can be handled using the https_proxy and no_proxy environment variables, I believe (put the internal locations in no_proxy). That should be sufficient - it may be a little clumsy

You can't. When you start pip, it starts connecting with whatever has been passed. You can't change the environment variable while pip is executing, and in particular you can't change it according to where it is connecting. This is not how process spawning works. Period.

but honestly, that's a consequence of the company proxying policy. And you can probably write something that parses the pac file to automate some of this process.

Are you suggesting that I go out to a Fortune 500 company and force them to change their proxy policy just because I can't get pip to switch proxies while executing?

I consider this to be too much of a niche case to be worth supporting in pip. If anything, I would rather see pip simplify its current network handling options, maybe delegating the complexity of handling proxies, etc, to a library (either requests, or a wrapper around requests) that can be configured via a library-defined config file. Then pip can simply reuse this

This is already the case; pip must provide the interface to define this. Otherwise, what are you suggesting? That I monkeypatch the code?

Adding yet more complexity to pip's network support simply makes it harder for the wider ecosystem to grow, because "only pip supports my network setup" becomes a blocker for the development of other tools.

Pip does not work appropriately in a corporate environment. I'd say it's a pretty major issue.

uranusjr commented 4 years ago

Are you suggesting that I go out to a Fortune 500 company and force them to change their proxy policy just because I can't get pip to switch proxies while executing?

I’d suggest going out and recommending that a Fortune 500 company offer a few resources and develop a solution (e.g. what @pfmoore suggested), rather than relying on a volunteer-run freebie project to solve the problem for you.

Sorry for being blunt, but surely you can see a bit of irony in the situation here 🙂

MarcoGorelli commented 4 years ago

pip must provide the interface to define this

With all due respect, what makes you think you can demand free work from others?


Sorry to interject here, but I really appreciate the work of all of you maintainers/contributors in this project :) It pains me to see you addressed this way. This is all I'll write here, I don't want to start a flame war.

stefanoborini commented 4 years ago

@MarcoGorelli

Sorry to interject here, but I really appreciate the work of all of you maintainers/contributors in this project :) It pains me to see you addressed this way. This is all I'll write here, I don't want to start a flame war.

As in "MUST" as defined by RFC 2119:

  1. MUST This word, or the terms "REQUIRED" or "SHALL", mean that the definition is an absolute requirement of the specification.

What I am opening is a feature request. The feature is currently not present, and it is a requirement in corporate environments. The proposed solution does not address the issue because pip does not provide an interface to configure the underlying requests call to use a different proxy depending on the URL of the connection. The issue proposes possible solutions, but it is not up to me to decide how the implementation should be done.

If opening an issue is demanding work from others, then we should probably close the GitHub issue tracker. I am not demanding that this be implemented right now. I just reported an issue and said that this is a requirement in corporate environments where the currently proposed workarounds are not acceptable. If they want to do it, fine. If they don't, peace. I'd do it myself if I had the time and the familiarity with the code, and if my contract allowed me to do so.

pradyunsg commented 4 years ago

I've labelled this issue as a "deferred PR".

This label is essentially for indicating that further discussion related to this issue should be deferred until someone comes around to make a PR. This does not mean that the said PR would be accepted - it has not been determined whether this is a useful change to pip and that decision has been deferred until the PR is made.

niderhoff commented 3 years ago

The stated scenario can be handled using the https_proxy and no_proxy environment variables, I believe (put the internal locations in no_proxy). That should be sufficient - it may be a little clumsy

You can't. When you start pip, it starts connecting with whatever has been passed. You can't change the environment variable while pip is executing, and in particular you can't change it according to where it is connecting. This is not how process spawning works. Period.

@stefanoborini I think you may have misunderstood what they were trying to convey? In our case it was enough to put the hostname of the private PyPI repository that didn't need the proxy into the no_proxy env var. pip then connects to PyPI through the proxy and to our private repository without a proxy. Is that not an adequate solution?

export https_proxy=company.proxy.ip
export no_proxy=devops.internal.repo.url

Just note that pip does not interpret wildcards in the no_proxy variable.
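To check what a given no_proxy value does before relying on it, Python's standard library has a helper, urllib.request.proxy_bypass_environment, that applies no_proxy host matching. The sketch below uses it with placeholder hostnames. It shows urllib's rules, which are similar to, but not guaranteed to be identical to, the matching done by requests inside pip. Under these rules an entry matches the host itself and its subdomains, a leading dot is ignored, a "*" inside an entry is not a wildcard, and only a bare "*" (bypass everything) is special.

```python
from urllib.request import proxy_bypass_environment


def goes_through_proxy(host: str, no_proxy: str) -> bool:
    """Would urllib send a request for `host` through the configured proxy?"""
    # The second argument mimics getproxies_environment(): the key "no" holds
    # the no_proxy value. Passing it explicitly ignores the real environment.
    return not proxy_bypass_environment(host, {"no": no_proxy})


if __name__ == "__main__":
    cases = [
        ("pypi.org", "devops.internal.example.com"),
        ("devops.internal.example.com", "devops.internal.example.com"),
        ("pypi.internal.example.com", "internal.example.com"),    # suffix match
        ("pypi.internal.example.com", "*.internal.example.com"),  # '*' is not a wildcard
    ]
    for host, no_proxy in cases:
        print(f"{host:30} no_proxy={no_proxy:30} proxied={goes_through_proxy(host, no_proxy)}")
```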