pypa / pip

The Python package installer
https://pip.pypa.io/
MIT License

Allow --no-binary and --only-binary per index #11071

Open f3flight opened 2 years ago

f3flight commented 2 years ago

What's the problem this feature will solve?

In our environment we prefer to consume only source artifacts from the outside world, not binaries (e.g. wheels). Currently we have to resort to workarounds such as pip download --no-binary :all: ... plus some code to replace downloaded sdists with internally available wheels if the version matches, and we still have a problem with build dependencies and setup dependencies.

Describe the solution you'd like

Add support for a new token :index:<http...> in --no-binary and --only-binary handlers. <http...> is a URL which will be compared against Link.comes_from in LinkEvaluator (where most other binary vs source logic is also located).
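To make the proposed matching rule concrete, here is a minimal sketch of how an :index: token might be checked against a link's origin. It is not pip's implementation: Candidate, parse_no_binary and is_allowed are hypothetical names, and the real logic would live alongside LinkEvaluator and Link.comes_from as described above.

```python
from dataclasses import dataclass

INDEX_PREFIX = ":index:"  # token proposed in this issue, not an existing pip feature


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for a link pip found on an index page."""

    filename: str
    comes_from: str  # URL of the index page that listed the file


def parse_no_binary(value: str) -> list[str]:
    """Collect index URL prefixes from a comma-separated --no-binary value."""
    return [
        item[len(INDEX_PREFIX):]
        for item in value.split(",")
        if item.startswith(INDEX_PREFIX)
    ]


def is_allowed(candidate: Candidate, no_binary_indexes: list[str]) -> bool:
    """Reject wheels whose origin matches a listed index; allow everything else."""
    if not candidate.filename.endswith(".whl"):
        return True  # sdists are never blocked by --no-binary
    return not any(candidate.comes_from.startswith(url) for url in no_binary_indexes)


blocked = parse_no_binary(":index:https://pypi.org")
public_wheel = Candidate("blah-1.0-py3-none-any.whl", "https://pypi.org/simple/blah/")
public_sdist = Candidate("blah-1.0.tar.gz", "https://pypi.org/simple/blah/")
private_wheel = Candidate(
    "blah-1.0-py3-none-any.whl", "https://my-trusted-wheels-source.local/blah/"
)

print(is_allowed(public_wheel, blocked))   # False
print(is_allowed(public_sdist, blocked))   # True
print(is_allowed(private_wheel, blocked))  # True
```

The plain prefix comparison is the simplest reading of "compared against Link.comes_from"; a real implementation would probably need to compare parsed hosts so that a look-alike domain sharing the same prefix is not matched by accident.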

Example use case:

pip install blah --extra-index-url https://my-trusted-wheels-source.local --no-binary ":index:https://pypi.org"

Result:

Alternative Solutions

Our team was not able to find a reliable workaround for this scenario.

Additional context

N/A


pradyunsg commented 2 years ago

Hi @f3flight!

I'd like to avoid adding extremely usecase-specific index-related functionality to pip directly.

Various organisations have differing needs for what they consider "safe" and I believe that pip can't grow options+complexity to accommodate all possible definitions. To that end, I'd recommend investigating if you can use some index server implementation that filters items in this manner for you -- that would push the logic of dealing with the nuances of your usecase outside of pip, avoiding additional maintenance load on pip while also giving you direct control over what pip would see (eg: allowing you to evolve or fine-tune the definition you use -- eg: allowing pure-Python wheels but not compiled wheels).

There are external-to-pip pieces of software that provide the ability to carefully control and curate how packages can be discovered (eg: https://github.com/uranusjr/simpleindex with its custom routes). These are both interoperable with other tools that understand Python packaging standards and allow for much finer-grained control over what is provided on the index pages. Typically, this takes the form of a fairly dumb index server that composes the other indexes' HTML and presents only the files that you consider "safe" to use in your environment (for whatever definition of safe you have).
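As a rough illustration of the "fairly dumb index server" idea, the sketch below merges the anchor lists of two project pages and drops wheels that come from an index treated as source-only. It uses only the standard library and is not taken from simpleindex or any other tool; AnchorCollector, compose_page and the files.example host are made up for the example, and a real server would also need to fetch the pages, resolve relative hrefs and preserve link attributes such as hashes.

```python
from html import escape
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect (href, link text) pairs from a simple-index project page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def compose_page(pages: dict[str, str], sdist_only: set[str]) -> str:
    """Merge project pages, dropping wheels from indexes listed in sdist_only."""
    rows = []
    for index_url, page_html in pages.items():
        collector = AnchorCollector()
        collector.feed(page_html)
        for href, name in collector.links:
            if index_url in sdist_only and name.endswith(".whl"):
                continue  # hide binaries from the untrusted index
            rows.append(f'<a href="{escape(href)}">{escape(name)}</a>')
    return "<!DOCTYPE html><html><body>\n" + "\n".join(rows) + "\n</body></html>"


pages = {
    "https://pypi.org/simple/blah/": (
        '<a href="https://files.example/blah-1.0.tar.gz">blah-1.0.tar.gz</a>'
        '<a href="https://files.example/blah-1.0-py3-none-any.whl">'
        "blah-1.0-py3-none-any.whl</a>"
    ),
    "https://my-trusted-wheels-source.local/blah/": (
        '<a href="https://my-trusted-wheels-source.local/files/'
        'blah-1.0-py3-none-any.whl">blah-1.0-py3-none-any.whl</a>'
    ),
}
merged = compose_page(pages, sdist_only={"https://pypi.org/simple/blah/"})
print(merged)
```

Because the filter is an ordinary Python condition, the definition of "safe" can later be changed (for example to allow pure-Python wheels) without touching pip.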

PS: As a generic note, I'd appreciate if you avoid mentioning the issue in the commit message. It tends to create a lot of spam on the issue timeline, since each time you amend or rebase the commit, it shows up as an event.

f3flight commented 2 years ago

Hi @pradyunsg, @pfmoore! Thanks for the detailed responses! Apologies for the spam, I wasn't aware that putting the issue in the commit message is bad practice.

I think your concern is reasonable. The proposed workaround makes sense. The approach of spinning up and maintaining an internal service is considerably higher cost for our team than a pure client-side solution, which is already done (#11072). My team has not considered server-side workarounds; we were focusing on client-side only. Of course, the client-side implementation is a maintenance cost for the pip team. And I understand that this feature request, if accepted, may open the door to more index-specific feature requests, and the pip team thinks it's not worth it.

I'll need to discuss with my team on where to go from here.

notatallshaw commented 2 years ago

A little input from someone who isn't OP but works in an organization with multiple indexes and thinks about appropriate Pip configuration.

I would say that if most organizations really thought about appropriate configuration for indexes, they would realize that public and private indexes should indeed have slightly different options, whether for source type or for one of the many other options.

If I were to design Pip's config options from scratch, I would give each source a package could come from its own section in Pip's config, e.g. [sources] some_global_option=True, [sources.file] cache=False, ..., [sources.pypi_index] url=... proxy=... , [sources.company_foo_index] url=... etc...
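A small sketch of what such a layout could look like when read with the standard library's configparser. This is purely illustrative: pip does not read a [sources] section, the no_binary key and the load_sources helper are assumptions added for the example, and the inheritance rule (values in [sources] act as defaults for every [sources.<name>] section) is one possible design rather than anything agreed in this thread.

```python
import configparser

# Hypothetical layout: pip does not read these sections today.
EXAMPLE = """
[sources]
some_global_option = True

[sources.file]
cache = False

[sources.pypi_index]
url = https://pypi.org/simple
no_binary = :all:

[sources.company_foo_index]
url = https://my-trusted-wheels-source.local
"""


def load_sources(text: str) -> dict[str, dict[str, str]]:
    """Return per-source options, using [sources] values as shared defaults."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    defaults = dict(parser["sources"]) if parser.has_section("sources") else {}
    prefix = "sources."
    return {
        # Section-specific values override the shared defaults.
        section[len(prefix):]: {**defaults, **parser[section]}
        for section in parser.sections()
        if section.startswith(prefix)
    }


sources = load_sources(EXAMPLE)
for name, options in sources.items():
    print(name, options)
```

Each source ends up with one flat dictionary of options, so a new per-index setting would be a new key rather than a new command-line flag.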

This would allow a much more generic, customizable approach rather than trying to add each option one at a time. However, I'm not sure how I would propose such a feature to Pip, as it would take design work and a migration plan. But maybe Pip maintainers feel differently about it one way or another.

pfmoore commented 2 years ago

This would allow a much more generic, customizable approach rather than trying to add each option one at a time. However, I'm not sure how I would propose such a feature to Pip, as it would take design work and a migration plan. But maybe Pip maintainers feel differently about it one way or another.

Personally, I'm still not interested in this feature, even using an approach like this. That's not to say, though, that I would object to it being in pip, simply that I have no interest in implementing it myself, and I don't find the arguments for such a feature compelling.

If someone were to put together a concrete proposal for implementing something like this in pip (including an offer to do the implementation work, either themselves or by funding someone to do it) then I'd want to see the resulting design provide benefits for pip's architecture, as well as simply addressing the end user need. To put that another way, I'd want to see the changes result in an overall simpler and more maintainable configuration architecture, as opposed to simply layering complexity on what's already there. I think this is what you're suggesting, but I'd want to be explicit about it.

Basically, I think we have to be careful here of pip's technical debt. We cannot sustainably keep adding features without paying off that debt, and we've frankly not had a lot of success finding time and resources to create pure "debt reduction" PRs. So I'd like to see feature requests like this, which are of interest only to a very small part of pip's user base, "pay their way" by contributing to the clean-up of affected parts of pip's code base.

I'm still not in favour of this change, as I think it's better for pip, its users and the ecosystem as a whole if we encourage more use of tools working together rather than loading everything into pip (or any other "does everything" tool). But if someone does want to move forward with this, then these are the criteria I'd apply in deciding whether to support it.

notatallshaw commented 2 years ago

Personally, I'm still not interested in this feature, even using an approach like this.

I assume because you don't have to handle multiple indexes and manage their configuration right now? 😉

I agree a proposal should simplify Pip's architecture and reduce technical debt, and that anyone proposing it should fund or provide the resources to do it. But it had been suggested that only OP wanted this, so I thought I would chime in to say that's definitely not the case.

And IMO Pip is likely to see more requests for specific features to be made per-index configurable as users examine their configuration more closely. The approach I described is one I've been mulling over recently to help solve that problem. I think that, if implemented right, it would help reduce tech debt, but I don't currently have the resources to contribute it to Pip.

pfmoore commented 2 years ago

I assume because you don't have to handle multiple indexes and manage their configuration right now?

Because I don't right now, and if (and when) I do, I'd rather do so using a custom index that unifies the data I want to make available as noted. Yes, I am aware that maintaining such a private index is extra work, and no, it's not something I've done myself. But the people who want this are apparently trying to work with both PyPI and a private index, so they are already managing one private index - adding another doesn't seem like an impossible ask.

Please don't assume that I'm against doing this simply because it doesn't affect me. I already said that isn't the case.

And IMO Pip is likely to see more requests for specific features to be made per-index configurable as users examine their configuration more closely.

All the more reason I'd prefer to standardise on pip supporting a single index well, and let index multiplexers handle the problem of combining indexes. That way, users can easily get per-index filtering, prioritisation between indexes, blacklisting and other supply chain mitigations, etc etc, without expecting the pip team to maintain all of these features.
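To illustrate what "prioritisation between indexes" could mean inside such a multiplexer, here is a minimal sketch in which a project is served only from the first index, in priority order, that lists it. The resolve_files helper and the in-memory listings are hypothetical; this is one possible policy, not a description of any existing tool.

```python
def resolve_files(
    project: str, indexes: list[tuple[str, dict[str, list[str]]]]
) -> list[str]:
    """Return files from the first index, in priority order, that lists the project."""
    for _index_name, listing in indexes:
        files = listing.get(project)
        if files:
            return files  # lower-priority indexes are never consulted
    return []


# Indexes in priority order: the private one wins whenever it knows the project.
indexes = [
    ("private", {"internal-tool": ["internal_tool-2.0-py3-none-any.whl"]}),
    ("public", {
        "internal-tool": ["internal_tool-99.0.tar.gz"],
        "blah": ["blah-1.0.tar.gz"],
    }),
]

print(resolve_files("internal-tool", indexes))  # ['internal_tool-2.0-py3-none-any.whl']
print(resolve_files("blah", indexes))           # ['blah-1.0.tar.gz']
print(resolve_files("missing", indexes))        # []
```

Under this policy a same-named package uploaded to the public index cannot shadow an internal one, which is the kind of supply chain mitigation mentioned above.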

Of course, if the reality here is that no-one else wants to, or has the time to, build or maintain these features, and this is simply a case of people hoping that the pip maintainers will do the job for them, then I guess this discussion isn't likely to go anywhere...

notatallshaw commented 2 years ago

All the more reason I'd prefer to standardise on pip supporting a single index well, and let index multiplexers handle the problem of combining indexes.

Yeah fair enough to say this problem can be solved outside Pip. I was coming from the perspective that given pip supports multiple indexes currently it makes sense to support multiple indexes well, such as allowing indexes to have their own configuration.

I wonder if it makes sense then, at least eventually, to deprecate Pip's support for multiple indexes?

pfmoore commented 2 years ago

I wonder if it makes sense then, at least eventually, to deprecate Pip's support for multiple indexes?

That would be my long-term preference, but practicality (and backward compatibility) may beat purity here. It is also possible to say "we support multiple indexes but only in simple cases" - but setting a clear boundary is hard for that, as we see here 😉