Open dstufft opened 1 year ago
I'm posting this here and to discuss.p.o just so nobody misses it.
It's been about 10 days since I posted my proposal, and other than a few questions I haven't seen anyone raise an objection to the overall idea. Previously folks had seemed on board with the idea (the longer proposal was designed to make sure everyone was on the same page and to make it easier for people to jump in without having to read both threads in their entirety).
Given nobody has objected, I'm going to take that as a sign that it's worthwhile to take this to a PEP, so I'll go ahead and start working on that. I plan to focus that PEP on the changes to the repository protocol and what their implications are for installers. I will likely include a non-normative recommendations section that gives installers some high-level guidance on matching the rough behavior in the proposal, but I won't spell out specific UX for installers.
I am a big fan of:
Provide an option to map a specific package to a specific repository.
which is essentially what the index_lookup patch that I provided in that other PR does (I have since extended it slightly within pipenv). Essentially, I want the ability, in both the resolve phase and the install phase, to pull packages from specific indexes. I believe extending it to requirements.txt, where each package could supply its own index line that is then used to build up the index_lookup passed into search_scopes, could solve this problem and make the code from the referenced PR actually usable through pip's public interface, which, if I recall, was the primary objection when I first opened that change.
Sorry, I'll have to do more reading to catch up on everyone's position here, but that is my two cents I wanted to share since pipenv is currently using that patch, and we want to get to a point where we aren't patching pip.
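To make the package-to-index mapping idea in this comment concrete, here is a minimal sketch. It is not the actual index_lookup patch: the dictionary shape, the `indexes_for` helper, and the example URLs are all assumptions made for illustration.

```python
import re
from typing import Dict, List

# Hypothetical mapping from a normalized project name to the single index
# that is allowed to serve it. The internal URL is a made-up example.
INDEX_LOOKUP: Dict[str, str] = {
    "internal-lib": "https://pypi.internal.example/simple",
}

# Indexes consulted for any project that has no explicit mapping.
DEFAULT_INDEXES: List[str] = ["https://pypi.org/simple"]


def normalize(name: str) -> str:
    """Normalize a project name using the PEP 503 rule."""
    return re.sub(r"[-_.]+", "-", name).lower()


def indexes_for(project: str) -> List[str]:
    """Return the indexes that may be searched for `project`."""
    pinned = INDEX_LOOKUP.get(normalize(project))
    if pinned is not None:
        # A mapped project is only ever looked up on its own index.
        return [pinned]
    return list(DEFAULT_INDEXES)
```

With this shape, `indexes_for("Internal_Lib")` only ever consults the internal index, while unmapped projects fall through to the defaults.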
What's the problem this feature will solve?
There's a long-standing class of attacks, typically called "dependency confusion" attacks, which roughly boil down to an individual expecting to get package A but instead getting B. In Python, this almost always happens because the end user has configured multiple repositories: they expect package A to come from repository X, but someone is able to publish something named package A at repository Y as well.
Traditionally this takes the form of someone having a private repository for internal packages only while also wanting to use PyPI as a fallback for anything that comes from the wider ecosystem. Then someone comes along, registers one or more of their internal package names on PyPI, and publishes their own code under them. Because pip effectively "merges" these two repositories, it views them both as equally authoritative on package A.
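The effect of that merge can be illustrated with a deliberately simplified model. This is not pip's real candidate selection (which also considers wheel tags, yanking, and more); the `Candidate` type, `pick_best` helper, and version numbers are assumptions for the sake of the example.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    project: str
    version: Tuple[int, ...]
    repository: str


def pick_best(candidates: List[Candidate]) -> Candidate:
    """Pick the highest version, ignoring which repository it came from."""
    return max(candidates, key=lambda c: c.version)


# The legitimate internal release on the private repository X.
internal = [Candidate("package-a", (1, 2, 0), "X (private)")]
# An attacker's upload of the same name, with an inflated version, on Y.
public = [Candidate("package-a", (99, 0, 0), "Y (public)")]

# Merging both lists treats X and Y as equally authoritative.
chosen = pick_best(internal + public)
print(chosen.repository)  # prints "Y (public)"
```

Once the two lists are concatenated, nothing in the selection step remembers that package A was only ever expected to come from X.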
Describe the solution you'd like
A key thing to notice here is that dependency confusion depends on project A being expected to come from repository X but actually coming from repository Y, which almost always means that pip sees A coming from both X and Y.
Thus, I suggest we "solve" dependency confusion attacks by having pip, prior to doing any other filtering (for wheel compatibility, etc.), determine whether the collected links for a particular project come from a single repository or from multiple. If it discovers links from multiple repositories, it would generate an error and refuse to proceed.
Note: It may make sense to de-duplicate the URLs in cases where the URLs have a hash and the hashes match between multiple repositories, so that cases where the exact same files exist on all repositories are still OK; only cases where the repositories have different files would be errors.
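A minimal sketch of the proposed check, including one possible reading of the de-duplication note (repositories are treated as equivalent only when every file is hashed and the sets of hashes are identical). The `Link` type, the `MultipleRepositoriesError` exception, and `check_single_repository` are hypothetical names, not existing pip APIs.

```python
from typing import Dict, List, NamedTuple, Optional, Set


class Link(NamedTuple):
    url: str
    repository: str
    sha256: Optional[str] = None  # hash taken from the URL fragment, if any


class MultipleRepositoriesError(Exception):
    """Hypothetical error raised when a project spans several repositories."""


def check_single_repository(project: str, links: List[Link]) -> None:
    """Refuse to proceed if `project` is served by more than one repository."""
    # Group the file hashes by the repository that advertised them.
    hashes_by_repo: Dict[str, Set[Optional[str]]] = {}
    for link in links:
        hashes_by_repo.setdefault(link.repository, set()).add(link.sha256)

    if len(hashes_by_repo) <= 1:
        return  # all links come from one repository

    # De-duplication: identical, fully hashed file sets are still acceptable.
    file_sets = list(hashes_by_repo.values())
    all_hashed = all(None not in s for s in file_sets)
    identical = all(s == file_sets[0] for s in file_sets[1:])
    if all_hashed and identical:
        return

    raise MultipleRepositoriesError(
        f"{project} is served by multiple repositories: "
        + ", ".join(sorted(hashes_by_repo))
    )
```

The check runs on the raw collected links, before any compatibility filtering, so an incompatible file on a second repository would still trigger the error.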
We may also want some way to indicate that a particular package should opt out of this, or to target a specific repository for a specific package, but maybe we don't? I can think of a few options:
Obviously we would need to phase this in over time, presumably by having it generate warnings at first that you can upgrade to errors, then errors that you can downgrade to warnings, then finally only errors (sans any choices we pick to allow people to select the repository they want to use for a specific package).
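The three-stage rollout described above could be modeled roughly as follows. This is only a sketch: the `Phase` enum, the `user_override` parameter, and the exception name are assumptions, and no corresponding pip option exists.

```python
import warnings
from enum import Enum
from typing import Optional


class Phase(Enum):
    WARN_BY_DEFAULT = 1   # warnings that users can upgrade to errors
    ERROR_BY_DEFAULT = 2  # errors that users can downgrade to warnings
    ERROR_ONLY = 3        # errors with no way to downgrade


class DependencyConfusionError(Exception):
    """Hypothetical error for a project found on multiple repositories."""


def report(message: str, phase: Phase, user_override: Optional[str] = None) -> None:
    """Warn or raise depending on the rollout phase.

    `user_override` is a hypothetical user setting: None, "warn", or "error".
    """
    if phase is Phase.ERROR_ONLY:
        strict = True  # the override is ignored in the final phase
    elif user_override is not None:
        strict = user_override == "error"
    else:
        strict = phase is Phase.ERROR_BY_DEFAULT

    if strict:
        raise DependencyConfusionError(message)
    warnings.warn(message, UserWarning)
```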
Alternative Solutions
Additional context
This idea came out of the "Proposal: Preventing dependency confusion attacks with the map file" thread on discuss.p.o, where a lot of the discussion was about different strategies that someone could use to protect themselves from dependency confusion attacks. In that discussion, it occurred to me that all of these strategies require the end user to opt in to the protection, but ideally we want something that can happen by default. Thus it dawned on me that the core problem comes from pip effectively merging two repositories... so pip could just not do that.