woodruffw opened 5 months ago
Thanks for putting together this proposal @woodruffw! Wanted to add some more evidence that such a plugin system would be used. A while ago I created a tool for generating Software Bill-of-Materials documents from Python environments/requirements files using existing (and proposed) PEP standards. Since Python packages + SBOMs is one area I'm focusing on in this upcoming year it would be great to provide a more native-feeling experience as a pip extension :)
I remain sympathetic in principle to having a plugin API for pip. And I think we should acknowledge the reality that even though we have consistently stated that people must not rely on pip's internal API, nevertheless people do, and the sky hasn't (yet) fallen. But that doesn't mean we're now going to support using pip as a library, or be comfortable adding features that suggest we will.
The proposal for pip ext commands seems relatively harmless, insofar as it doesn't offer anything that isn't just as possible using a standalone Python script. I'm not really sure I see the attraction of injecting such a script into the pip command namespace, particularly if it has to be in the form pip ext name (which is frankly a little verbose), rather than just running it directly - but if people want to do so, I guess I don't see the problem.
The main plugin proposal feels a bit light on detail. Defining an entry point namespace is fine, but unless there's some contract that defines at least how and when pip calls registered plugins, it doesn't say anything useful. And any such contract is, at some level, a guaranteed API that pip provides. So whether the plugin proposal is acceptable depends entirely on what that contract is. I'm willing to be open-minded about the possibility of being able to define something acceptable, but we've been around this loop many times, so I want to know what's different about this proposal.
The proposal for pip ext commands seems relatively harmless, insofar as it doesn't offer anything that isn't just as possible using a standalone Python script. I'm not really sure I see the attraction of injecting such a script into the pip command namespace, particularly if it has to be in the form pip ext name (which is frankly a little verbose), rather than just running it directly - but if people want to do so, I guess I don't see the problem.
I think the strongest arguments here are consistency and discovery: pip ext <...> gives the surrounding tooling ecosystem a way to minimize the number of things they throw on the user's $PATH, and automatically facilitates discovery (if pip ext --list or similar is deemed acceptable).
In terms of verbosity: my original thought was to propose it without the ext interstitial, i.e. someone could register pip frobulate directly. That could be implemented such that it would never shadow any built-in pip subcommand (present or future), but it's harder to teach users when such shadowing occurs. So I erred towards pip ext as a slightly longer but always unambiguous option 🙂
The main plugin proposal feels a bit light on detail. Defining an entry point namespace is fine, but unless there's some contract that defines at least how and when pip calls registered plugins, it doesn't say anything useful. And any such contract is, at some level, a guaranteed API that pip provides.
This sounds like the right opportunity for me to go into detail, then!
To make things more concrete, here's a kind of plugin that I could imagine: something that allows distributions to be introspected at various points before they're unarchived. Here's an example signature set for that plugin:
```python
from pathlib import Path

# Stand-in so the sketch runs as-is; the proposal below defines PluginType
# as the set of recognized plugin type names.
PluginType = str

def plugin_type() -> PluginType:
    # name subject to discussion!
    return "dist-inspector"

def pre_download(url: str) -> None:
    # contract: `pre_download` raises `ValueError` to terminate
    # the operation that intends to download `url`
    pass

def pre_extract(dist: Path) -> None:
    # contract: `pre_extract` raises `ValueError` to terminate
    # the operation that intends to unarchive `dist`
    pass
```
When registered, pip should enable this plugin on operations that download distributions, e.g. pip download and pip install.
This plugin is a little bit contrived, but demonstrates that useful things can be achieved while not committing pip to much besides the API of the plugin itself: the only new commitments are the hook signatures above and the fact that pip uses them; the url and dist arguments are inputs for the plugin to inspect, not handles into pip's internals.
(There are pieces of this that would need to be hammered out, e.g. how caching is handled, and whether every download is trapped by the plugin, or only ones that follow completed candidate selection. But I think it demonstrates a workable approach that can be used to conservatively carve out very small stable API commitments.)
To make things more concrete, here's a kind of plugin that I could imagine
Right. But that's just one example. And every example comes with a requirement for pip to add the infrastructure to call the plugin at the right time(s). So the proposal needs to define all the allowed plugin types. And that's basically a programmatic API. For example, does the pre_download hook have to be thread safe? If the answer is "yes", then plugins need to be more complex. If the answer is "no", then pip can't parallelise downloads without adding extra support for serialising plugin calls. Either way, there's a bigger contract than just "pip calls the plugin".
But I think it demonstrates a workable approach that can be used to conservatively carve out very small stable API commitments.
Yes, this is the problem. In effect, the plugin API becomes a "stealth" attempt to get pip to commit to API stability guarantees. And that's what we've been against doing from the start. There are a bunch of reasons for this:
Remember, pip is an application, not a library - and that's a choice, not an accident. Even applications which have a plugin interface don't let plugins dig around in the application internals - instead, they provide a carefully designed and controlled API that plugins can use. And pip doesn't have such an API.
Let's give a really simple example. How can a plugin print a message to the user? It can't use raw print calls or access sys.stdout, because that will disrupt pip's output (progress bars, logging, handling of verbosity levels, etc). So what do we do? Provide a print function that plugins can use safely? That function will need writing, testing, documenting, and supporting.
Yes, there are pip internal routines for all of this. And yes, as I said above, people are using pip's internals and the sky hasn't fallen. But they know we don't support what they are doing, and they accept that. Once plugin support is added to pip, how do we ensure that message stays clear? Because we can't simply close every bug that says "I have a plugin..." saying we don't support plugins (which is what we do now when people say "I am importing pip...").
So I think this leaves us back at my previous comment - for this proposal to be considered, you'll need to specify what plugin types you're suggesting, and what the contract of each one is. And given that there are bound to be requests for additional hooks, how do we limit the scope up front so that maintainers aren't faced with an ongoing job of rejecting proposals for "just one more small hook"?
Also, and I'm somewhat surprised it's me having to ask this, what are the security implications here? Installing a wheel is currently (relatively) safe, because it doesn't run arbitrary code from the internet. But I'm pretty sure that if we had plugins I could write a wheel that installed a pip plugin which (1) did malicious stuff whenever pip is run, and (2) hijacked pip uninstall so that uninstalling the problematic wheel didn't remove the plugin. Couple that with the fact that people still run sudo pip, in spite of all the warnings, and the hackers just won.
But that's just one example. And every example comes with a requirement for pip to add the infrastructure to call the plugin at the right time(s). So the proposal needs to define all the allowed plugin types.
I might not be understanding, but I don't follow why this would be the case, for two reasons:
1. As you note, these would be a public API. As such, they would be subject to whatever stability model you'd like (as the maintainers). For example, pip v123 could say that plugin types dist-inspector and frobulator are stabilized, and plugin authors for those types could set pip >= 123 as a constraint in their plugin's metadata. Similarly, for deprecations, pip could use its current 6 month deprecation policy to issue a warning on deprecated plugin types, and then produce a hard error (or silently ignore) those types after deprecation is completed.
2. Both could be gated behind a --use-feature flag as part of an initial phase. IIUC this means that it wouldn't impact ordinary users or impose any sort of permanent API commitment on pip's part, at least until it leaves the experimental feature phase. And at that point (1) kicks in.

For example, does the pre_download hook have to be thread safe? If the answer is "yes", then plugins need to be more complex. If the answer is "no", then pip can't parallelise downloads without adding extra support for serialising plugin calls. Either way, there's a bigger contract than just "pip calls the plugin".
Sorry if my example didn't make this clear: my thinking for these hooks is that their execution contract is "pure" -- they have no side effects from pip's perspective, and their only way to signal anything to pip itself is to raise a ValueError. This should be suitable for any future multi-threading refactor, and keeps the contract to a bare minimum (per your larger concerns about what pip commits to).
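To make that concrete, here's a hedged sketch of what a plugin honoring that contract might look like; the blocklist, host name, and module layout are all made up for illustration and are not part of any proposal:

```python
from pathlib import Path

# Illustrative only: a "dist-inspector" plugin that refuses downloads from
# a user-maintained blocklist. It inspects its inputs and signals pip solely
# by raising ValueError, never by touching pip's internals.
BLOCKED_HOSTS = {"files.example.internal"}

def plugin_type() -> str:
    return "dist-inspector"

def pre_download(url: str) -> None:
    if any(host in url for host in BLOCKED_HOSTS):
        raise ValueError(f"refusing to download {url}: blocklisted host")

def pre_extract(dist: Path) -> None:
    # This particular plugin only cares about download URLs.
    pass
```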
(This brings up a good design point however, which is that loading these plugins probably shouldn't be unidirectional, i.e. shouldn't be triggered just by pip install some-plugin also installing an entry point, so that any plugin that does violate this contract can't do so by default, and can be immediately disabled if it misbehaves. That has salience as a security feature as well, so I'll elaborate below.)
Let's give a really simple example. How can a plugin print a message to the user? It can't use raw print calls or access sys.stdout, because that will disrupt pip's output (progress bars, logging, handling of verbosity levels, etc). So what do we do? Provide a print function that plugins can use safely? That function will need writing, testing, documenting, and supporting.
I think this is a good example of what a pip plugin should never do, i.e. should be explicitly excluded by the API contract :slightly_smiling_face:
(We're talking Python of course, so there's no formal enforcement of a plugin's contract. But someone who violates the plugin contract in this way is categorically indistinguishable from people who already violate pip's contracts around not having public APIs.)
And given that there are bound to be requests for additional hooks, how do we limit the scope up front so that maintainers aren't faced with an ongoing job of rejecting proposals for "just one more small hook"?
Thank you for calling this out. I would be happy to discuss this further as well, but per the points above about an execution contract, here are some fundamentals that I think would serve you as the pip maintainers well:

- Proposing a new plugin type must involve multiple, extant use cases. Using dist-inspector as an example: pip-compile, pip-audit, and @sethmlarson's pip-sbom are all extant independent tools that could make use of such a plugin.
- Plugins must have no side effects on the pip process that executes them, except for explicit exceptional control flow that preserves pip's ability to thread or parallelize its internals.
- pip makes no guarantee about the "reachability" of plugins except where explicitly documented. Using dist-inspector as an example: pip might choose to only guarantee that uncached URLs retrieved during collection via pip install trigger the plugin's hooks.
- pip reserves the right to refuse to load plugins at all when it believes it's being run from a non-recommended state, e.g. sudo pip or user/global sites when marked with PEP 668.

I would be happy to flesh these out more as well, and of course to contribute them to your developer and user docs as part of the implementation effort.
Also, and I'm somewhat surprised it's me having to ask this, what are the security implications here? Installing a wheel is currently (relatively) safe, because it doesn't run arbitrary code from the internet.
Thank you for calling this out as well! @sethmlarson and I discussed this a bit, and had a couple of thoughts on it:
1. Installing a package via pip install (or even downloading with pip download) is by default equivalent to executing it, since users don't generally restrict themselves to wheel-only installs. This isn't ideal, but it represents no worse of a security state than Python packaging is already in (especially since all bets are off once you execute any third-party code -- someone could just as easily rewrite or noop-out pip uninstall from setup.py or any other convenient vantage point).
2. pip plugins are loaded "bidirectionally": the user both has to do pip install sampleproject[plugin] and explicitly enable plugins. For example, even after the --use-feature phase, pip could have a --enable-plugins or similar flag (and conf setting) that's required to actually load the plugin.

I think (2) is probably preferable, at the cost of a slightly less smooth UX. But I'll note that the Python packaging model is fundamentally brittle from a security perspective, regardless of what we do here: the game is "over" as soon as an sdist is allowed to run arbitrary code, meaning that even bidirectional controls can be circumvented (e.g. by the sdist setting the config itself).
So this is really just card shuffling to some degree, unless plugins could be identified further "up the stack" in a way that precludes installation from sdists (e.g. a py.plugin marker analogous to py.typed, or similar?).
TL;DR: As currently specified, I don't believe this plugin architecture represents a change to the default security posture in Python packaging, but only because it's already non-ideal :slightly_smiling_face:. So it doesn't make the problem any worse, but it doesn't make it better either.
I might not be understanding, but I don't follow why this would be the case, for two reasons
One of us is misunderstanding, but it might be me.
What I think you're saying is that as a plugin author, I register an entry point that links to my module. Now, pip needs to call (at some point) my_module.plugin_type, and it will get a return value. OK, but you've not said when pip will call that entry point - will it be before argument processing, or after, or at some unspecified point? And once pip calls that entry point and gets the plugin type, then what? There's no description of what else pip must do, except in your "dist-inspector" example, but as you said, that's an example, not part of the spec. So there's no requirement for pip to call the pre_download and pre_extract callables. At some point, the specification for when those callables get called needs to be written. So either the plugin feature does nothing by default, and we get a series of follow-up requests for individual plugin specifications, or a reasonable set of valid plugin types need to be defined in the initial plugin spec. What I'm saying is that I'm not happy agreeing to the former - I don't want to add a plugin capability without any idea what people are going to ask us to expose with it.
As you note, these would be a public API. As such, they would be subject to whatever stability model you'd like (as the maintainers).
Well, isn't that the point? We're on record as stating, on numerous occasions, that we are not willing to support any public API. So unless I'm misunderstanding you, you're asking us to alter that stance and to support this public API, at least.
Sorry if my example didn't make this clear: my thinking for these hooks is that their execution contract is "pure" -- they have no side effects from pip's perspective, and their only way to signal anything to pip itself is to raise a ValueError.
OK. No, that wasn't clear. Add to that the fact that plugin hooks need to adhere to the standard rule that they are not allowed to import any of pip's functionality, or use any pip internal APIs, and I guess they are basically just a notification API from pip to the plugin. Still an API (see my comment above), but certainly an extremely constrained one.
I'm concerned that this would be the "thin end of the wedge", though, and we'd get requests for allowing additional interaction as the limitations of the pure notification interface become clearer.
I think this is a good example of what a pip plugin should never do, i.e. should be explicitly excluded by the API contract 🙂
I bet I can find reasons why plugins shouldn't do pretty much anything useful that you propose 🙂 I'm not being facetious by saying this, I'm trying to point out that without any real motivating examples of plugin types or uses, it's impossible to pin down what plugins are allowed to do (beyond "nothing, just to be safe"). Again, this is part of the "we don't provide an API" issue - we don't offer any guarantees that the global Python interpreter is in any sort of usable state for code injected into the pip process. It probably is (that's what I meant when I said "the sky hasn't fallen") but there can be problems (we've had bug reports from people using pip in-process who have found the logging system isn't in the right state for them, for example).
But someone who violates the plugin contract in this way is categorically indistinguishable from people who already violate pip's contracts around not having public APIs.)
While this is true in principle, it's much harder to write a tight plugin contract that excludes behaviours that we don't want to allow than it is to make a blanket statement that we don't support importing pip into your own process. And that's the fundamental issue here - I don't trust our ability to keep ourselves out of trouble once we loosen the current rules.
This isn't to say I'm totally against this idea. But my instincts are to retain our current stance, and simply declare that while we have plugins, all uses of plugins are unsupported and may break at any time, without warning. Essentially, that's the same footing that projects like pip-tools have to live with right now, and if plugin authors don't like that, then so be it.
Proposing a new plugin type must involve multiple, extant use cases.
That's not really the sticking point here (well, it is a sticking point, but it's not the most important one). The important issue is that we don't have maintainer time[^1] to review multiple plugin proposals. So there's a good chance that new plugin types can expect delays of months, or quite possibly years, before getting approved. There's much more important pip features that have been stalled for that sort of time period. So the additional constraint is that anyone proposing a new plugin type must be prepared to stick with the proposal for that sort of timescale.
installing a package via pip install (or even downloading with pip download) is by default equivalent to executing it, since users don't generally restrict themselves to wheel-only installs.
That's the sort of implied constraint that concerns me. There is ongoing work to try to switch pip to wheel-only downloads by default. One of the benefits of that proposal is that it improves security by removing the risk of running arbitrary code at install time. If the plugin proposal re-opens that risk then our arguments for making the change to wheel-only get undermined.
In addition, people who currently choose wheel-only installs because they are inherently more secure, will now be exposed to new risks that they might not be in a position to mitigate. So we should have a transition plan - we could require an opt-in flag to install wheels that include a pip plugin hook, for example. But this further complicates the whole proposal, in terms of both implementation and UI.
TL;DR: As currently specified, I don't believe this plugin architecture represents a change to the default security posture in Python packaging, but only because it's already non-ideal 🙂. So it doesn't make the problem any worse, but it doesn't make it better either.
All the above being said, I do think you're broadly right here. The Python packaging ecosystem is not in a particularly good state right now as far as tight, auditable security is concerned. But does that mean we're OK with adding more features that we might not accept if we did have a strong security position?
Sorry - another long message. And I don't think I'm saying much that I haven't already said in one form or another. Basically, I see the value of the feature, but I'm not sure I'm willing to accept the cost of supporting the feature.
[^1]: And possibly not maintainer interest, either. I'm the only maintainer who's commented on this proposal so far, and I certainly wouldn't be putting this level of effort into every plugin interface that gets proposed, if we go down this route.
motivating examples of plugin types or uses
Our (Datadog) primary interest would be a plugin that verifies downloads using the attestations provided by PEP 740. This cannot be done by pip alone because only pure-Python packages may be vendored and cryptography would be a requirement.
What I think you're saying is that as a plugin author, I register an entry point that links to my module. Now, pip needs to call (at some point) my_module.plugin_type, and it will get a return value. OK, but you've not said when pip will call that entry point - will it be before argument processing, or after, or at some unspecified point? And once pip calls that entry point and gets the plugin type, then what? There's no description of what else pip must do, except in your "dist-inspector" example, but as you said, that's an example, not part of the spec.
I might have caused some confusion with the "spec vs. not spec" distinction, sorry!
I didn't mean to imply that plugin types like dist-inspector are not in the specification, only that I didn't include them in the initial RFC comment to keep it brief. What I had in mind for the specification was:

1. The loading and execution contract between plugins and pip, i.e. a precise definition of exactly when pip loads them, calls them, etc.
2. The initially supported plugin types, plus the process by which pip accepts new plugin types, per above.

So, dist-inspector would be a part of the spec; I intended it to be an example of (2).
Well, isn't that the point? We're on record as stating, on numerous occasions, that we are not willing to support any public API. So unless I'm misunderstanding you, you're asking us to alter that stance and to support this public API, at least.
Yeah. This is perhaps too fine of a hair to split -- what I was trying to say there was that it would be a public API being committed to, but not one that's "uniquely" stable. In other words you could deprecate/remove plugin APIs per your current deprecation policy, much like pip's current CLI undergoes changes.
But in retrospect this is an obvious thing to say, and doesn't change the story for you at all (since it's still a public API). So I think this point is moot 🙂
Add to that the fact that plugin hooks need to adhere to the standard rule that they are not allowed to import any of pip's functionality, or use any pip internal APIs, and I guess they are basically just a notification API from pip to the plugin. Still an API (see my comment above), but certainly an extremely constrained one. I'm concerned that this would be the "thin end of the wedge", though, and we'd get requests for allowing additional interaction as the limitations of the pure notification interface become clearer.
Yeah, this is how I'm conceiving the API here -- I think only being able to notify pip and not directly (within contract) modify pip's state is the most tractable design here, both in terms of minimizing any public API commitments and not interfering with pip's internal architectural changes.
I unfortunately agree with your concern, though: I think people will probably ask for all kinds of inadvisable things, and attempt to use the minimal interface proposed here as a lever. But I also think that people can be politely (but firmly) redirected to docs/guidelines that explain why pip can't and won't provide a more invasive plugin API.
This isn't to say I'm totally against this idea. But my instincts are to retain our current stance, and simply declare that while we have plugins, all uses of plugins are unsupported and may break at any time, without warning. Essentially, that's the same footing that projects like pip-tools have to live with right now, and if plugin authors don't like that, then so be it.
It'd be interesting to hear from @ofek and @sethmlarson, but for my part: I'm personally okay with this!
IMO this is a suitable footing, so long as plugin authors perform nightly and beta testing against pip (and I certainly will be). AFAICT this allows the best of both worlds: pip's plugin API will be 100% unstable and not subject to any guarantees, but plugin authors will at least have an oracle against which they can observe breakage, and will assume all responsibility for keeping things not-broken.
So the additional constraint is that anyone proposing a new plugin type must be prepared to stick with the proposal for that sort of timescale.
This seems like an exceedingly fair constraint to me 🙂
The Python packaging ecosystem is not in a particularly good state right now as far as tight, auditable security is concerned. But does that mean we're OK with adding more features that we might not accept if we did have a strong security position?
Fair point (along with your points above about this potentially undermining a move to a secure wheel-only default).
Per your points about not making any stable API promises: maybe a mandatory --enable-plugins is a workable solution? All bets would still be off with a malicious sdist, but that at minimum would prevent surprise loads of plugins.
From there, there could be a longer term pivot towards a special marker for pip plugins + preventing installation of plugins from anything except wheel distributions (or local editables, I suppose). And there would be no compatibility challenge there, since compatibility is not guaranteed 🙂
Sorry - another long message. And I don't think I'm saying much that I haven't already said in one form or another. Basically, I see the value of the feature, but I'm not sure I'm willing to accept the cost of supporting the feature.
I'm curious if explicitly considering this unstable (with the burden for breakage being 100% on plugin authors) changes your mind at all here (and also what the other pip maintainers think).
If you think this is still too onerous in the current state of affairs, I'd like to propose just the pip ext part for now -- I think that part requires (almost) no lifecycle or execution contract considerations, since the handoff from pip to the external subcommand is entirely one-way.
P.S.: No problem with the long messages! I'm also guilty of them, and I appreciate the effort you've put into reviewing these ideas and helping me clarify them so far.
P.P.S: Sorry for the delay -- I thought I responded with this yesterday but found this tab unsent this morning.
I'll keep it short, just for variety 🙂
I'm curious if explicitly considering this unstable (with the burden for breakage being 100% on plugin authors) changes your mind at all here (and also what the other pip maintainers think).
If plugins are explicitly unsupported, I'd view them as essentially the same as build backends. We'd still get users raising issues, but "speak to the plugin project" would be our answer. As with build backends, plugin authors would get no formal support[^1].
For me, that's acceptable. But the other maintainers may be more cautious than me.
Also, note that if a PR to add plugin support is large and/or complex, getting someone to review and merge it might be a problem independently of any approval in principle of the idea.
PS When I say "unsupported" that includes:
"Pip downloaded foo.whl, but never called my pre_download hook"
"Sorry, plugin hooks are unsupported, we just changed our download mechanism, that's probably what happened".
Because guaranteed deprecation processes come under "support".
[^1]: It's always possible to nerd-snipe us, of course 🙂
If plugins are explicitly unsupported, I'd view them as essentially the same as build backends. We'd still get users raising issues, but "speak to the plugin project" would be our answer. As with build backends, plugin authors would get no formal support.
Makes sense!
I'll await other maintainer opinions here as well 🙂.
Also, note that if a PR to add plugin support is large and/or complex, getting someone to review and merge it might be a problem independently of any approval in principle of the idea.
Understood -- I think this work should be decomposable into PRs of no more than 200-300 lines each, which is hopefully not too big for independent reviews. But this is something I'll keep an eye on, and include as a design factor.
PS When I say "unsupported" that includes "Pip downloaded foo.whl, but never called my pre_download hook"
That's quite unsupported 🙂
That's quite unsupported 🙂
Indeed 🤣
In reality this may never happen, and we wouldn't deliberately do it, but I'm thinking very specifically of things like refactoring the internals to do things like parallel downloads, or partial downloads, or weird caching tricks. We could end up in a situation where we do a download far down a code path that doesn't have access to the active plugin list. Or we could fire a bunch of subprocesses to do downloads, which wouldn't be able to access plugins in the parent process. In any of these cases, I'd want to reserve the right to make the improvement and not worry about plugins (which I assume are going to be a niche part of our overall user base).
And you should remember that the follow-up to the conversation would be "you're welcome to submit a PR to fix this" - which actually isn't that different from the response you'd get if the plugin mechanism were supported 🙂
I'm curious if explicitly considering this unstable (with the burden for breakage being 100% on plugin authors) changes your mind at all here (and also what the other pip maintainers think).
The use-case that I'd like to support would be okay with this outcome as well: being a plugin means you'd need to be integrating and testing against pip aggressively, so I expect any breakages that do arise could be handled. Thanks for your consideration on this, @pfmoore!
Following up here: my colleague @facutuesca has been working on the architectural side of this (pip ext + basic entrypoint detection) and we should have something ready for sharing shortly! Once we do, we'll open it up as a draft PR here for the pip maintainers to consider.
(As discussed above, we understand that it'll be important to emphasize the lack of stability guarantees around any changes that do get approved here. I'd love to have more discussions about how we can communicate that + contribute any and all docs necessary to keep users from expecting/burdening pip here.)
Please note that as I've said previously, I'm a strong -1 on "basic entrypoint detection" in the absence of specific, documented entry point type definitions. It will be a waste of time to submit a proposal for entry point detection without any explicit API contracts, because there's nothing useful to debate/agree/reject.
As regards pip ext, that's separate (and I'd prefer it if it were a separate PR for that reason). But regarding the approach you suggest above, one thought I had is that a long time ago @dstufft proposed a git-like extension mechanism, where custom subcommands mapped to separate executables. As a very rough design, pip ext foo would execute a command pip-foo in a subprocess. Arguments to the subcommand would be passed as arguments to the subprocess. The subprocess would have no access to pip's state or internals[^1]. If you wanted to be more secure than a raw path search, we could have a pip config entry that specified a set of paths where subcommands must be located - but I don't know if that's worth it (git doesn't seem to do this). This would be a lot simpler and less controversial than an entrypoint/in-process approach. If you're not happy with it, can you articulate why? It may be that in doing so, you expose some hidden assumptions that you weren't aware of[^2].
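For illustration, a minimal sketch of that dispatch, assuming a hypothetical helper inside pip; the function name and error handling are made up, not proposed API:

```python
import shutil
import subprocess
import sys

def run_ext_subcommand(name: str, args: list[str]) -> int:
    # Hypothetical: resolve `pip ext <name>` to a `pip-<name>` executable on
    # $PATH and run it in a subprocess, passing arguments through untouched.
    # The child process gets no pip state or internals.
    exe = shutil.which(f"pip-{name}")
    if exe is None:
        print(f"error: no 'pip-{name}' command found on PATH", file=sys.stderr)
        return 1
    return subprocess.run([exe, *args]).returncode
```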
[^1]: As a future extension, we could consider exposing selected state via environment variables set in the subprocess environment. But that's not part of the basic idea.
[^2]: For example, I often get the impression that when we suggest writing a standalone utility for some feature request, people don't like that idea because they hope that writing it within pip means that they can use pip's internal functions - so they will see "write an extension command" as license to do just that, no matter how much we tell them it's not allowed...
As a very rough design, pip ext foo would execute a command pip-foo in a subprocess.
I just did that at work actually (with help of a library I had to write for it), but arbitrarily extensible rather than under a command group:
specific, documented entry point type definitions.
How would you like these documented? Based on the conversation upthread I thought there was rough consensus on an (explicitly unstable) entrypoint for "dist-inspector", i.e. an interface capable of inspecting download and extraction states without actually being able to mutate them.
I put a rough sketch of that idea in https://github.com/pypa/pip/issues/12766#issuecomment-2168848039, but I'm happy to create a break-out issue for it if you'd prefer. But it'd also be good to know the "directionality" of the review process here, i.e. whether you and the other pip maintainers would prefer we mock something up as a draft PR first or instead flesh it out in issues and textual documentation first.
This would be a lot simpler and less controversial than an entrypoint/in-process approach. If you're not happy with it, can you articulate why? It may be that in doing so, you expose some hidden assumptions that you weren't aware of
It's not so much unhappiness as that I think the two encompass distinct, but equally valuable, use cases. I think some of this was already articulated upthread, but to coalesce it:
1. pip ext subcommands offer consistency and discoverability to the broader pip tooling ecosystem. Examples of non-pip tools that benefit from this kind of integration are pip-compile and pip-audit.
2. In-process plugins cover the cases where users shouldn't have to remember to run pip ext verify or similar; the fatigue-minimizing thing is to have verification run inline with the operation being verified (i.e. package downloads).

In particular I think there's a strong value case for (2) even without extensive access to pip's internals -- the dist-inspect idea proposed above would enable things like signature verification, SBOM generation, etc. while only requiring pip to pass a single URL or path to the plugin.
How would you like these documented? Based on the conversation upthread I thought there was rough consensus on an (explicitly unstable) entrypoint for "dist-inspector", i.e. an interface capable of inspecting download and extraction states without actually being able to mutate them.
I'm saying that I'd want a PR adding an actual plugin, not just one that adds the infrastructure. I don't care whether you do a PR for the infrastructure and a second PR for the dist-inspector plugin type, or put both in the same PR, but I don't want to do anything until both parts exist. I want to see how that plugin would be documented, and what the impact is on pip's codebase. I'd like to see the tests that would be added, as they are, in a fundamental sense, the minimum guarantees that we'll provide. I don't want to reason about this in the abstract, I want to see how it would work in practice, with a non-artificial example.
Specifically, I'm not comfortable just adding an "architecture". This has to satisfy an actual, real-world, use case. And it can only do that if we add a plugin type that provides some genuine benefit at the same time as we add the architecture.
It's not so much unhappiness as that I think the two encompass distinct, but equally valuable, use cases.
Sorry, I wasn't sufficiently clear. What I was talking about when I said "this would be a lot simpler" was specifically about subcommands, and not about entrypoints/plugins. And in particular, I was pointing out that we don't need an entry point mechanism to support subcommands. If we want to allow users to add custom subcommands, we can do this by simply saying that pip ext foo runs a command pip-foo found on $PATH. Done. No complex architecture, no access to internal state, no temptation to import pip, you literally just write your utility however you want (it doesn't even need to be in Python!) and it's available as pip ext foo.
What I was asking was for you to articulate why you feel that you need more than this (if, in fact, you do). I know that people have a reluctance to write standalone utilities, but I've never got anyone to say why, and I'm always left with a feeling that the answer is something like "so that I can use pip internals", or "so that the pip maintainers will look after the code for me". The above "run a subprocess" API strips away all of those benefits (that we don't want to allow anyway), and leaves us with the pure question - is it only for consistency of naming? And if not, what is the reason?
Specifically, I'm not comfortable just adding an "architecture". This has to satisfy an actual, real-world, use case. And it can only do that if we add a plugin type that provides some genuine benefit at the same time as we add the architecture.
Understood, thank you for elaborating! I suspect the simplest thing for us to do is start with one big PR with the full "big picture," and then break it down as necessary once it passes muster.
What I was asking was for you to articulate why you feel that you need more than this (if, in fact, you do).
Thank you for clarifying here, this was my misunderstanding! I am in 100% agreement that doing extensions via $PATH lookup is better, simpler, and consistent with how just about everything outside of Python does CLI extensions 🙂.
I created a draft PR for the implementation here: https://github.com/pypa/pip/pull/12985
(note that it only covers the in-process plugins loaded by entrypoint, not the external pip ext commands)
On the pip ext front, it would inevitably be confusing if python -m pip ext compile ran pip-compile from the system PATH rather than looking up the pip-compile entry point in the same environment as pip and running it the same way a wrapper script would.
Checking the system executable path would be a good supplement to support non-Python extensions, but entry points should have priority for Python tools.
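A hedged sketch of that lookup order; the helper is illustrative only, while the console_scripts group and the importlib.metadata/shutil APIs are standard:

```python
import shutil
from importlib.metadata import entry_points

def resolve_ext(name: str):
    # Illustrative only: prefer a `pip-<name>` console-script entry point
    # installed in the same environment as pip, falling back to a matching
    # executable on the system PATH for non-Python extensions.
    for ep in entry_points(group="console_scripts", name=f"pip-{name}"):
        return ("entry-point", ep.load())
    exe = shutil.which(f"pip-{name}")
    return ("path", exe) if exe else None
```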
(the "inline activity monitor" proposal and the "external command" proposal feel like they should be separate issues, though)
What's the problem this feature will solve?
Hello, pip maintainers!
This is (another) proposal for a plugin/extension system for pip. My goals with it are twofold:

- Give external tooling a way to integrate with pip without depending on pip's API internals
- Provide a pip ext CMD hierarchy, allowing existing pip- tooling (including tools that can't easily be integrated into pip itself or shouldn't be) to provide a better and more consistent UX.

TL;DR: a minimal plugin architecture for pip would allow for better integrations with external tooling, including codebases (e.g. cryptographic codebases with native components) that cannot be easily or desirably vendored into pip itself.
Describe the solution you'd like
I have two things in mind:
1. A plugin architecture for pip, allowing third-party packages to register plugins.
2. A pip ext ... subcommand hierarchy, populated by plugins that register the appropriate entry point, allowing third-party packages to register wholly independent subcommands.

I think both of these would be nice to have, but I think either also makes a good proposal. So I'm curious to hear what others think!
Plugin architecture
My high level idea:
pip gains awareness of the pip.plugins entry point group. For example, a plugin might register as:
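(A hedged sketch of such a registration, using setuptools-style metadata; the package and module names are made up for illustration:)

```python
# Illustrative only: a plugin package exposing a module under the proposed
# `pip.plugins` entry point group via its setup.py.
from setuptools import setup

setup(
    name="sampleproject",
    entry_points={
        "pip.plugins": [
            "sampleproject = sampleproject._pip_plugin",
        ],
    },
)
```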
...where plugin is a module object with the following minimal interface:
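(A hedged sketch of that interface, mirroring the signatures discussed earlier in the thread:)

```python
# Every plugin module exposes at least a `plugin_type()` callable telling
# pip which plugin contract it implements.
def plugin_type() -> "PluginType":  # PluginType is sketched just below
    ...
```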
...and PluginType is:
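(Also a hedged sketch; one plausible shape, with "dist-inspector" taken from the example discussed in this thread:)

```python
from typing import Literal

# The set of plugin type names pip recognizes; anything else is ignored
# (and possibly warned about), per the contract described below.
PluginType = Literal["dist-inspector"]
```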
From here, the remaining attributes of the plugin module are determined by PluginType; the intended contract between pip and the plugin is that pip will ignore (and warn on?) any plugin of a type it does not recognize.
(I have ideas for an initial trial-run PluginType, but I want to make sure this basic approach/architecture is amenable before I get into the details there!)

pip ext commands

pip ext subcommands would be a specialization of the above architecture. For example, to register pip ext frobulate, a third-party package might register the following:
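(Again a hedged sketch using setuptools-style metadata; the entry point group, package name, and module path are illustrative, not the proposal's exact syntax:)

```python
# Illustrative only: registering a `pip ext frobulate` subcommand by exposing
# a `cli` module under the same entry point group.
from setuptools import setup

setup(
    name="pip-frobulate",
    entry_points={
        "pip.plugins": [
            "frobulate = pip_frobulate.cli",
        ],
    },
)
```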
From here, the cli attribute is expected to be a module with the following attributes:
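(A hedged sketch of those attributes; the callable names are illustrative, only the description callable is mentioned explicitly below:)

```python
def description() -> str:
    # Used solely to populate `pip ext --list`.
    return "frobulate: frobulates things adjacent to the current environment"

def main(args: list[str]) -> int:
    # `args` is everything passed after `pip ext frobulate`; the subcommand
    # handles its own argument parsing, errors, and exit code.
    return 0
```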
...where args is the list of arguments passed after pip ext frobulate.
Under this model, subcommands under pip ext are entirely responsible for their own lifecycle: pip provides no public APIs, no additional context besides args (and os.environ), and the subcommand is expected to handle its own errors.
The description callable is used solely to populate pip ext --list, e.g. to an effect like this (probably more nicely rendered):

Timeline
Either (or both) of these would be a significant feature addition to pip. As such, my thinking is that they should go through pip --use-feature like other experimental features, e.g.:
From there, plugin/pip ext developers could experiment with either feature before they're fully stabilized, without pip committing to an exact API/interface until stabilization.

Alternative Solutions
The minimal alternative here is "do nothing." 🙂
However, for each of the above:
- pip plugins: expect people to wrap pip instead via its public CLI (or a wrapper like pip-api). Where this isn't sufficiently introspective, users/communities could build their own one-off tools. This is more or less the status quo, and results in a lot of duplication/tools that buggily wrap pip (like some of my tools).
- pip ext subcommands: Continue the status quo of people (informally) signaling the adjacency of their tool to pip via pip-, e.g. pip-compile, pip-tools, pip-audit, etc. This is workable, although it's not the nicest UX compared to a unified subcommand CLI. Moreover, it can result in weird mismatches (e.g. where pip uses one Python/environment and pip-compile uses another).

Additional context
A lot of ink has been spilled over plugin architectures before: #3999 and #3121 are probably the oldest and most immediately relevant, but there are references to user requests for various plugin/API architectures scattered throughout the issues. I can try to collate all of them, if desired 🙂
After discussion, if some variant of this proposal is amenable, I (and my colleagues) will happily implement it and provide ongoing maintenance for it (like we do for PyPI, twine, gh-action-pypi-publish, etc.) -- our objective is not to drop a pile of new code on pip and run away, but to work closely with you all and make sure that anything we propose strikes the right balance between value provided to end users, potential new error modes, and your limited maintenance time.