pypa / pip

The Python package installer
https://pip.pypa.io/
MIT License

[RFC] A plugin/extension architecture for `pip` #12766

Open woodruffw opened 5 months ago

woodruffw commented 5 months ago

What's the problem this feature will solve?

Hello, pip maintainers!

This is (another) proposal for a plugin/extension system for pip. My goals with it are twofold:

  1. Define a minimal plugin API that allows plugins, but keeps them at arm's length from pip's API internals
  2. Define a subcommand API for commands under the pip ext CMD hierarchy, allowing existing pip-* tooling (including tools that can't easily be integrated into pip itself, or shouldn't be) to provide a better, more consistent UX.

TL;DR: a minimal plugin architecture for pip would allow for better integrations with external tooling, including codebases that cannot easily or desirably be vendored into pip itself (e.g. cryptographic codebases with native components).

Describe the solution you'd like

I have two things in mind:

  1. A limited entry points-based plugin architecture for pip, allowing third-party packages to register plugins.
  2. A pip ext ... subcommand hierarchy, populated by plugins that register the appropriate entry point, allowing third-party packages to register wholly independent subcommands.

I think both of these would be nice to have, but either one would also stand on its own as a good proposal. So I'm curious to hear what others think!

Plugin architecture

My high-level idea: pip gains awareness of the pip.plugins entry point group.

For example, a plugin might register as:

[project.entry-points."pip.plugins"]
quux = "sampleproject.plugin"

...where plugin is a module object with the following minimal interface:

def plugin_type() -> PluginType:
    ...

and PluginType is:

PluginType = Literal["some"] | Literal["another"]

From here, the remaining attributes of the plugin module are determined by PluginType; the intended contract between pip and the plugin is that pip will ignore (and warn on?) any plugin of a type it does not recognize.

(I have ideas for an initial trial-run PluginType, but I want to make sure this basic approach/architecture is amenable before I get into the details there!)
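To make the loading side concrete, here is a minimal, purely illustrative sketch of how pip could discover these entry points, reusing the "some"/"another" placeholder types from above. The helper name, the known-types set, and the use of the Python 3.10+ importlib.metadata API are my assumptions, not pip code:

# Purely illustrative sketch of entry point discovery; not actual pip code.
from importlib.metadata import entry_points  # group= keyword requires Python 3.10+

KNOWN_PLUGIN_TYPES = {"some", "another"}  # placeholder types from the PluginType example

def load_plugins() -> list[tuple[str, object]]:
    """Collect registered plugin modules, skipping types pip doesn't recognize."""
    plugins = []
    for ep in entry_points(group="pip.plugins"):
        module = ep.load()  # e.g. the sampleproject.plugin module registered above
        kind = module.plugin_type()
        if kind not in KNOWN_PLUGIN_TYPES:
            continue  # contract: unknown plugin types are ignored (and perhaps warned about)
        plugins.append((kind, module))
    return plugins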

pip ext commands

pip ext subcommands would be a specialization of the above architecture. For example, to register pip ext frobulate, a third-party package might register the following:

[project.entry-points."pip.plugins.ext"]
frobulate = "sampleproject.cli"

From here, the cli module is expected to provide the following attributes:

def description() -> str:
    return "a brief oneline description of the command"

def main(args: list[str]) -> None:
    ...

...where args is the list of arguments passed after pip ext frobulate.

Under this model, subcommands under pip ext are entirely responsible for their own lifecycle: pip provides no public APIs, no additional context besides args (and os.environ), and the subcommand is expected to handle its own errors.

The description callable is used solely to populate pip ext --list, e.g. to an effect like this (probably more nicely rendered):

$ pip ext --list
plugin           description
frobulate        a brief one-line description of the command
wangjangle       randomly install a python package
compile          run pip-compile
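For illustration, the dispatch side of this could look roughly like the following sketch; the run_ext helper, the sorting, and the error handling are my assumptions rather than part of the proposal:

# Purely illustrative dispatch for `pip ext <name> [args...]`; not actual pip code.
import sys
from importlib.metadata import entry_points

def run_ext(argv: list[str]) -> None:
    eps = {ep.name: ep for ep in entry_points(group="pip.plugins.ext")}
    if not argv or argv[0] == "--list":
        for name, ep in sorted(eps.items()):
            print(f"{name:<16} {ep.load().description()}")
        return
    name, args = argv[0], argv[1:]
    if name not in eps:
        sys.exit(f"pip ext: unknown extension {name!r}")
    eps[name].load().main(args)  # the subcommand owns its own lifecycle and errors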

Timeline

Either (or both) of these would be a significant feature addition to pip. As such, my thinking is that they should go through pip --use-feature like other experimental features, e.g.:

pip --use-feature=plugins
pip --use-feature=extensions

# or combined, no distinction?
pip --use-feature=plugins

From there, plugin/pip ext developers could experiment with either feature before they're fully stabilized, without pip committing to an exact API/interface until stabilization.

Alternative Solutions

The minimal alternative here is "do nothing." 🙂

However, for each of the above:

  1. pip plugins: expect people to wrap pip instead via its public CLI (or a wrapper like pip-api). Where this isn't sufficiently introspective, users/communities could build their own one-off tools. This is more or less the status quo, and results in a lot of duplication and tools that buggily wrap pip (like some of my tools).

  2. pip ext subcommands: Continue the status quo of people (informally) signaling the adjacency of their tool to pip via pip-, e.g. pip-compile, pip-tools, pip-audit, etc. This is workable, although it's not the nicest UX compared to a unified subcommand CLI. Moreover, it can result in weird mismatches (e.g. where pip uses one Python/environment and pip-compile uses another).

Additional context

A lot of ink has been spilled over plugin architectures before: #3999 and #3121 are probably the oldest and most immediately relevant, but there are references to user requests for various plugin/API architectures scattered throughout the issues. I can try to collate all of them, if desired 🙂

After discussion, if some variant of this proposal is amenable, I (and my colleagues) will happily implement it and provide ongoing maintenance for it (like we do for PyPI, twine, gh-action-pypi-publish, etc.) -- our objective is not to drop a pile of new code on pip and run away, but to work closely with you all and make sure that anything we propose strikes the right balance between value provided to end users, potential new error modes, and your limited maintenance time.


sethmlarson commented 5 months ago

Thanks for putting together this proposal @woodruffw! Wanted to add some more evidence that such a plugin system would be used. A while ago I created a tool for generating Software Bill-of-Materials documents from Python environments/requirements files using existing (and proposed) PEP standards. Since Python packages + SBOMs is one area I'm focusing on in this upcoming year, it would be great to provide a more native-feeling experience as a pip extension :)

pfmoore commented 5 months ago

I remain sympathetic in principle to having a plugin API for pip. And I think we should acknowledge the reality that even though we have consistently stated that people must not rely on pip's internal API, nevertheless people do, and the sky hasn't (yet) fallen. But that doesn't mean we're now going to support using pip as a library, or be comfortable adding features that suggest we will.

The proposal for pip ext commands seems relatively harmless, insofar as it doesn't offer anything that isn't just as possible using a standalone Python script. I'm not really sure I see the attraction of injecting such a script into the pip command namespace, particularly if it has to be in the form pip ext name (which is frankly a little verbose), rather than just running it directly - but if people want to do so, I guess I don't see the problem.

The main plugin proposal feels a bit light on detail. Defining an entry point namespace is fine, but unless there's some contract that defines at least how and when pip calls registered plugins, it doesn't say anything useful. And any such contract is, at some level, a guaranteed API that pip provides. So whether the plugin proposal is acceptable depends entirely on what that contract is. I'm willing to be open-minded about the possibility of being able to define something acceptable, but we've been around this loop many times, so I want to know what's different about this proposal.

woodruffw commented 5 months ago

The proposal for pip ext commands seems relatively harmless, insofar as it doesn't offer anything that isn't just as possible using a standalone Python script. I'm not really sure I see the attraction of injecting such a script into the pip command namespace, particularly if it has to be in the form pip ext name (which is frankly a little verbose), rather than just running it directly - but if people want to do so, I guess I don't see the problem.

I think the strongest arguments here are consistency and discovery: pip ext <...> gives the surrounding tooling ecosystem a way to minimize the number of things they throw on the user's $PATH, and automatically facilitates discovery (if pip ext --list or similar is deemed acceptable).

In terms of verbosity: my original thought was to propose it without the ext interstitial, i.e. someone could register pip frobulate directly. That could be implemented such that it would never shadow any built-in pip subcommand (present or future), but it's harder to teach users when such shadowing occurs. So I erred towards pip ext as a slightly longer but always unambiguous option 🙂

The main plugin proposal feels a bit light on detail. Defining an entry point namespace is fine, but unless there's some contract that defines at least how and when pip calls registered plugins, it doesn't say anything useful. And any such contract is, at some level, a guaranteed API that pip provides.

This sounds like the right opportunity for me to go into detail, then!

To make things more concrete, here's a kind of plugin that I could imagine: something that allows distributions to be introspected at various points before they're unarchived. Here's an example signature set for that plugin:

from pathlib import Path  # needed for the pre_extract signature below

def plugin_type() -> PluginType:
    # name subject to discussion!
    return "dist-inspector"

def pre_download(url: str) -> None:
    # contract: `pre_download` raises `ValueError` to terminate
    # the operation that intends to download `url`
    pass

def pre_extract(dist: Path) -> None:
    # contract: `pre_extract` raises `ValueError` to terminate
    # the operation that intends to unarchive `dist`
    pass

When registered, pip should enable this plugin on operations that download distributions, e.g. pip download and pip install.

This plugin is a little bit contrived, but it demonstrates that useful things can be achieved while not committing pip to much besides the API of the plugin itself.

(There are pieces of this that would need to be hammered out, e.g. how caching is handled, and whether every download is trapped by the plugin, or only ones that follow completed candidate selection. But I think it demonstrates a workable approach that can be used to conservatively carve out very small stable API commitments.)
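For concreteness, the call sites this implies on pip's side might look roughly like the sketch below; the surrounding function and the _fetch placeholder are illustrative stand-ins, not pip's actual download code:

# Purely illustrative placement of the hooks; not actual pip code.
from pathlib import Path

def _fetch(url: str) -> Path:
    raise NotImplementedError("stand-in for pip's real download machinery")

def download_and_unpack(url: str, inspectors: list) -> Path:
    for plugin in inspectors:
        plugin.pre_download(url)   # a ValueError here aborts the download
    dist = _fetch(url)
    for plugin in inspectors:
        plugin.pre_extract(dist)   # a ValueError here aborts the unarchiving
    return dist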

pfmoore commented 5 months ago

To make things more concrete, here's a kind of plugin that I could imagine

Right. But that's just one example. And every example comes with a requirement for pip to add the infrastructure to call the plugin at the right time(s). So the proposal needs to define all the allowed plugin types. And that's basically a programmatic API. For example, does the pre_download hook have to be thread safe? If the answer is "yes", then plugins need to be more complex. If the answer is "no", then pip can't parallelise downloads without adding extra support for serialising plugin calls. Either way, there's a bigger contract than just "pip calls the plugin".

But I think it demonstrates a workable approach that can be used to conservatively carve out very small stable API commitments.

Yes, this is the problem. In effect, the plugin API becomes a "stealth" attempt to get pip to commit to API stability guarantees. And that's what we've been against doing from the start. There are a bunch of reasons for this:

  1. It's extra work that we don't have time to do.
  2. It's extra support that we don't have time to offer.
  3. It restricts our freedom to evolve pip's implementation.

Remember, pip is an application, not a library - and that's a choice, not an accident. Even applications which have a plugin interface don't let plugins dig around in the application internals - instead, they provide a carefully designed and controlled API that plugins can use. And pip doesn't have such an API.

Let's give a really simple example. How can a plugin print a message to the user? It can't use raw print calls or access sys.stdout, because that will disrupt pip's output (progress bars, logging, handling of verbosity levels, etc). So what do we do? Provide a print function that plugins can use safely? That function will need writing, testing, documenting, and supporting.

Yes, there are pip internal routines for all of this. And yes, as I said above, people are using pip's internals and the sky hasn't fallen. But they know we don't support what they are doing, and they accept that. Once plugin support is added to pip, how do we ensure that message stays clear? Because we can't simply close every bug that says "I have a plugin..." saying we don't support plugins (which is what we do now when people say "I am importing pip...").

So I think this leaves us back at my previous comment - for this proposal to be considered, you'll need to specify what plugin types you're suggesting, and what the contract of each one is. And given that there are bound to be requests for additional hooks, how do we limit the scope up front so that maintainers aren't faced with an ongoing job of rejecting proposals for "just one more small hook"?

Also, and I'm somewhat surprised it's me having to ask this, what are the security implications here? Installing a wheel is currently (relatively) safe, because it doesn't run arbitrary code from the internet. But I'm pretty sure that if we had plugins I could write a wheel that installed a pip plugin which (1) did malicious stuff whenever pip is run, and (2) hijacked pip uninstall so that uninstalling the problematic wheel didn't remove the plugin. Couple that with the fact that people still run sudo pip, in spite of all the warnings, and the hackers just won.

woodruffw commented 5 months ago

But that's just one example. And every example comes with a requirement for pip to add the infrastructure to call the plugin at the right time(s). So the proposal needs to define all the allowed plugin types.

I might not be understanding, but I don't follow why this would be the case, for two reasons:

  1. As you note, these would be a public API. As such, they would be subject to whatever stability model you'd like (as the maintainers). For example, a future pip v123 could say that plugin types dist-inspector and frobulator are stabilized, and plugin authors for those types could set pip >= 123 as a constraint in their plugin's metadata. Similarly, for deprecations, pip could use its current 6 month deprecation policy to issue a warning on deprecated plugin types, and then produce a hard error (or silently ignore) those types after deprecation is completed.
  2. Beyond (1), I've proposed that all of this be tucked behind a --use-feature flag as part of an initial phase. IIUC this means that it wouldn't impact ordinary users or impose any sort of permanent API commitment on pip's part, at least until it leaves the experimental feature phase. And at that point (1) kicks in.

For example, does the pre_download hook have to be thread safe? If the answer is "yes", then plugins need to be more complex. If the answer is "no", then pip can't parallelise downloads without adding extra support for serialising plugin calls. Either way, there's a bigger contract than just "pip calls the plugin".

Sorry if my example didn't make this clear: my thinking for these hooks is that their execution contract is "pure" -- they have no side effects from pip's perspective, and their only way to signal anything to pip itself is to raise a ValueError. This should be suitable for any future multi-threading refactor, and keeps the contract to a bare minimum (per your larger concerns about what pip commits to).

(This brings up a good design point, however: loading these plugins probably shouldn't be unidirectional, i.e. shouldn't be triggered just by pip install some-plugin also installing an entry point. That way, any plugin that does violate this contract can't do so by default, and can be immediately disabled if it misbehaves. That has salience as a security feature as well, so I'll elaborate below.)

Let's give a really simple example. How can a plugin print a message to the user? It can't use raw print calls or access sys.stdout, because that will disrupt pip's output (progress bars, logging, handling of verbosity levels, etc). So what do we do? Provide a print function that plugins can use safely? That function will need writing, testing, documenting, and supporting.

I think this is a good example of what a pip plugin should never do, i.e. should be explicitly excluded by the API contract :slightly_smiling_face:

(We're talking Python of course, so there's no formal enforcement of a plugin's contract. But someone who violates the plugin contract in this way is categorically indistinguishable from people who already violate pip's contracts around not having public APIs.)

And given that there are bound to be requests for additional hooks, how do we limit the scope up front so that maintainers aren't faced with an ongoing job of rejecting proposals for "just one more small hook"?

Thank you for calling this out. I would be happy to discuss this further as well, but per the points above about an execution contract, here are some fundamentals that I think would serve you as the pip maintainers well:

  1. Proposing a new plugin type must involve multiple, extant use cases. In practice this means that it's never sufficient to have just one tool that needs a custom plugin, or to have multiple nonexistent tools that "promise" to use the plugin once it exists. Using dist-inspector as an example: pip-compile, pip-audit, and @sethmlarson's pip-sbom are all extant independent tools that could make use of such a plugin.
  2. All plugin types must obey a fundamental execution contract: they must not have side effects that are visible to the pip process that executes them, except for explicit exceptional control flow that preserves pip's ability to thread or parallelize its internals.
  3. pip makes no guarantee about the "reachability" of plugins except where explicitly documented. Using dist-inspector as an example: pip might choose to only guarantee that uncached URLs retrieved during collection via pip install trigger the plugin's hooks.
  4. pip reserves the right to refuse to load plugins at all when it believes it's being run from a non-recommended state, e.g. sudo pip or a user/global site marked as externally managed per PEP 668. (A rough sketch of such a check follows this list.)
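A rough sketch of what the check in (4) could look like; the helper name and the specific heuristics are illustrative assumptions, not a proposed pip behaviour:

# Purely illustrative heuristic for point (4); not actual pip code.
import os
import sysconfig
from pathlib import Path

def plugins_allowed() -> bool:
    """Refuse to load plugins when pip is run from a non-recommended state."""
    if hasattr(os, "geteuid") and os.geteuid() == 0:
        return False  # e.g. `sudo pip`
    marker = Path(sysconfig.get_path("stdlib")) / "EXTERNALLY-MANAGED"
    if marker.exists():
        return False  # environment is marked externally managed (PEP 668)
    return True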

I would be happy to flesh these out more as well, and of course to contribute them to your developer and user docs as part of the implementation effort.

Also, and I'm somewhat surprised it's me having to ask this, what are the security implications here? Installing a wheel is currently (relatively) safe, because it doesn't run arbitrary code from the internet.

Thank you for calling this out as well! @sethmlarson and I discussed this a bit, and had a couple of thoughts on it:

  1. As currently specified, the threat model here is equivalent to the current threat model around installing Python packages: installing a package via pip install (or even downloading with pip download) is by default equivalent to executing it, since users don't generally restrict themselves to wheel-only installs. This isn't ideal, but it represents no worse a security state than Python packaging is already in (especially since all bets are off once you execute any third-party code -- someone could just as easily rewrite or noop-out pip uninstall from setup.py or any other convenient vantage point).
  2. Still, (1) is not ideal. To do better than (1), we could require that pip plugins are loaded "bidirectionally": the user both has to do pip install sampleproject[plugin] and explicitly enable plugins. For example, even after the --use-feature phase, pip could have a --enable-plugins or similar flag (and conf setting) that's required to actually load the plugin.

I think (2) is probably preferable, at the cost of a slightly less smooth UX. But I'll note that the Python packaging model is fundamentally brittle from a security perspective, regardless of what we do here: the game is "over" as soon as an sdist is allowed to run arbitrary code, meaning that even bidirectional controls can be circumvented (e.g. by the sdist setting the config itself).
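As a minimal sketch of what such a "bidirectional" gate could look like (the --enable-plugins option is the hypothetical flag mentioned above, and the helper name is mine):

# Purely illustrative opt-in gate; installing a plugin package alone does nothing.
from importlib.metadata import entry_points

def maybe_load_plugins(enable_plugins: bool) -> list:
    """Only load entry-point plugins when the user has explicitly opted in."""
    if not enable_plugins:  # neither --enable-plugins nor its config equivalent was set
        return []
    return [ep.load() for ep in entry_points(group="pip.plugins")]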

So this is really just card shuffling to some degree, unless plugins could be identified further "up the stack" in a way that precludes installation from sdists (e.g. a py.plugin marker analogous to py.typed, or similar?).

TL;DR: As currently specified, I don't believe this plugin architecture represents a change to the default security posture in Python packaging, but only because it's already non-ideal :slightly_smiling_face:. So it doesn't make the problem any worse, but it doesn't make it better either.

pfmoore commented 5 months ago

I might not be understanding, but I don't follow why this would be the case, for two reasons

One of us is misunderstanding, but it might be me.

What I think you're saying is that as a plugin author, I register an entry point that links to my module. Now, pip needs to call (at some point) my_module.plugin_type, and it will get a return value. OK, but you've not said when pip will call that entry point - will it be before argument processing, or after, or at some unspecified point? And once pip calls that entry point and gets the plugin type, then what? There's no description of what else pip must do, except in your "dist-inspector" example, but as you said, that's an example, not part of the spec. So there's no requirement for pip to call the pre_download and pre_extract callables. At some point, the specification for when those callables get called needs to be written. So either the plugin feature does nothing by default, and we get a series of follow-up requests for individual plugin specifications, or a reasonable set of valid plugin types need to be defined in the initial plugin spec. What I'm saying is that I'm not happy agreeing to the former - I don't want to add a plugin capability without any idea what people are going to ask us to expose with it.

As you note, these would be a public API. As such, they would be subject to whatever stability model you'd like (as the maintainers).

Well, isn't that the point? We're on record as stating, on numerous occasions, that we are not willing to support any public API. So unless I'm misunderstanding you, you're asking us to alter that stance and to support this public API, at least.

Sorry if my example didn't make this clear: my thinking for these hooks is that their execution contract is "pure" -- they have no side effects from pip's perspective, and their only way to signal anything to pip itself is to raise a ValueError.

OK. No, that wasn't clear. Add to that the fact that plugin hooks need to adhere to the standard rule that they are not allowed to import any of pip's functionality, or use any pip internal APIs, and I guess they are basically just a notification API from pip to the plugin. Still an API (see my comment above), but certainly an extremely constrained one.

I'm concerned that this would be the "thin end of the wedge", though, and we'd get requests for allowing additional interaction as the limitations of the pure notification interface become clearer.

I think this is a good example of what a pip plugin should never do, i.e. should be explicitly excluded by the API contract 🙂

I bet I can find reasons why plugins shouldn't do pretty much anything useful that you propose 😉. I'm not being facetious by saying this; I'm trying to point out that without any real motivating examples of plugin types or uses, it's impossible to pin down what plugins are allowed to do (beyond "nothing, just to be safe"). Again, this is part of the "we don't provide an API" issue - we don't offer any guarantees that the global Python interpreter is in any sort of usable state for code injected into the pip process. It probably is (that's what I meant when I said "the sky hasn't fallen") but there can be problems (we've had bug reports from people using pip in-process who have found the logging system isn't in the right state for them, for example).

But someone who violates the plugin contract in this way is categorically indistinguishable from people who already violate pip's contracts around not having public APIs.

While this is true in principle, it's much harder to write a tight plugin contract that excludes behaviours that we don't want to allow than it is to make a blanket statement that we don't support importing pip into your own process. And that's the fundamental issue here - I don't trust our ability to keep ourselves out of trouble once we loosen the current rules.

This isn't to say I'm totally against this idea. But my instincts are to retain our current stance, and simply declare that while we have plugins, all uses of plugins are unsupported and may break at any time, without warning. Essentially, that's the same footing that projects like pip-tools have to live with right now, and if plugin authors don't like that, then so be it.

Proposing a new plugin type must involve multiple, extant use cases.

That's not really the sticking point here (well, it is a sticking point, but it's not the most important one). The important issue is that we don't have maintainer time[^1] to review multiple plugin proposals. So there's a good chance that new plugin types can expect delays of months, or quite possibly years, before getting approved. There's much more important pip features that have been stalled for that sort of time period. So the additional constraint is that anyone proposing a new plugin type must be prepared to stick with the proposal for that sort of timescale.

installing a package via pip install (or even downloading with pip download) is by default equivalent to executing it, since users don't generally restrict themselves to wheel-only installs.

That's the sort of implied constraint that concerns me. There is ongoing work to try to switch pip to wheel-only downloads by default. One of the benefits of that proposal is that it improves security by removing the risk of running arbitrary code at install time. If the plugin proposal re-opens that risk then our arguments for making the change to wheel-only get undermined.

In addition, people who currently choose wheel-only installs because they are inherently more secure, will now be exposed to new risks that they might not be in a position to mitigate. So we should have a transition plan - we could require an opt-in flag to install wheels that include a pip plugin hook, for example. But this further complicates the whole proposal, in terms of both implementation and UI.

TL;DR: As currently specified, I don't believe this plugin architecture represents a change to the default security posture in Python packaging, but only because it's already non-ideal 🙂. So it doesn't make the problem any worse, but it doesn't make it better either.

All the above being said, I do think you're broadly right here. The Python packaging ecosystem is not in a particularly good state right now as far as tight, auditable security is concerned. But does that mean we're OK with adding more features that we might not accept if we did have a strong security position?

Sorry - another long message. And I don't think I'm saying much that I haven't already said in one form or another. Basically, I see the value of the feature, but I'm not sure I'm willing to accept the cost of supporting the feature.

[^1]: And possibly not maintainer interest, either. I'm the only maintainer who's commented on this proposal so far, and I certainly wouldn't be putting this level of effort into every plugin interface that gets proposed, if we go down this route.

ofek commented 5 months ago

motivating examples of plugin types or uses

Our (Datadog) primary interest would be a plugin that verifies downloads using the attestations provided by PEP 740. This cannot be done by pip alone because only pure-Python packages may be vendored and cryptography would be a requirement.

woodruffw commented 5 months ago

What I think you're saying is that as a plugin author, I register an entry point that links to my module. Now, pip needs to call (at some point) my_module.plugin_type, and it will get a return value. OK, but you've not said when pip will call that entry point - will it be before argument processing, or after, or at some unspecified point? And once pip calls that entry point and gets the plugin type, then what? There's no description of what else pip must do, except in your "dist-inspector" example, but as you said, that's an example, not part of the spec.

I might have caused some confusion with the "spec vs. not spec" distinction, sorry!

I didn't mean to imply that plugin types like dist-inspector are not in the specification, only that I didn't include them in the initial RFC comment to keep it brief. What I had in mind for the specification was:

  1. A rough "scaffolding" API per the original RFC comment;
  2. A concrete list of supported plugin types and their "lifecycles" within pip, i.e. a precise definition of exactly when pip loads them, calls them, etc.
  3. A set of guidelines for how pip accepts new plugin types, per above.

So, dist-inspector would be a part of the spec; I intended it to be an example of (2).

Well, isn't that the point? We're on record as stating, on numerous occasions, that we are not willing to support any public API. So unless I'm misunderstanding you, you're asking us to alter that stance and to support this public API, at least.

Yeah. This is perhaps too fine a hair to split -- what I was trying to say there was that it would be a public API being committed to, but not one that's "uniquely" stable. In other words you could deprecate/remove plugin APIs per your current deprecation policy, much like pip's current CLI undergoes changes.

But in retrospect this is an obvious thing to say, and doesn't change the story for you at all (since it's still a public API). So I think this point is moot 🙂

Add to that the fact that plugin hooks need to adhere to the standard rule that they are not allowed to import any of pip's functionality, or use any pip internal APIs, and I guess they are basically just a notification API from pip to the plugin. Still an API (see my comment above), but certainly an extremely constrained one. I'm concerned that this would be the "thin end of the wedge", though, and we'd get requests for allowing additional interaction as the limitations of the pure notification interface become clearer.

Yeah, this is how I'm conceiving the API here -- I think only being able to notify pip and not directly (within contract) modify pip's state is the most tractable design here, both in terms of minimizing any public API commitments and not interfering with pip's internal architectural changes.

I unfortunately agree with your concern, though: I think people will probably ask for all kinds of inadvisable things, and attempt to use the minimal interface proposed here as a lever. But I also think that people can be politely (but firmly) redirected to docs/guidelines that explain why pip can't and won't provide a more invasive plugin API.

This isn't to say I'm totally against this idea. But my instincts are to retain our current stance, and simply declare that while we have plugins, all uses of plugins are unsupported and may break at any time, without warning. Essentially, that's the same footing that projects like pip-tools have to live with right now, and if plugin authors don't like that, then so be it.

It'd be interesting to hear from @ofek and @sethmlarson, but for my part: I'm personally okay with this!

IMO this is a suitable footing, so long as plugin authors perform nightly and beta testing against pip (and I certainly will). AFAICT this allows the best of both worlds: pip's plugin API will be 100% unstable and not subject to any guarantees, but plugin authors will at least have an oracle against which they can observe breakage, and will assume all responsibility for keeping things not-broken.

So the additional constraint is that anyone proposing a new plugin type must be prepared to stick with the proposal for that sort of timescale.

This seems like an exceedingly fair constraint to me 🙂

The Python packaging ecosystem is not in a particularly good state right now as far as tight, auditable security is concerned. But does that mean we're OK with adding more features that we might not accept if we did have a strong security position?

Fair point (along with your points above about this potentially undermining a move to a secure wheel-only default).

Per your points about not making any stable API promises: maybe a mandatory --enable-plugins is a workable solution? All bets would still be off with a malicious sdist, but that at minimum would prevent surprise loads of plugins.

From there, there could be a longer term pivot towards a special marker for pip plugins + preventing installation of plugins from anything except wheel distributions (or local editables, I suppose). And there would be no compatibility challenge there, since compatibility is not guaranteed 🙂


Sorry - another long message. And I don't think I'm saying much that I haven't already said in one form or another. Basically, I see the value of the feature, but I'm not sure I'm willing to accept the cost of supporting the feature.

I'm curious if explicitly considering this unstable (with the burden for breakage being 100% on plugin authors) changes your mind at all here (and also what the other pip maintainers think).

If you think this is still too onerous in the current state of affairs, I'd like to propose just the pip ext part for now -- I think that part requires (almost) no lifecycle or execution contract considerations, since the handoff from pip to the external subcommand is entirely one-way.

P.S.: No problem with the long messages! I'm also guilty of them, and I appreciate the effort you've put into reviewing these ideas and helping me clarify them so far.

P.P.S: Sorry for the delay -- I thought I responded with this yesterday but found this tab unsent this morning.

pfmoore commented 5 months ago

I'll keep it short, just for variety 😉

I'm curious if explicitly considering this unstable (with the burden for breakage being 100% on plugin authors) changes your mind at all here (and also what the other pip maintainers think).

If plugins are explicitly unsupported, I'd view them as essentially the same as build backends. We'd still get users raising issues, but "speak to the plugin project" would be our answer. As with build backends, plugin authors would get no formal support[^1].

For me, that's acceptable. But the other maintainers may be more cautious than me.

Also, note that if a PR to add plugin support is large and/or complex, getting someone to review and merge it might be a problem independently of any approval in principle of the idea.

PS When I say "unsupported" that includes

"Pip downloaded foo.whl, but never called my pre_download hook" "Sorry, plugin hooks are unsupported, we just changed our download mechanism, that's probably what happened".

Because guaranteed deprecation processes come under "support".

[^1]: It's always possible to nerd-snipe us, of course 🙂

woodruffw commented 5 months ago

If plugins are explicitly unsupported, I'd view them as essentially the same as build backends. We'd still get users raising issues, but "speak to the plugin project" would be our answer. As with build backends, plugin authors would get no formal support.

Makes sense!

I'll await other maintainer opinions here as well 🙂.

Also, note that if a PR to add plugin support is large and/or complex, getting someone to review and merge it might be a problem independently of any approval in principle of the idea.

Understood -- I think this work should be decomposable into PRs of no more than 200-300 lines each, which is hopefully not too big for independent reviews. But this is something I'll keep an eye on, and include as a design factor.

ofek commented 5 months ago

PS When I say "unsupported" that includes

"Pip downloaded foo.whl, but never called my pre_download hook"

That's quite unsupported 😅

pfmoore commented 5 months ago

That's quite unsupported 😅

Indeed 🀣

In reality this may never happen, and we wouldn't deliberately do it, but I'm thinking very specifically of things like refactoring the internals to do things like parallel downloads, or partial downloads, or weird caching tricks. We could end up in a situation where we do a download far down a code path that doesn't have access to the active plugin list. Or we could fire a bunch of subprocesses to do downloads, which wouldn't be able to access plugins in the parent process. In any of these cases, I'd want to reserve the right to make the improvement and not worry about plugins (which I assume are going to be a niche part of our overall user base).

And you should remember that the follow-up to the conversation would be "you're welcome to submit a PR to fix this" - which actually isn't that different from the response you'd get if the plugin mechanism were supported 😉

sethmlarson commented 5 months ago

I'm curious if explicitly considering this unstable (with the burden for breakage being 100% on plugin authors) changes your mind at all here (and also what the other pip maintainers think).

The use-case that I'd like to support would be okay with this outcome as well; being a plugin means you'd need to be integrating and testing against pip aggressively, so I expect any breakages that do arise could be handled. Thanks for your consideration on this, @pfmoore!

woodruffw commented 2 months ago

Following up here: my colleague @facutuesca has been working on the architectural side of this (pip ext + basic entrypoint detection) and we should have something ready for sharing shortly! Once we do, we'll open it up as a draft PR here for the pip maintainers to consider.

(As discussed above, we understand that it'll be important to emphasize the lack of stability guarantees around any changes that do get approved here. I'd love to have more discussions about how we can communicate that + contribute any and all docs necessary to keep users from expecting/burdening pip here.)

pfmoore commented 2 months ago

Please note that as I've said previously, I'm a strong -1 on "basic entrypoint detection" in the absence of specific, documented entry point type definitions. It will be a waste of time to submit a proposal for entry point detection without any explicit API contracts, because there's nothing useful to debate/agree/reject.

As regards pip ext, that's separate (and I'd prefer it if it were a separate PR for that reason). But regarding the approach you suggest above, one thought I had is that a long time ago @dstufft proposed a git-like extension mechanism, where custom subcommands mapped to separate executables. As a very rough design, pip ext foo would execute a command pip-foo in a subprocess. Arguments to the subcommand would be passed as arguments to the subprocess. The subprocess would have no access to pip's state or internals[^1]. If you wanted to be more secure than a raw path search, we could have a pip config entry that specified a set of paths where subcommands must be located - but I don't know if that's worth it (git doesn't seem to do this). This would be a lot simpler and less controversial than an entrypoint/in-process approach. If you're not happy with it, can you articulate why? It may be that in doing so, you expose some hidden assumptions that you weren't aware of[^2].

[^1]: As a future extension, we could consider exposing selected state via environment variables set in the subprocess environment. But that's not part of the basic idea.
[^2]: For example, I often get the impression that when we suggest writing a standalone utility for some feature request, people don't like that idea because they hope that writing it within pip means that they can use pip's internal functions - so they will see "write an extension command" as license to do just that, no matter how much we tell them it's not allowed...
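A minimal sketch of that git-style dispatch, with illustrative helper names; this is not an actual pip implementation:

# Purely illustrative: `pip ext foo args...` runs `pip-foo args...` in a subprocess.
import shutil
import subprocess
import sys

def run_ext(argv: list[str]) -> int:
    name, args = argv[0], argv[1:]
    exe = shutil.which(f"pip-{name}")
    if exe is None:
        print(f"pip ext: no 'pip-{name}' executable found on PATH", file=sys.stderr)
        return 1
    # the subprocess gets no access to pip's state or internals
    return subprocess.run([exe, *args]).returncode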

ofek commented 2 months ago

As a very rough design, pip ext foo would execute a command pip-foo in a subprocess.

I just did that at work actually (with the help of a library I had to write for it), but arbitrarily extensible rather than under a command group:

(screenshot omitted)

woodruffw commented 2 months ago

specific, documented entry point type definitions.

How would you like these documented? Based on the conversation upthread I thought there was rough consensus on an (explicitly unstable) entrypoint for "dist-inspector", i.e. an interface capable of inspecting download and extraction states without actually being able to mutate them.

I put a rough sketch of that idea in https://github.com/pypa/pip/issues/12766#issuecomment-2168848039, but I'm happy to create a break-out issue for it if you'd prefer. But it'd also be good to know the "directionality" of the review process here, i.e. whether you and the other pip maintainers would prefer we mock something up as a draft PR first or instead flesh it out in issues and textual documentation first.

This would be a lot simpler and less controversial than an entrypoint/in-process approach. If you're not happy with it, can you articulate why? It may be that in doing so, you expose some hidden assumptions that you weren't aware of

It's not so much unhappiness as that I think the two encompass distinct, but equally valuable, use cases. I think some of this was already articulated upthread, but to coalesce it:

  1. Subcommands are fantastic for providing a consistent user experience, i.e. reducing the number of context switches users have to perform between different tools in the pip tooling ecosystem. Examples of non-pip tools that benefit from this kind of integration are pip-compile and pip-audit.
  2. Entrypoints/plugins are ideal for providing a unified inline user experience, i.e. enabling tools to operate within ordinary packaging operation lifecycles. This is especially valuable for security-focused operations, where the user shouldn't have to remember pip ext verify or similar; the fatigue-minimizing thing is to have verification run inline with the operation being verified (i.e. package downloads).

In particular I think there's a strong value case for (2) even without extensive access to pip's internals -- the dist-inspector idea proposed above would enable things like signature verification, SBOM generation, etc. while only requiring pip to pass a single URL or path to the plugin.

pfmoore commented 2 months ago

How would you like these documented? Based on the conversation upthread I thought there was rough consensus on an (explicitly unstable) entrypoint for "dist-inspector", i.e. an interface capable of inspecting download and extraction states without actually being able to mutate them.

I'm saying that I'd want a PR adding an actual plugin, not just one that adds the infrastructure. I don't care whether you do a PR for the infrastructure and a second PR for the dist-inspector plugin type, or put both in the same PR, but I don't want to do anything until both parts exist. I want to see how that plugin would be documented, and what the impact is on pip's codebase. I'd like to see the tests that would be added, as they are, in a fundamental sense, the minimum guarantees that we'll provide. I don't want to reason about this in the abstract, I want to see how it would work in practice, with a non-artificial example.

Specifically, I'm not comfortable just adding an "architecture". This has to satisfy an actual, real-world, use case. And it can only do that if we add a plugin type that provides some genuine benefit at the same time as we add the architecture.

It's not so much unhappiness as that I think the two encompass distinct, but equally valuable, use cases.

Sorry, I wasn't sufficiently clear. What I was talking about when I said "this would be a lot simpler" was specifically about subcommands, and not about entrypoints/plugins. And in particular, I was pointing out that we don't need an entry point mechanism to support subcommands. If we want to allow users to add custom subcommands, we can do this by simply saying that pip ext foo runs a command pip-foo found on $PATH. Done. No complex architecture, no access to internal state, no temptation to import pip, you literally just write your utility however you want (it doesn't even need to be in Python!) and it's available as pip ext foo.

What I was asking was for you to articulate why you feel that you need more than this (if, in fact, you do). I know that people have a reluctance to write standalone utilities, but I've never got anyone to say why, and I'm always left with a feeling that the answer is something like "so that I can use pip internals", or "so that the pip maintainers will look after the code for me". The above "run a subprocess" API strips away all of those benefits (that we don't want to allow anyway), and leaves us with the pure question - is it only for consistency of naming? And if not, what is the reason?

woodruffw commented 2 months ago

Specifically, I'm not comfortable just adding an "architecture". This has to satisfy an actual, real-world, use case. And it can only do that if we add a plugin type that provides some genuine benefit at the same time as we add the architecture.

Understood, thank you for elaborating! I suspect the simplest thing for us to do is start with one big PR with the full "big picture," and then break it down as necessary once it passes muster.

What I was asking was for you to articulate why you feel that you need more than this (if, in fact, you do).

Thank you for clarifying here, this was my misunderstanding! I am in 100% agreement that doing extensions via $PATH lookup is better, simpler, and consistent with how just about everything outside of Python does CLI extensions 🙂.

facutuesca commented 1 month ago

I created a draft PR for the implementation here: https://github.com/pypa/pip/pull/12985

(note that it only covers the in-process plugins loaded by entrypoint, not the external pip ext commands)

ncoghlan commented 2 weeks ago

On the pip ext front, it would inevitably be confusing if python -m pip ext compile ran pip-compile from the system PATH rather than looking up the pip-compile entry point in the same environment as pip and running it the same way a wrapper script would.

Checking the system executable path would be a good supplement to support non-Python extensions, but entry points should have priority for Python tools.
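A small sketch of that lookup order, assuming the extension ships a pip-<name> console script; names and behaviour here are illustrative only:

# Purely illustrative lookup order: same-environment entry point first, then PATH.
import shutil
from importlib.metadata import entry_points

def find_ext(name: str) -> str:
    """Describe how `pip ext <name>` would be dispatched."""
    script = f"pip-{name}"
    for ep in entry_points(group="console_scripts"):
        if ep.name == script:
            return f"entry point {ep.value} (same environment as pip)"
    exe = shutil.which(script)
    if exe is not None:
        return f"executable {exe} (PATH fallback, e.g. a non-Python tool)"
    raise LookupError(f"no {script} entry point or executable found")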

(the "inline activity monitor" proposal and the "external command" proposal feel like they should be separate issues, though)