pypa / pip

The Python package installer
https://pip.pypa.io/
MIT License

Ship pip as a standalone application #11243

Closed pfmoore closed 1 year ago

pfmoore commented 2 years ago

Actually, it occurred to me that we may even be able to do this right now. I put together a very simple proof of concept and it seems to work. If you put the following script alongside a "lib" directory with pip installed into it (pip install pip --target lib), but with the bin and pip*.dist-info directories removed (so the bundled pip isn't visible in pip list), then it can be run from any Python interpreter to effectively act as a copy of pip in that environment.

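```python
#!/usr/bin/env python

import os
import runpy
import sys

# The bundled copy of pip lives in "lib" next to this script. realpath()
# means this still works if the script is invoked through a symlink.
lib = os.path.join(os.path.dirname(os.path.realpath(__file__)), "lib")

# Put the bundled pip ahead of any pip installed in the running environment.
sys.path.insert(0, lib)

# Equivalent to `python -m pip`, but using the bundled copy.
runpy.run_module("pip", run_name="__main__")
```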

I don't think it would take much to turn this into a viable "standalone pip" application (I'd mostly just want to set up an executable wrapper for Windows). I've done some very basic testing - this would need a lot more real-world testing to make sure there aren't any problem edge cases, but it basically seems to work.

Originally posted by @pfmoore in https://github.com/pypa/pip/issues/11223#issuecomment-1179518843
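
The "lib" directory described above could be prepared with a small helper along these lines. This is a sketch, not part of the proof of concept: prepare_lib is a hypothetical name, and it assumes python -m pip install pip --target lib has already been run.

```python
import pathlib
import shutil


def prepare_lib(lib_dir):
    """Strip scripts and pip's metadata from a `pip install pip --target` directory.

    Hypothetical helper: assumes `lib_dir` was already populated with
    `python -m pip install pip --target lib`. Returns the removed names.
    """
    lib = pathlib.Path(lib_dir)
    removed = []
    # Console-script wrappers are not needed; the launcher uses runpy instead.
    bin_dir = lib / "bin"
    if bin_dir.is_dir():
        shutil.rmtree(bin_dir)
        removed.append(bin_dir.name)
    # Without its .dist-info, the bundled pip is invisible to `pip list`.
    for dist_info in lib.glob("pip*.dist-info"):
        shutil.rmtree(dist_info)
        removed.append(dist_info.name)
    return sorted(removed)
```

Only the launcher script and the remaining contents of lib then need to be shipped together.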

For now, this is just a placeholder to discuss whether we want to do this at all, or how we'd distribute it. The main point here is that with a script like this, there would no longer be a need to install pip in every virtual environment.

One thing we'd have to work out is which tools assume that pip is available in every environment. I'm thinking of environment managers and IDEs, like nox or VS Code. The ecosystem implications here are likely to be more complicated than the technical issues. Maybe we need to start with a heads-up discussion on Discourse? But before we do that I'd like to make sure the pip committers are all on board with the idea...

sbidoul commented 2 years ago

Very interesting approach.

sbidoul commented 2 years ago

There will be teaching implications too (undoing years of python -m pip).

sbidoul commented 2 years ago

The script will need to check python version compatibility.
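
A minimal sketch of such a check, run before the launcher hands over to pip. MIN_PYTHON and check_python_version are hypothetical names, and (3, 7) is a placeholder rather than pip's actual support policy.

```python
import sys

# Placeholder value: the oldest Python the bundled copy of pip supports.
MIN_PYTHON = (3, 7)


def check_python_version(current=None, minimum=MIN_PYTHON):
    """Exit with a readable message if the interpreter is too old for the bundled pip."""
    if current is None:
        current = sys.version_info[:2]
    if tuple(current) < tuple(minimum):
        # %-formatting (not f-strings) so this still parses on old interpreters.
        raise SystemExit(
            "This pip requires Python %s or later, but is running on Python %s."
            % (".".join(map(str, minimum)), ".".join(map(str, current)))
        )
```

The message deliberately avoids f-strings: the check has to parse on the very interpreters it is meant to reject, otherwise they fail with a SyntaxError instead of a readable message.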

pfmoore commented 2 years ago

There will be teaching implications too

At least initially, this can be an alternative, rather than a replacement. But absolutely, this is a significant change in approach, which is why I think it needs to be flagged in advance. I'd post a topic on the Packaging Discourse right now, but I'm frankly scared of the controversy it'll probably cause 😨

pradyunsg commented 2 years ago

I mean, we can also ship pip as a zipapp. IIUC, that still wouldn't be visible in pip list, and running it is literally python pip.pyz ..., which would be equivalent to python -m pip.

That's easy to communicate as well. :)

pradyunsg commented 2 years ago

More broadly though, I'm on board. :)

pfmoore commented 2 years ago

I mean, we can also ship pip as a zipapp.

This is true. I'm not sure we can simply zip up the pip directory and call it a zipapp, but we can certainly ship a zipapp containing the script I posted above plus a copy of pip.
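
A rough sketch of that packaging step using the standard library's zipapp module. It assumes a lib directory prepared as in the original post; build_pyz is a hypothetical helper, not pip's actual build script. Inside a zipapp, __file__ for __main__.py is a path inside the archive (for example pip.pyz/__main__.py), so the launcher's "lib" path points at a directory within the zip, which Python's zip importer can load from.

```python
import pathlib
import shutil
import tempfile
import zipapp

# The launcher from the original post, used as the archive's __main__.py.
LAUNCHER = """\
import os
import runpy
import sys

lib = os.path.join(os.path.dirname(__file__), "lib")
sys.path.insert(0, lib)
runpy.run_module("pip", run_name="__main__")
"""


def build_pyz(lib_dir, target):
    """Bundle the launcher plus a copy of `lib_dir` into a single .pyz file."""
    with tempfile.TemporaryDirectory() as tmp:
        staging = pathlib.Path(tmp)
        (staging / "__main__.py").write_text(LAUNCHER)
        shutil.copytree(lib_dir, staging / "lib")
        # The shebang only matters when the .pyz is executed directly;
        # `python pip.pyz` always uses the interpreter named on the command line.
        zipapp.create_archive(
            staging, target, interpreter="/usr/bin/env python3", compressed=True
        )
    return pathlib.Path(target)
```

Running python pip.pyz list then behaves like the original script. The sketch does nothing special about data files that dependencies expect to find on disk.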

Do we know if all of our dependencies work when shipped as a zipapp (I believe requests didn't like the certificate file being in a zip at one stage, but IIRC that's fixed now)? Also, does the mechanism we use for injecting pip into a build environment work from a zipapp?

Shiv gets round this by creating zipapps that extract themselves on first use. I don't know if we want to go that far.
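
Short of extracting the whole archive the way Shiv does, the standard library can extract individual files on demand: importlib.resources.as_file() yields a real filesystem path for a resource, copying it to a temporary file if the package was imported from a zip. A minimal sketch assuming Python 3.9+; read_bundled_text is a hypothetical helper, and this is not a description of how requests or certifi actually handle their CA bundle.

```python
import importlib.resources


def read_bundled_text(package, name):
    """Read a data file from `package` through a real filesystem path.

    Works whether the package lives in a directory or inside a zip/pyz.
    """
    resource = importlib.resources.files(package) / name
    # For zip-imported packages, as_file() extracts the file to a temporary
    # location and removes it again when the block exits.
    with importlib.resources.as_file(resource) as path:
        # Code that insists on an on-disk path (e.g. an SSL CA bundle) can use `path` here.
        with open(path, encoding="utf-8") as f:
            return f.read()
```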

Otherwise, the main things that annoy me about zipapps are (1) python pip.pyz doesn't search PATH, and (2) .pyz files aren't registered on Windows to run from the command line by default (they need to be added to PATHEXT), and even when they are we have the old problem that nothing but an exe file is a "first class citizen" 🙁
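
Point (1) can be worked around with a small wrapper that searches PATH for pip.pyz itself and passes it to the current interpreter. A sketch only: find_on_path and run_pyz are hypothetical helpers. It avoids shutil.which() because on POSIX that only returns files marked as executable, and a downloaded .pyz often isn't.

```python
import os
import subprocess
import sys


def find_on_path(name, path=None):
    """Return the first file called `name` in the directories of PATH, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        # An empty PATH entry conventionally means the current directory.
        candidate = os.path.join(directory or os.curdir, name)
        if os.path.isfile(candidate):
            return candidate
    return None


def run_pyz(name="pip.pyz", args=()):
    """Run a zipapp found on PATH with the current interpreter; return its exit code."""
    pyz = find_on_path(name)
    if pyz is None:
        raise SystemExit(f"{name} not found on PATH")
    return subprocess.run([sys.executable, pyz, *args]).returncode
```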

As an initial step in this direction, though, we could ship a .pyz - virtualenv does it, and I'm pretty sure a couple of other tools do as well, so it's not an unfamiliar model to people. We could then promote the idea as "if you don't want to install pip in all of your environments, you can use the zipapp version (and use --no-pip when creating virtualenvs)".
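
The suggested workflow can be sketched with the standard library's venv module, where with_pip=False corresponds to python -m venv --without-pip (the stdlib counterpart of virtualenv's --no-pip). make_pipless_env and run_pip_pyz are hypothetical helpers, and the location of pip.pyz is an assumption.

```python
import pathlib
import subprocess
import sys
import venv


def make_pipless_env(env_dir):
    """Create a virtual environment without pip; return its interpreter path."""
    venv.create(env_dir, with_pip=False)
    if sys.platform == "win32":
        return pathlib.Path(env_dir) / "Scripts" / "python.exe"
    return pathlib.Path(env_dir) / "bin" / "python"


def run_pip_pyz(python, pyz, *args):
    """Run the zipapp with the environment's own interpreter."""
    return subprocess.run([str(python), str(pyz), *args], check=True)


# Usage (assumes pip.pyz was downloaded to the current directory):
# python = make_pipless_env(".venv")
# run_pip_pyz(python, "pip.pyz", "install", "requests")
```

Because pip.pyz is run by the environment's own interpreter, pip treats that environment as the one to install into, just as python -m pip would.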

That's something I'd be comfortable announcing as a plan on Discourse...

uranusjr commented 2 years ago

I think the main hurdle toward shipping a standalone application (versus a zipapp) is source build. If someone needs to build something from source, it's likely they'll want to build against an existing Python installation, instead of the interpreter bundled in the standalone executable, and that'll need some additional mechanism.

Wheel-only installations should be more or less plausible. The only thing we'd need to deal with (that I can think of) is console script shebangs.
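
To make the shebang issue concrete: a console-script wrapper must name the interpreter of the environment being installed into, not whichever interpreter happens to be running pip (for a standalone executable, possibly a bundled one). A heavily simplified, POSIX-only sketch; console_script_text is hypothetical, and pip's real script generation, including Windows launchers, is more involved.

```python
def console_script_text(target_python, module, func):
    """Return a POSIX console-script wrapper bound to `target_python`.

    `target_python` must be the target environment's interpreter; using the
    interpreter that runs pip (e.g. a bundled one) would break the script.
    """
    return (
        f"#!{target_python}\n"
        "import sys\n"
        f"from {module} import {func}\n"
        "if __name__ == '__main__':\n"
        f"    sys.exit({func}())\n"
    )
```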

pfmoore commented 2 years ago

The key here is that the standalone executable doesn't bundle an interpreter[^1]. That's basically what the /usr/bin/env python shebang achieves. It runs the included pip in the environment's own Python.

[^1]: Or if it does, it executes pip with the installation interpreter, not the bundled one. But that's harder (not impossible, but a bit more fiddly).

sbidoul commented 2 years ago

How to upgrade pip is going to be a topic. pip install --upgrade pip is not going to do what people expect.

If we want to be fancy, the script could have a mechanism to download the latest pip for the corresponding python version.

pfmoore commented 2 years ago

Initially, I'd prefer to just publish a zipapp at https://bootstrap.pypa.io, like virtualenv does. Users can download that to get the latest version. Maybe we could also publish it as a GitHub release for people who want a specific version. I'd leave installers and upgraders to the community to provide, if they want (on Windows, for example, scoop and chocolatey can handle this, and on Linux distro packagers fulfil that role, I guess).

Agreed that pip install --upgrade pip will be confusing, but I'm not sure there's much we can do about that, apart from have a gradual transition. Maybe we could add a warning to pip so that if it detects that it's not running from the location that it will upgrade, we let the user know? That might be useful in any case, not just for this situation.
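
The proposed warning essentially compares where the running pip was imported from with the site-packages directory an upgrade would write to. A sketch under that assumption: running_from_target is a hypothetical helper, and sysconfig.get_path("purelib") stands in for however pip actually determines its install location.

```python
import pathlib
import sysconfig


def running_from_target(module_file, purelib=None):
    """Return True if `module_file` lives under the target's site-packages."""
    if purelib is None:
        purelib = sysconfig.get_path("purelib")
    module_path = pathlib.Path(module_file).resolve()
    try:
        module_path.relative_to(pathlib.Path(purelib).resolve())
    except ValueError:
        return False
    return True


# Hypothetical use inside pip, before `pip install --upgrade pip`:
# import pip
# if not running_from_target(pip.__file__):
#     print("Warning: this pip is not the copy an upgrade would replace.")
```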

pradyunsg commented 2 years ago

Noting this here, so that we don't forget -- we'd want to update the upgrade prompt, to be aware of the zipapp based workflow and behave differently. What that different behaviour should be is something I don't have an opinion on, and I don't intend to think about that until we get somewhere in the discussion. :)

RonnyPfannschmidt commented 2 years ago

An interesting future capability could be that pip would no longer have to vendor its dependencies, as it could be isolated from the target environment.

pradyunsg commented 2 years ago

I don't think we'd get to that point, not in the order of decades -- we're still going to allow installing pip in environments, so the core reasons for vendoring will continue to exist.

pfmoore commented 2 years ago

Agreed. A zipapp version of pip could debundle, but there's no point unless we drop support for installing pip in environments.

pradyunsg commented 2 years ago

I don't think we could debundle even in the zipapp -- it'd still be possible to have a version of requests/urllib3 (for example) in the environment that won't work with whatever version of pip is being used via a zipapp.

pfmoore commented 2 years ago

For what it's worth, I've just created https://github.com/pfmoore/runpip

The build script is there, and I've published a 22.1.2 release that has the pyz as a downloadable asset. If people want to play with it, go ahead. I think I'm going to make it my default pip locally and see how that works out.

pfmoore commented 2 years ago

I don't think we could debundle even in the zipapp

Ah, I was thinking of debundling but still shipping all of the vendored libraries in the zipapp. Yeah, working with locally installed copies of our dependencies is never going to work.

pradyunsg commented 2 years ago

Yea, I'm not sure what would take precedence in the import paths -- but we know vendoring works and we need it for our primary usecase today anyway. Let's table this -- we're all on the same page I think. :)

pfmoore commented 2 years ago

I just added an option to the test suite to run pip from a zipapp (specifically, script.pip runs the zipapp, not the installed pip). For the integration tests[^1], I got

69 failed, 775 passed, 38 skipped, 6 xfailed, 2015 warnings

Not that bad, actually. And from a quick scan, many of the failures look like either assumptions about the location of the running pip, or "unexpected changes" caused by the extraction of cacert.pem to a temporary directory. So overall, that's relatively strong evidence that the zipapp is functional. At some point I'll try to work through the test failures, but for now I don't consider passing the test suite to be a necessary condition for publishing an experimental zipapp, if we choose to do so. Does anyone disagree?

Edit: FWIW, without using the zipapp, I get the following on my machine:

10 failed, 834 passed, 38 skipped, 6 xfailed, 2015 warnings

I believe the 10 failures are due to git on my PC being configured with init.defaultBranch=main and some "filename too long" errors. So 59 possible issues to investigate and confirm that it's the test, not the zipapp, that's at fault.

[^1]: I assume the unit tests probably don't use script.pip much, if at all.

pfmoore commented 2 years ago

#11248 fixes one of the problems (28 failures), getting us down to 41 failures (31 if you ignore the 10 unrelated ones).

pfmoore commented 2 years ago

Most of the rest are down to the unexpected existence of cacert.pem in the temporary directory. I fixed this by allowing scripttest to ignore that file when running from a zipapp.

I'm going to finish on this for today, but I think we're most of the way there now.

The biggest outstanding task is working out a way to automatically build an up-to-date zipapp when running the tests with --use-zipapp. For that, I ideally need to be able to build a wheel of the pip code under test. Of course, I don't want to build that wheel with the pip under test itself, in case it's broken... And looking at the test suite, I'm not even 100% sure I know of a reliable way of finding the code under test - the only copy I think I can rely on existing is the one installed in the test environment's site-packages. I suppose I could read all the installed files by starting from pip.__file__, but that seems pretty awful...

Does anyone know a good way of building a wheel of the pip under test from the running test suite? Am I overthinking this, and there's a simple answer I'm missing?

pradyunsg commented 2 years ago
I assume the unit tests probably don't use script.pip much, if at all.

They're not allowed to. :)

sbidoul commented 2 years ago

I was nerd sniped by this, so I created https://github.com/sbidoul/pip-launcher, which automatically downloads the correct pip version using get-pip.py (python 2.7+). I've symlinked that as pip in my PATH and I'll see how it goes.

[update] renamed from pip-script to pip-launcher

pfmoore commented 2 years ago

lol, nice. We're going to end up with a whole raft of different approaches to running pip without installing it. I have had pip.pyz installed as pip in my path for about a week now, but I think that in order to get a proper feel for how well it works, I need to configure virtualenv (and pew, if I can work out how to do that as well) to default to --no-pip --no-setuptools --no-wheel. It's probably just some environment variables to set.

What I plan on doing over the next week sometime (it's been busy this week) is to put together a post on the packaging Discourse, saying something along the lines of

The pip team are experimenting with alternative deployment methods for pip, which avoid the need for pip to be installed in every environment. We're aware that this will be a pretty big change in what people can expect, as there is currently a strong assumption that pip will be available in every Python environment. So we'd be interested in any feedback on how this could affect people's workflows, or tools. To be clear, we're not expecting to change the official deployment method in the short term, but we will be offering (and supporting) other approaches, and we'd like to get a better feel of the impact so that we can determine how to plan the rollout and how to frame the announcements.

Does that seem OK to people? Do you want me to post a draft somewhere so that the @pypa/pip-committers can review the post before I make it?

pradyunsg commented 2 years ago

I’m fine with this wording and don’t think we need to put this somewhere for edits. In any case, I don’t feel strongly about the phrasing of the post and am happy to defer to others on that. It might make sense to link to this issue as well; again, I trust your judgement on whether that’s useful.

sbidoul commented 2 years ago

probably just some environment variables to set.

Setting VIRTUALENV_NO_PIP to 1 does the job.

Does that seem OK to people?

Fine with me. No need to review AFAIC.
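
virtualenv (20.x) reads defaults for its command-line options from VIRTUALENV_* environment variables, so alongside the VIRTUALENV_NO_PIP setting mentioned above, the --no-setuptools and --no-wheel defaults can be set the same way. A sketch; pipless_virtualenv_env and create_pipless_virtualenv are hypothetical helpers, and virtualenv is assumed to be installed.

```python
import os
import subprocess
import sys


def pipless_virtualenv_env(base=None):
    """Return a copy of the environment with virtualenv's seeding disabled."""
    env = dict(os.environ if base is None else base)
    for flag in ("NO_PIP", "NO_SETUPTOOLS", "NO_WHEEL"):
        # Each variable mirrors the matching --no-* command-line flag.
        env[f"VIRTUALENV_{flag}"] = "1"
    return env


def create_pipless_virtualenv(path):
    """Run virtualenv (assumed installed) with seeding disabled."""
    subprocess.run(
        [sys.executable, "-m", "virtualenv", str(path)],
        env=pipless_virtualenv_env(),
        check=True,
    )
```

Exporting the same three variables in a shell profile would make this the default for every virtualenv invocation.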

dstufft commented 2 years ago

FWIW, I've long thought it would be a great idea if pip stopped introspecting the current environment, and instead supported a CLI flag to target a specific environment (which then defaulted to which python).

Doing that would mean you could use something like pyoxidizer to ship a whole Python with pip, including things like statically compiled extensions and what not.

sbidoul commented 2 years ago

One thing we may need to consider is pip "plugins". We don't have those formalized today (although we may in the future), but some pip features already try to import packages to enable themselves (such as the new truststore feature flag, or keyring support). So some mechanism to make additional packages available to the pip launcher may be necessary - it could be as simple as inserting them in sys.path too. Coming up with a good UX for that may be more challenging, though.
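
A first cut of "making additional packages available" could be as simple as the sketch below: append extra directories to sys.path, then probe for optional modules with importlib.util.find_spec(), which locates a top-level module without importing it. PIP_LAUNCHER_EXTRA_PATH and enable_optional_features are hypothetical names invented for this example; keyring and truststore are the optional packages mentioned above.

```python
import importlib.util
import os
import sys

# Hypothetical variable: extra directories holding optional packages.
EXTRA_PATH_VAR = "PIP_LAUNCHER_EXTRA_PATH"


def enable_optional_features(modules=("keyring", "truststore"), environ=None):
    """Extend sys.path from EXTRA_PATH_VAR and report which optional modules are importable."""
    environ = os.environ if environ is None else environ
    for entry in environ.get(EXTRA_PATH_VAR, "").split(os.pathsep):
        if entry and entry not in sys.path:
            # Appended, not prepended, so these cannot shadow the stdlib.
            sys.path.append(entry)
    importlib.invalidate_caches()
    return {name: importlib.util.find_spec(name) is not None for name in modules}
```

Appending rather than prepending keeps the extra directories from shadowing the standard library or the launcher's own lib directory.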

pfmoore commented 2 years ago

FWIW, I've long thought it would be a great idea if pip stopped introspecting the current environment

Agreed. I think there's an issue somewhere for this, but it's a more complex change. For now, I think a zipapp that runs in any environment is a useful starting point, as it breaks the implication that pip is present in every environment (which I suspect will be the big hurdle for some people).

This has been on my "if I get round to it" long term plan for ages, as well 🙂

some pip feature already try to import packages to enable themselves

Again, for now I'm personally fine with the idea that such packages need to be installed in the target environment (or the user sets up $PYTHONPATH to make them importable from a shared location). At some stage, I think we need to bite the bullet and decide what we want to do about "plugins" (either pip features gated on the presence of certain modules, or fully-independent plugins) but again, that's a much bigger question.

to the pip launcher

However, be aware that I'm thinking here about the simple "zip all of pip up into a pip.pyz" approach that I'm working on for inclusion in 22.3. A more full-featured "pip launcher" that adds features like enabling plugins, etc, could have a much more complex UI, but I'm not sure it's necessary at this point.

pfmoore commented 2 years ago

A couple of points I want to record here, so I don't forget them.

  1. It's possible to modify the zipapp __main__.py to extract a --python option from the command line, and re-invoke itself with the specified Python interpreter. I don't know if that's worth it, though. Longer term I'd still rather that pip handled all schemes the same, whether they are "this environment", "another Python's environment", or something explicit like --target.
  2. It's possible to ship a small Python library that hides all of this, and invokes pip via whatever shenanigans are needed, and tools can depend on that, rather than on pip.
  3. We should consider the impact on ensurepip here. If "pip in a zipfile" is now supported, python -m ensurepip could simply run the embedded copy of pip from the zip. That makes the name "ensurepip" silly, as it now runs pip rather than ensuring it's present. On the other hand, ensurepip could just contain a copy of pip.pyz, and you could run pip by getting the path to that pyz via importlib.resources. That's significant breakage, though.
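
Point 1 might look roughly like this inside the zipapp's __main__.py. It is a sketch only: split_python_option and reinvoke are hypothetical, only the --python PATH and --python=PATH forms are recognized, and re-running the archive with subprocess (rather than, say, os.execv) is just one possible choice.

```python
import subprocess


def split_python_option(argv):
    """Remove `--python PATH` / `--python=PATH` from argv; return (path or None, rest)."""
    python, rest = None, []
    args = iter(argv)
    for arg in args:
        if arg == "--python":
            python = next(args, None)
            if python is None:
                raise SystemExit("--python requires a value")
        elif arg.startswith("--python="):
            python = arg.split("=", 1)[1]
        else:
            rest.append(arg)
    return python, rest


def reinvoke(python, archive, args):
    """Run `archive` (this .pyz) again under `python`, returning its exit code."""
    return subprocess.run([python, archive, *args]).returncode


# Hypothetical use in __main__.py, before pip itself is imported:
# python, rest = split_python_option(sys.argv[1:])
# if python is not None:
#     sys.exit(reinvoke(python, os.path.dirname(__file__), rest))
```
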
pfmoore commented 2 years ago

Running pip from the bundled wheel in ensurepip is as easy as:

import importlib.resources as r
import sys
import runpy
import ensurepip._bundled

for f in r.files(ensurepip._bundled).iterdir():
    if f.name.startswith("pip") and f.name.endswith(".whl"):
        with r.as_file(f) as lib:
            sys.path.insert(0, str(lib))
            runpy.run_module("pip", run_name="__main__")
        break
else:
    print("Could not find pip")

Of course, even if we support running pip from a zip, it's still not technically supported to treat wheels as zipfiles that can be put on sys.path. But as a transitional step, this might be a reasonable approach: it would let us provide a "run pip without installing" library that falls back to the bundled pip, while ensurepip transitions from providing an API to simply bundling a zip.

I'm starting to think this will require a (language) PEP, as ensurepip is going to be involved when we get to this point...

pfmoore commented 2 years ago

Another progress update:

  1. I think the biggest message from the Discourse thread is that breaking python -m pip is probably not acceptable. I think this is fair - we've spent too long educating people that this is the way to run pip, to simply change the message again.
  2. I'd still like us to support running pip from a zip file. There are a number of good uses for this (and a standalone zipapp remains one of them for people who know what they are doing) so I'd like to support it.
  3. It may be worth splitting pip into two packages. The main one will contain all of the current code, but will install to some other name (maybe _pip, or pip_internal) and will have no visible UI. The user visible one will be called pip, and will only expose a __main__.py, but will load and run the "internal" version. The magic will be that it can try different strategies for loading the internal module - from the environment (for backward compatibility), from the wheel bundled in ensurepip (for slimline environments), or longer term from a "shared pip" location.
  4. Even longer term, we could propose that the front-end pip module gets added to the stdlib (once we've stabilised the search strategy). If it's not changing, there's no real need for the complicated ensurepip mechanism. We could retain ensurepip as simply a holder for the "stdlib supplied" copy of pip. This step would need a PEP, of course.
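
The loading logic in point 3 might be sketched as below. Everything here is hypothetical: pip_internal is one of the candidate names from the proposal, and each "strategy" is modeled as a callable that returns an extra sys.path entry to try (a directory, zip, or wheel) or None for "the environment as-is".

```python
import importlib
import sys


def load_internal(strategies, name="pip_internal"):
    """Import `name` using the first strategy that makes it importable."""
    for strategy in strategies:
        entry = strategy()
        if entry is not None:
            sys.path.insert(0, entry)
            importlib.invalidate_caches()
        try:
            return importlib.import_module(name)
        except ModuleNotFoundError as exc:
            # Undo the sys.path change before trying the next strategy.
            if entry is not None:
                sys.path.remove(entry)
            if exc.name != name:
                raise  # a dependency is missing, not the package itself
    raise ImportError(f"could not locate {name!r} with any strategy")


# Hypothetical front-end pip/__main__.py (both finder functions are invented):
# internal = load_internal([
#     lambda: None,           # 1. installed in this environment
#     find_ensurepip_wheel,   # 2. wheel bundled with ensurepip
#     find_shared_pip,        # 3. a future "shared pip" location
# ])
```
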

In related news, I now have a prototype implementation of an approach that lets pip manage arbitrary environments[^1]. At the moment, it's just a proof that the idea works, there's still a lot of work to do on the UI and on testing it. This is somewhat orthogonal to the proposal above, as if we make "shared pip with python -m pip still working" the model, there's a lot less value in pip managing other environments. But it's still useful, for cases when the target doesn't have a working Python, for example. And it acts as a much more capable version of --target, --root and --prefix (upgrades, uninstall, and queries like list, all work correctly). I've opened a separate issue (#11307) for this.

[^1]: I was working on this for various reasons - the zipapp could do with it, hatch was considering bundling pip and using it to manage other environments, and it would be useful for managing a "shared pip" in the model above.

pradyunsg commented 2 years ago

FWIW, this is extremely close to the pip-cli model that we'd discussed in another issue (I think it was the one about deprecating the various script wrappers).

pfmoore commented 2 years ago

It is, yes (#3164 to be specific). The main difference is that we'd want pip-cli to supply the __main__.py as well. We could do that (pip-cli could install pip/__main__.py) but that feels like we'd be starting to indulge in the sort of complicated hacks that we keep telling other people not to do 😕

sbidoul commented 2 years ago

While we are brainstorming, something comes to mind... would it make any sense for ensurepip (or python -m venv?) to install pip by placing a .pth in the target environment (pointing to the wheel, a zip, or an unpacked copy of pip)? That would

sbidoul commented 2 years ago

to install pip by placing a .pth in the target environment

... or do an editable install of pip.
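
For concreteness, the .pth idea relies on standard site module behaviour: at startup, each line of a .pth file in site-packages that names an existing directory or zip file is appended to sys.path. A sketch; link_shared_pip and the file name shared-pip.pth are hypothetical, and pointing it at a wheel carries the same "wheels on sys.path aren't supported" caveat discussed above.

```python
import os


def link_shared_pip(site_packages, shared_location, filename="shared-pip.pth"):
    """Write a .pth file so `shared_location` (directory, zip, or wheel) is importable."""
    pth = os.path.join(site_packages, filename)
    with open(pth, "w", encoding="utf-8") as f:
        # One absolute path per line; site.py skips entries that do not exist.
        f.write(os.path.abspath(shared_location) + "\n")
    return pth
```
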

pfmoore commented 2 years ago

I'm not keen on an editable install. It feels like a misuse of the feature.

pradyunsg commented 2 years ago

I’m reading this all and it’s still unclear to me what exactly we want the experience to be for a standalone pip.

My two cents: Let’s not think about all the ways the standalone pyz can be used to change/improve things until we have it working and working well “in the wild”. Ensurepip, the CLI script and everything else can be changed later and certainly is not needed for a first iteration.

Beyond that, I don’t think we should be changing the way python -m pip or the way that the pip script works as part of this. I also don’t think we should modify how ensurepip works — we can revisit if it should get migrated later, but let’s keep the scope simple.

In other words, I’d like us to be cautious and not get ahead of ourselves here. Let’s start shipping a pyz file, that folks can download and use. Once we get feedback of all the fun ways it’s broken and we fix those, then we can start thinking about if/where it makes sense to use it.

The things mentioned in the most recent comments are all things we can only change once we have a polished experience for using the standalone mode.

pradyunsg commented 2 years ago

Running pip from the bundled wheel in ensurepip is as easy as

Yes, and we explicitly tell people not to do that. I've actually been wondering if we should remove the root-is-purelib style functionality from wheels entirely, when we do a follow-up with compression improvements.

pfmoore commented 2 years ago

That makes sense.

IMO, the remaining steps to get the pyz shipped are:

  1. Merge the change (#11250) to add pyz testing to the CI. This is ready to go, as long as we're OK with the increased CI time.
  2. Add the pyz generation to the get-pip repo (https://github.com/pypa/get-pip/pull/158). I need a bit of advice on this one, as I don't really understand the deployment scripts in the get-pip repo, nor how to run the scripts locally to check them.
  3. Some documentation. I will add this once we have the details all agreed.

I won't merge any of this until @sbidoul is finished with 22.2. My target is 22.3 for the zipapp to be made official.

Once we get feedback of all the fun ways it’s broken and we fix those, then we can start thinking about if/where it makes sense to use it.

Things that come to mind as rough edges:

Yes, and we explicitly tell people to not do that.

I know. Ensurepip (and get-pip) are special cases. And unsupported hacks as a proof of concept are OK 😉 TBH, once we have a supported zipapp version of pip, a lot of the complexity in ensurepip and get-pip.py goes away - and cleaning that up would be a good thing. But it absolutely can wait until the zipapp's out there.

I do think it would be extremely useful to do some of the brainstorming around this[^1] (do we drop the versioned commands, what do we do about the unversioned pip command, can we reduce the footprint of what pip puts in a virtual environment, etc) but issues are a lousy way to do it. Face to face is good, but very hard to organise (even via video calls). Maybe some sort of shared design document that can be edited "live"? I don't know, TBH.

[^1]: In general, design directions and goals for pip would benefit from this. How do other projects handle this sort of high-level strategy planning?

brettcannon commented 1 year ago

IMO, the remaining steps to get the pyz shipped are: ...

  3. Some documentation. I will add this once we have the details all agreed.

Is step 3 the only thing missing at this point? https://bootstrap.pypa.io/pip/pip.pyz exists and seems to be getting updated with pip releases, so can it be relied on at this point?

pradyunsg commented 1 year ago

Yes.

pfmoore commented 1 year ago

There is documentation here. The only question IMO is whether we are ready to remove the “experimental” status. Should I do that for 23.1?

trim21 commented 1 year ago

I just found a tool, https://github.com/sourcesimian/pyBake, which can bundle a Python project into a single Python file.

I gave it a try locally and it looks like it works fine without any modification (except changing the vendored import paths to point to the package). I don't know if you're interested in shipping a bundled single pip.py file (a little like what we did in get-pip); this would also simplify the vendoring development step.

pradyunsg commented 1 year ago

With the documented approach for a standalone copy of pip, I don't think there's more to do here.

We can remove the experimental label when we have some evidence that there's sufficient downloads/usage to justify that.

pfmoore commented 1 year ago

We can remove the experimental label when we have some evidence that there's sufficient downloads/usage to justify that.

Are download stats available for bootstrap.pypa.io? If not, then I'm not sure how we'd be able to determine this (so I'd be more inclined to just declare it as no longer experimental in 23.2 and be done with it).

brettcannon commented 1 year ago

We are planning on using the .pyz file to bootstrap a pip install into virtual environments on systems lacking pip in VS Code (e.g., Debian/Ubuntu). We can let you know how that goes if you want the data point to decide whether to remove the experimental label.