Closed pfmoore closed 1 year ago
Very interesting approach.
There will be teaching implications too (undoing years of `python -m pip`).
The script will need to check python version compatibility.
> There will be teaching implications too

At least initially, this can be an alternative, rather than a replacement. But absolutely, this is a significant change in approach. Which is why I think it needs to be flagged in advance. I'd post a topic on the Packaging discourse right now, but I'm frankly scared of the controversy it'll probably cause 😨
I mean, we can also ship pip as a zipapp. IIUC, that should still not be visible on `pip list` and, it's literally a `python pip.pyz ...` which would be equivalent to `python -m pip`.
That's easy to communicate as well. :)
More broadly though, I'm on board. :)
> I mean, we can also ship pip as a zipapp.

This is true. I'm not sure we can simply zip up the pip directory and call it a zipapp, but we can certainly ship a zipapp containing the script I posted above plus a copy of pip.
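As a rough illustration of the "launcher script plus a copy of the package" layout, here is a minimal sketch using the standard library's `zipapp` module. It bundles a stand-in package called `hello` (a hypothetical placeholder, not pip), and it is not the build script used for any real `pip.pyz`.

```python
import subprocess
import sys
import zipapp
from pathlib import Path


def build_demo_zipapp(workdir: Path) -> Path:
    """Build a tiny zipapp: a __main__.py plus one bundled package."""
    src = workdir / "src"
    pkg = src / "hello"  # hypothetical stand-in for the bundled copy of pip
    pkg.mkdir(parents=True)
    (pkg / "__init__.py").write_text(
        "def greet():\n    return 'hello from the zipapp'\n"
    )
    # __main__.py is the entry point used by `python demo.pyz`.
    (src / "__main__.py").write_text("import hello\nprint(hello.greet())\n")
    target = workdir / "demo.pyz"
    zipapp.create_archive(src, target, interpreter="/usr/bin/env python")
    return target


def run_zipapp(pyz: Path) -> str:
    """Run the archive with the current interpreter and return its stdout."""
    result = subprocess.run(
        [sys.executable, str(pyz)], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

When the archive is run, Python puts the `.pyz` file itself at the front of `sys.path`, which is why `import hello` inside `__main__.py` resolves to the bundled copy.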
Do we know if all of our dependencies work when shipped as a zipapp (I believe requests didn't like the certificate file being in a zip at one stage, but IIRC that's fixed now)? Also, does the mechanism we use for injecting pip into a build environment work from a zipapp?
Shiv gets round this by creating zipapps that extract themselves on first use. I don't know if we want to go that far.
Otherwise, the main things that annoy me about zipapps are (1) `python pip.pyz` doesn't search `PATH`, and (2) `.pyz` files aren't registered on Windows to run from the command line by default (they need to be added to `PATHEXT`), and even when they are we have the old problem that nothing but an exe file is a "first class citizen" 🙁
As an initial step in this direction, though, we could ship a `.pyz` - virtualenv does it, and I'm pretty sure a couple of other tools do as well, so it's not an unfamiliar model to people. We could then promote the idea as "if you don't want to install pip in all of your environments, you can use the zipapp version (and use `--no-pip` when creating virtualenvs)".
That's something I'd be comfortable announcing as a plan on Discourse...
I think the main hurdle toward shipping a standalone application (versus a zipapp) is source build. If someone needs to build something from source, it's likely they'll want to build against an existing Python installation, instead of the interpreter bundled in the standalone executable, and that'll need some additional mechanism.
Wheel-only installations should be more or less plausible. The only thing we need to deal with (that I can think of) is console script shebangs.
The key here is that the standalone executable doesn't bundle an interpreter[^1]. That's basically what the `/usr/bin/env python` shebang achieves. It runs the included pip in the environment's own Python.
[^1]: Or if it does, it executes pip with the installation interpreter, not the bundled one. But that's harder (not impossible, but a bit more fiddly).
How to upgrade pip is going to be a topic. `pip install --upgrade pip` is not going to do what people expect.
If we want to be fancy, the script could have a mechanism to download the latest pip for the corresponding python version.
Initially, I'd prefer to just publish a zipapp at https://bootstrap.pypa.io, like virtualenv does. Users can download that to get the latest version. Maybe we could also publish it as a GitHub release for people who want a specific version. I'd leave installers and upgraders to the community to provide, if they want (on Windows, for example, scoop and chocolatey can handle this, and on Linux distro packagers fulfil that role, I guess).
Agreed that `pip install --upgrade pip` will be confusing, but I'm not sure there's much we can do about that, apart from having a gradual transition. Maybe we could add a warning to pip so that if it detects that it's not running from the location that it will upgrade, we let the user know? That might be useful in any case, not just for this situation.
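The check suggested here could be sketched roughly as follows. This is only an illustration: `runs_from_environment` is a hypothetical helper, not an existing pip function, and it assumes that "the location pip will upgrade" is the current interpreter's `purelib` directory as reported by `sysconfig`.

```python
import sysconfig
from pathlib import Path
from typing import Optional


def runs_from_environment(module_file: str, site_dir: Optional[str] = None) -> bool:
    """Return True if module_file lives somewhere inside site_dir.

    site_dir defaults to the current interpreter's purelib directory
    (an assumption about where an upgrade would be installed).
    """
    if site_dir is None:
        site_dir = sysconfig.get_paths()["purelib"]
    module_path = Path(module_file).resolve()
    site_path = Path(site_dir).resolve()
    return site_path in module_path.parents
```

A pip running from a zipapp would have a `__file__` under the `.pyz` archive rather than under `site-packages`, so a check like this would return `False` and a warning could be shown.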
Noting this here, so that we don't forget -- we'd want to update the upgrade prompt, to be aware of the zipapp based workflow and behave differently. What that different behaviour should be is something I don't have an opinion on, and I don't intend to think about that until we get somewhere in the discussion. :)
An interesting future capability could be that pip would no longer have to vendor as it could be isolated from the target environment,
I don't think we'd get to that point, not in the order of decades -- we're still going to allow installing pip in environments, so the core reasons for vendoring will continue to exist.
Agreed. A zipapp version of pip could debundle, but there's no point unless we drop support for installing pip in environments.
I don't think we could debundle even in the zipapp -- it'd still be possible to have a version of requests/urllib3 (for example) in the environment that won't work with whatever version of pip is being used via a zipapp.
For what it's worth, I've just created https://github.com/pfmoore/runpip
The build script is there, and I've published a 22.1.2 release that has the pyz as a downloadable asset. If people want to play with it, go ahead. I think I'm going to make it my default pip locally and see how that works out.
> I don't think we could debundle even in the zipapp

Ah, I was thinking of debundling but still shipping all of the vendored libraries in the zipapp. Yeah, working with locally installed copies of our dependencies is never going to work.
Yea, I'm not sure what would take precedence in the import paths -- but we know vendoring works and we need it for our primary usecase today anyway. Let's table this -- we're all on the same page I think. :)
I just added an option to the test suite to run pip from a zipapp (specifically, `script.pip` runs the zipapp, not the installed pip). For the integration tests[^1], I got
69 failed, 775 passed, 38 skipped, 6 xfailed, 2015 warnings
Not that bad, actually. And from a quick scan, many of the failures look like either assumptions about the location of the running pip, or "unexpected changes" caused by the extraction of cacert.pem to a temporary directory. So overall, that's relatively strong evidence that the zipapp is functional. At some point I'll try to work through the test failures, but for now I don't consider passing the test suite to be a necessary condition for publishing an experimental zipapp, if we choose to do so. Does anyone disagree?
Edit: FWIW, without using the zipapp, I get the following on my machine:
10 failed, 834 passed, 38 skipped, 6 xfailed, 2015 warnings
I believe the 10 failures are due to git on my PC being configured with `init.defaultBranch=main` and some "filename too long" errors. So 59 possible issues to investigate and confirm that it's the test, not the zipapp, that's at fault.
[^1]: I assume the unit tests probably don't use `script.pip` much, if at all.
Most of the rest are down to the unexpected existence of `cacert.pem` in the temporary directory. I fixed this by allowing scripttest to ignore that file when running from a zipapp.
I'm going to finish on this for today, but I think we're most of the way there now.
The biggest outstanding task is working out a way to automatically build an up to date zipapp when running the tests with `--use-zipapp`. For that, I ideally need to be able to build a wheel of the pip code under test. Of course, I don't want to build that wheel with the pip under test itself, in case it's broken... And looking at the test suite, I'm not even 100% sure I know of a reliable way of finding the code under test - the only copy I think I can rely on existing is the one installed in the test environment's site-packages. I suppose I could read all the installed files by starting from `pip.__file__`, but that seems pretty awful...
Does anyone know a good way of building a wheel of the pip under test from the running test suite? Am I overthinking this, and there's a simple answer I'm missing?
> I assume the unit tests probably don't use `script.pip` much, if at all.

They're not allowed to. :)
I was nerd sniped by this, so I created https://github.com/sbidoul/pip-launcher, which automatically downloads the correct pip version using `get-pip.py` (python 2.7+). I've symlinked that as `pip` in my PATH and I'll see how it goes.
[update] renamed from pip-script to pip-launcher
lol, nice. We're going to end up with a whole raft of different approaches to running pip without installing it. I have had pip.pyz installed as pip in my path for about a week now, but I think that in order to get a proper feel for how well it works, I need to configure virtualenv (and pew, if I can work out how to do that as well) to default to `--no-pip --no-setuptools --no-wheel`. It's probably just some environment variables to set.
What I plan on doing over the next week sometime (it's been busy this week) is to put together a post on the packaging Discourse, saying something along the lines of
The pip team are experimenting with alternative deployment methods for pip, which avoid the need for pip to be installed in every environment. We're aware that this will be a pretty big change in what people can expect, as there is currently a strong assumption that pip will be available in every Python environment. So we'd be interested in any feedback on how this could affect people's workflows, or tools. To be clear, we're not expecting to change the official deployment method in the short term, but we will be offering (and supporting) other approaches, and we'd like to get a better feel of the impact so that we can determine how to plan the rollout and how to frame the announcements.
Does that seem OK to people? Do you want me to post a draft somewhere so that the @pypa/pip-committers can review the post before I make it?
I’m fine with this wording and don’t think you need to put this somewhere for edits. In any case, I don’t feel strongly about the phrasing of the post and am happy to defer to others on that. It might make sense to link to this issue as well; again, I trust your judgement on whether that’s useful.
> probably just some environment variables to set.

Setting `VIRTUALENV_NO_PIP` to `1` does the job.
> Does that seem OK to people?

Fine with me. No need to review AFAIC.
FWIW, I've long thought it would be a great idea if pip stopped introspecting the current environment, and instead supported a CLI flag to target a specific environment (which then defaulted to `which python`).
Doing that, would mean you could use something like pyoxidizer to ship a whole Python with pip, including things like statically compiled extensions and what not.
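One way such a flag could be handled is to pull it out of the argument list before anything else runs. This sketch is hypothetical: `split_python_option` is not part of pip, and both the `--python` spelling and the fallback to the first `python` on `PATH` are assumptions based on this discussion.

```python
import shutil
import sys
from typing import List, Tuple


def split_python_option(argv: List[str]) -> Tuple[str, List[str]]:
    """Separate a --python option from the remaining arguments.

    Accepts both "--python PATH" and "--python=PATH". If the option is
    absent (or has no value), fall back to the first "python" on PATH,
    then to the running interpreter.
    """
    python = None
    rest = []
    args = iter(argv)
    for arg in args:
        if arg == "--python":
            python = next(args, None)
        elif arg.startswith("--python="):
            python = arg.split("=", 1)[1]
        else:
            rest.append(arg)
    if not python:
        python = shutil.which("python") or sys.executable
    return python, rest
```

A launcher could then run the remaining arguments against the selected interpreter, for example by re-invoking itself with that interpreter.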
One thing we may need to consider is pip "plugins". We don't have those formalized today (although we may in the future), but some pip features already try to import packages to enable themselves (such as the new `truststore` feature flag, or `keyring` support). So some mechanism to make additional packages available to the pip launcher may be necessary - it could be as simple as inserting them in `sys.path` too. Coming up with a good UX for that may be more challenging, though.
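The "feature enables itself if a package is importable" pattern, combined with the idea of adding extra locations to `sys.path`, might look roughly like this. It is a sketch only: `feature_available` and its `extra_paths` parameter are hypothetical, and this is not how pip implements its `truststore` or `keyring` support.

```python
import importlib.util
import sys
from typing import Iterable


def feature_available(module_name: str, extra_paths: Iterable[str] = ()) -> bool:
    """Report whether an optional top-level module can be imported.

    extra_paths are appended to sys.path first, so that packages living
    outside the target environment can be found as well.
    """
    for path in extra_paths:
        if path not in sys.path:
            sys.path.append(path)
    # Make sure newly added locations are noticed by the import system.
    importlib.invalidate_caches()
    return importlib.util.find_spec(module_name) is not None
```

Appending rather than prepending means the target environment's own packages still win if both locations provide the same module; whether that is the right precedence is a design question this sketch does not settle.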
> FWIW, I've long thought it would be a great idea if pip stopped introspecting the current environment

Agreed. I think there's an issue somewhere for this, but it's a more complex change. For now, I think a zipapp that runs in any environment is a useful starting point, as it breaks the implication that pip is present in every environment (which I suspect will be the big hurdle for some people).
This has been on my "if I get round to it" long term plan for ages, as well 🙂
> some pip features already try to import packages to enable themselves

Again, for now I'm personally fine with the idea that such packages need to be installed in the target environment (or the user sets up `$PYTHONPATH` to make them importable from a shared location). At some stage, I think we need to bite the bullet and decide what we want to do about "plugins" (either pip features gated on the presence of certain modules, or fully-independent plugins) but again, that's a much bigger question.
> to the pip launcher

However, be aware that I'm thinking here about the simple "zip all of pip up into a `pip.pyz`" approach that I'm working on for inclusion in 22.3. A more full-featured "pip launcher" that adds features like enabling plugins, etc, could have a much more complex UI, but I'm not sure it's necessary at this point.
A couple of points I want to record here, so I don't forget them.

- [...] `__main__.py` to extract a `--python` option from the command line, and re-invoke itself with the specified Python interpreter. I don't know if that's worth it, though. Longer term I'd still rather that pip handled all schemes the same, whether they are "this environment", "another Python's environment", or something explicit like `--target`.
- [...] `ensurepip` here. If "pip in a zipfile" is now supported, `python -m ensurepip` could simply run the embedded copy of pip from the zip. Which makes the name "ensurepip" silly, as it now runs pip rather than ensuring it's present. On the other hand, ensurepip could just contain a copy of `pip.pyz`, and you could run pip by getting the path to that `pyz` via `importlib.resources`. That's significant breakage, though.

Running pip from the bundled wheel in ensurepip is as easy as:
```python
import importlib.resources as r
import sys
import runpy
import ensurepip._bundled

# Look for the pip wheel that ships inside ensurepip.
for f in r.files(ensurepip._bundled).iterdir():
    if f.name.startswith("pip") and f.name.endswith(".whl"):
        with r.as_file(f) as lib:
            # Put the wheel itself on sys.path and run pip from it.
            sys.path.insert(0, str(lib))
            runpy.run_module("pip", run_name="__main__")
        break
else:
    print("Could not find pip")
```
Of course, even if we support running pip from a zip, it's still not technically supported to treat wheels as zipfiles that can be put on `sys.path`. But as a transition, so we can provide a "run pip without installing" library that falls back to the bundled pip while allowing `ensurepip` to transition from providing an API to simply bundling a zip, this might be a reasonable approach.
I'm starting to think this will require a (language) PEP, as ensurepip is going to be involved when we get to this point...
Another progress update:

- [...] `python -m pip` is probably not acceptable. I think this is fair - we've spent too long educating people that this is the way to run pip, to simply change the message again.
- [...] (`_pip`, or `pip_internal`) and will have no visible UI. The user visible one will be called `pip`, and will only expose a `__main__.py`, but will load and run the "internal" version. The magic will be that it can try different strategies for loading the internal module - from the environment (for backward compatibility), from the wheel bundled in `ensurepip` (for slimline environments), or longer term from a "shared pip" location.
- [...] `pip` module gets added to the stdlib (once we've stabilised the search strategy). If it's not changing, there's no real need for the complicated `ensurepip` mechanism. We could retain `ensurepip` as simply a holder for the "stdlib supplied" copy of pip. This step would need a PEP, of course.

In related news, I now have a prototype implementation of an approach that lets pip manage arbitrary environments[^1]. At the moment, it's just a proof that the idea works, there's still a lot of work to do on the UI and on testing it. This is somewhat orthogonal to the proposal above, as if we make "shared pip with `python -m pip` still working" the model, there's a lot less value in pip managing other environments. But it's still useful, for cases when the target doesn't have a working Python, for example. And it acts as a much more capable version of `--target`, `--root` and `--prefix` (upgrades, uninstall, and queries like list, all work correctly). I've opened a separate issue (#11307) for this.
[^1]: I was working on this for various reasons - the zipapp could do with it, hatch was considering bundling pip and use it to manage other environments, and it would be useful for managing a "shared pip" in the model above.
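The "try different strategies for loading the internal module" idea from the progress update could be sketched as below. Everything here is hypothetical: `find_first_location` is not a pip or stdlib API, and the candidate list merely stands in for "the environment, the bundled wheel, a shared location".

```python
from importlib.machinery import PathFinder
from typing import Iterable, Optional


def find_first_location(module_name: str, candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate directory or zip that provides module_name."""
    for location in candidates:
        # Restrict the search to one sys.path-style entry at a time, so the
        # order of `candidates` defines the precedence between strategies.
        if PathFinder.find_spec(module_name, [location]) is not None:
            return location
    return None
```

The returned location could then be inserted into `sys.path` before running the internal module; locations that do not exist are simply skipped.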
FWIW, this is extremely close to the pip-cli model that we'd discussed in another issue. (I think about deprecating the various script wrappers)
It is, yes (#3164 to be specific). The main difference is that we'd want `pip-cli` to supply the `__main__.py` as well. We could do that (`pip-cli` could install `pip/__main__.py`) but that feels like we'd be starting to indulge in the sort of complicated hacks that we keep telling other people not to do 😕
While we are brainstorming, something comes to my mind... would it make any sense for `ensurepip` (or `python -m venv`?) to install pip by placing a `.pth` in the target environment (pointing to the wheel, a zip, or an unpacked copy of pip)? That would

- [...] `python -m pip`
- [...] `pip` scripts

> to install pip by placing a .pth in the target environment

... or do an editable install of pip.
I'm not keen on an editable install. It feels like a misuse of the feature.
I’m reading this all and it’s still unclear to me what exactly we want the experience to be for a standalone pip.
My two cents: Let’s not think about all the ways the standalone pyz can be used to change/improve things until we have it working and working well “in the wild”. Ensurepip, the CLI script and everything else can be changed later and certainly is not needed for a first iteration.
Beyond that, I don’t think we should be changing the way `python -m pip` or the way that the `pip` script works as part of this. I also don’t think we should modify how `ensurepip` works - we can revisit if it should get migrated later, but let’s keep the scope simple.
In other words, I’d like us to be cautious and not get ahead of ourselves here. Let’s start shipping a pyz file, that folks can download and use. Once we get feedback of all the fun ways it’s broken and we fix those, then we can start thinking about if/where it makes sense to use it.
The things mentioned in the most recent comments are all things we can only change once we have a polished experience for using the standalone mode.
> Running pip from the bundled wheel in ensurepip is as easy as

Yes, and we explicitly tell people to not do that. I actually have been wondering if we should remove the root-is-purelib style functionality from wheels entirely, when we do a follow-up with compression improvements.
That makes sense.
IMO, the remaining steps to get the pyz shipped are:
I won't merge any of this until @sbidoul is finished with 22.2. My target is 22.3 for the zipapp to be made official.
> Once we get feedback of all the fun ways it’s broken and we fix those, then we can start thinking about if/where it makes sense to use it.

Things that come to mind as rough edges:

- [...] `UpgradePrompt` class should be "zipapp-aware"?

> Yes, and we explicitly tell people to not do that.

I know. Ensurepip (and get-pip) are special cases. And unsupported hacks as a proof of concept are OK 😉 TBH, once we have a supported zipapp version of pip, a lot of the complexity in `ensurepip` and `get-pip.py` goes away - and cleaning that up would be a good thing. But it absolutely can wait until the zipapp's out there.
I do think it would be extremely useful to do some of the brainstorming around this[^1] (do we drop the versioned commands, what do we do about the unversioned pip command, can we reduce the footprint of what pip puts in a virtual environment, etc) but issues are a lousy way to do it. Face to face is good, but very hard to organise (even via video calls). Maybe some sort of shared design document that can be edited "live"? I don't know, TBH.
[^1]: In general, design directions and goals for pip would benefit from this. How do other projects handle this sort of high-level strategy planning?
> IMO, the remaining steps to get the pyz shipped are: ...
>
> - Some documentation. I will add this once we have the details all agreed.

Is step 3 the only thing missing at this point? https://bootstrap.pypa.io/pip/pip.pyz exists and seems to be getting updated with pip releases, so can it be relied on at this point?
Yes.
There is documentation here. The only question IMO is whether we are ready to remove the “experimental” status. Should I do that for 23.1?
I just found a tool, https://github.com/sourcesimian/pyBake, which can bundle a Python project into a single Python file.
I tried it locally and it looks like it works fine without any modification (except changing the vendor import path to the package). I don't know if you are interested in shipping a bundled single pip.py file (a little like what we did in get-pip); this would also simplify the vendoring development step.
With the documented approach for a standalone copy of pip, I don't think there's more to do here.
We can remove the experimental label when we have some evidence that there's sufficient downloads/usage to justify that.
> We can remove the experimental label when we have some evidence that there's sufficient downloads/usage to justify that.

Are download stats available for bootstrap.pypa.io? If not, then I'm not sure how we'd be able to determine this (so I'd be more inclined to just declare it as no longer experimental in 23.2 and be done with it).
We are planning on using the `.pyz` file to bootstrap a pip install into virtual environments on systems lacking pip in VS Code (e.g., Debian/Ubuntu). We can let you know how that goes if you want the data point to decide whether to remove the experimental label.
Actually, it occurred to me that we may even be able to do this right now. I put together a very simple proof of concept and it seems to work. If you put the following script alongside a "lib" directory with pip installed into it (`pip install pip --target lib`) but with the `bin` and `pip*.dist-info` directory removed (so the bundled pip isn't visible in `pip list`) then it can be run from any Python interpreter to effectively act as a copy of pip in that environment.
I don't think it would take much to turn this into a viable "standalone pip" application (I'd mostly just want to set up an executable wrapper for Windows). I've done some very basic testing - this would need a lot more real-world testing to make sure there aren't any problem edge cases, but it basically seems to work.
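The script itself does not appear in this text. Purely as an illustration of the idea, and not as the script from the original comment, a launcher of this kind could be sketched as follows. `run_bundled` is a hypothetical helper name, and the layout (a `lib` directory next to the script) is taken from the description above.

```python
import runpy
import sys


def run_bundled(module_name: str, lib_dir: str) -> dict:
    """Put lib_dir first on sys.path, then run module_name as __main__.

    Returns the executed module's globals, as runpy.run_module does.
    """
    # The bundled copy must come first so that it is the one imported,
    # while everything else is resolved from the running interpreter.
    sys.path.insert(0, lib_dir)
    return runpy.run_module(module_name, run_name="__main__")


# In a real launcher script (assumption), the last lines might be:
#   here = os.path.dirname(os.path.abspath(__file__))
#   run_bundled("pip", os.path.join(here, "lib"))
```

Because the script is executed by whichever interpreter the user invokes, the bundled package operates on that interpreter's environment without being installed into it.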
Originally posted by @pfmoore in https://github.com/pypa/pip/issues/11223#issuecomment-1179518843
For now, this is just a placeholder to discuss whether we want to do this at all, or how we'd distribute it. The main point here is that with a script like this, there would no longer be a need to install pip in every virtual environment.
One thing we'd have to work out is what tools assume that pip is available in every environment. I'm thinking of environment managers and IDEs, like nox, or VS Code. The ecosystem implications here are likely to be more complicated than the technical issues. Maybe we need to start with a heads-up discussion on Discourse? But before we do that I'd like to make sure the pip committers are all on board with the idea...