hoodmane opened 1 month ago
Indeed, we need more robust tests for `micropip.freeze`, especially its compatibility with pyodide-lock (https://github.com/pyodide/micropip/issues/88).
By the way, is it possible to install fastapi in Pyodide without skipping some packages? When I try to install it in the browser, I get: `Can't find a pure Python 3 wheel for 'ujson!=4.0.2,!=4.1.0,!=4.2.0,!=4.3.0,!=5.0.0,!=5.1.0,>=4.0.1'`.
Are you on latest? It works in pip.
@jaraco this might be a good first issue.
I'm getting started with this issue. Attempting to follow the repro instructions, I first don't want to run `pip install pyodide-build`, because that implies installing to a system Python or another virtualenv. So instead, I've found I can install using pipx, but I need to supply `--include-deps` in order to get the `pyodide` command (and not just the `pyodide-build` command).
Next, I tried to create the virtualenv, but it fails:

```
draft @ pyodide venv .venv
xbuild environment already exists, skipping download
Installing xbuild environment
Creating Pyodide virtualenv at .venv
Expected host Python version to be 3.11 but got version 3.12
draft @ _.rtn
1
```
It wasn't obvious to me at first that the command had failed. The message `Expected host Python version to be 3.11 but got version 3.12` looks more like a warning or informational message than an error. This reminds me that I need to work on my shell prompt to show when the last command has failed.
So it seems I need to install pyodide using Python 3.11. I used pipx to uninstall and then reinstalled with `pipx install pyodide-build --include-deps --python 3.11`, and now I'm able to create the virtualenv. \o/
After creating the virtualenv, I find that it doesn't honor pylauncher as I'd expect. Pylauncher does launch the Python in the virtualenv:
```
draft @ py -c 'import sys; print(sys.prefix)'
/Users/jaraco/draft/.venv
```
But when invoking pip, it seems to try to target the system environment:
```
draft @ py -m pip install fastapi
error: externally-managed-environment

× This environment is externally managed
╰─> ...

Read more about this behavior here: <https://peps.python.org/pep-0668/>
note: If you believe this is a mistake, please contact your Python installation or OS distribution provider. You can override this, at the risk of breaking your Python installation or OS, by passing --break-system-packages.
hint: See PEP 668 for the detailed specification.
```
The way pylauncher works, it's supposed to use `.venv` if it exists, and it's doing that:

```
draft @ cat foo.py
import sys
print(sys.prefix)
draft @ py -m foo
/Users/jaraco/draft/.venv
```
but for some reason pip doesn't honor that expectation. That may be a separate issue deserving its own investigation. In the meantime, I'll invoke pip explicitly.
Edit: Invoking `.venv/bin/python -m pip` has the same behavior, so it seems that it's necessary to invoke `.venv/bin/pip` directly.
Invoking pip and python explicitly, I'm able to replicate the failed expectation:

```
draft @ .venv/bin/python -c "import micropip; print(micropip.freeze())" | jq .packages["fastapi"]
null
```
Attempting to editable-install micropip into the target environment, I encountered another issue: the uninstall step fails because pip can't find the required 'name' metadata.
If I apply this diff:

```diff
micropip main @ git diff
diff --git a/micropip/_commands/freeze.py b/micropip/_commands/freeze.py
index 4a49dc5..6c2804a 100644
--- a/micropip/_commands/freeze.py
+++ b/micropip/_commands/freeze.py
@@ -26,11 +26,7 @@ def freeze() -> str:
         name = dist.name
         version = dist.version
         url = dist.read_text("PYODIDE_URL")
-        if url is None:
-            continue
-
         sha256 = dist.read_text("PYODIDE_SHA256")
-        assert sha256
         imports = (dist.read_text("top_level.txt") or "").split()
         requires = dist.read_text("PYODIDE_REQUIRES")
         if not requires:
```
FastAPI then appears in the output.

At first blush, things seem to be working as intended. The docs for `micropip.freeze` say:

> load packages that were loaded with micropip this time.

But since the fastapi package wasn't loaded with micropip, it's probably no surprise that it doesn't have the Pyodide-specific metadata.
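For context, the metadata lookup that freeze performs can be reproduced with plain `importlib.metadata`. The sketch below is a simplified illustration of the loop shown in the diff above, not the actual micropip code; the helper name `pyodide_entries` is mine:

```python
from importlib.metadata import distributions


def pyodide_entries():
    """Yield (name, metadata) for distributions carrying Pyodide metadata.

    Mirrors the loop in micropip/_commands/freeze.py: each distribution
    installed by micropip ships extra PYODIDE_* files in its .dist-info.
    """
    for dist in distributions():
        url = dist.read_text("PYODIDE_URL")
        if url is None:
            # Installed by pip or another tool: no Pyodide metadata,
            # which is why such packages never appear in freeze() output.
            continue
        yield dist.name, {
            "version": dist.version,
            "file_name": url,
            "sha256": dist.read_text("PYODIDE_SHA256"),
            "imports": (dist.read_text("top_level.txt") or "").split(),
        }
```

In an ordinary (non-Pyodide) environment this yields nothing at all, which matches the observation below that pip-installed packages are invisible to freeze.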
Then again, I don't yet understand what micropip is or what interactions it has with pip. Clearly many of the packages are getting pyodide-specific metadata, so I need to learn how it is that micropip is affecting a pip install.
I read the readme and docs, but they don't explain how micropip works except from a user standpoint, so I'm starting to inspect the code and the environment to understand how pyodide/micropip works.
I can see that the file at `.venv/bin/pip` has been altered by pyodide to provide alternate behavior. That also partly explains why `.venv/bin/python -m pip` doesn't work as expected.
@hoodmane Can you direct me to where the behavior deviates from the default `pip` behavior (i.e. how does pyodide hook into pip to cause the pyodide metadata to be installed)?
Oh! I just noticed that none of the packages installed by pip are showing up in `micropip.freeze()`. And in fact, if I add the following breakpoint, it's never reached:
```diff
diff --git a/micropip/_commands/freeze.py b/micropip/_commands/freeze.py
index 4a49dc5..23b4bbb 100644
--- a/micropip/_commands/freeze.py
+++ b/micropip/_commands/freeze.py
@@ -28,6 +28,8 @@ def freeze() -> str:
         url = dist.read_text("PYODIDE_URL")
         if url is None:
             continue
+        else:
+            breakpoint()
         sha256 = dist.read_text("PYODIDE_SHA256")
         assert sha256
```
That is, not a single distribution has the expected metadata. The sole output of `micropip.freeze()` is the contents of `REPODATA_PACKAGES`.
So that raises the question: what isn't working? What do you expect from `micropip.freeze()` when packages were installed by pip?
We also have https://github.com/pyodide/pyodide-lock, so maybe this "create a lock file from the pip environment" could go there.
But I'd think we could adjust `micropip.freeze()` to include the pip-installed packages. Maybe this is overly optimistic.
The problem is that micropip does not have a good resolution algorithm, and pip is unable to create the lockfile we want. So it makes sense to try to make the two get along, so we can get the solve from pip and the lockfile from micropip.
Ah yes, the logic does seem to be rather intentionally ignoring externally installed packages. Maybe we can try a bit harder here. In this situation we have two indexes: our jsdelivr index is all in `REPODATA_PACKAGES`, so any other packages must be from PyPI.
But unfortunately, it doesn't seem that we were meant to be able to lock a virtual environment to the URLs, so maybe this idea is a nonstarter.
But if it did come from PyPI and we know which version it is, then we have `package==version`, and we could hit PyPI again to find the URL...
We need a way to run a good solver and then make a lockfile from it. This workflow may not be the best way.
Is there a spec or description of what the lockfile requirements are? I see in `pyodide_lock.spec` that `PackageSpec` indicates `file_name` must be a string. Could that string be empty? Does it need to exist? What would fail if it doesn't exist? What would happen if it points to a file that doesn't actually match the installed package?
There is, uh, no written spec. We should write one and probably add it somewhere in the Pyodide packaging docs.
The situation with `file_name` is that it is all based around this line here:
https://github.com/pyodide/pyodide/blob/main/src/js/load-package.ts?plain=1#L310

`installBaseUrl` is generally `https://cdn.jsdelivr.net/pyodide/0.25.1/full/` and `fileName` is something like `numpy-1.26.4-cp312-cp312-pyodide_2024_0_wasm32.whl`.
Then people were installing wheels from other sources with micropip, using its slow and incorrect resolver. I was upset by this and wanted a way to lock additional wheels. Because of the way `resolvePath` works, if `fileName` is a URL, then `installBaseUrl` is ignored and it returns `fileName`. At least in the browser; there may be correctness bugs in Node or other runtimes, as test coverage is incomplete here.

So after this, the field contains either a bare file name (if the wheel is from the jsdelivr CDN) or a fully qualified URL. This is working from an implementation detail of the installer; we should clean this up.
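This is standard URL-join semantics, which Python's `urllib.parse.urljoin` also follows: a relative file name resolves against the base, while an absolute URL replaces it entirely. A quick demonstration (using the jsdelivr base URL from above; the second wheel URL is made up for illustration):

```python
from urllib.parse import urljoin

base = "https://cdn.jsdelivr.net/pyodide/0.25.1/full/"

# A bare file name resolves relative to installBaseUrl:
print(urljoin(base, "numpy-1.26.4-cp312-cp312-pyodide_2024_0_wasm32.whl"))
# -> https://cdn.jsdelivr.net/pyodide/0.25.1/full/numpy-1.26.4-cp312-cp312-pyodide_2024_0_wasm32.whl

# A fully qualified URL ignores the base entirely:
print(urljoin(base, "https://example.org/extra/pkg-1.0-py3-none-any.whl"))
# -> https://example.org/extra/pkg-1.0-py3-none-any.whl
```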
> Could that string be empty? Does it need to exist? What would fail if it doesn't exist?
The string cannot be empty; in that case it will fetch the directory, and jsdelivr will return a 404 (or someone else's server may return a directory index). This is a special case of the file not existing. Depending on what exactly happens, either the fetch fails and the install fails, or we get an HTML response (or something else that is not a zip archive), try to unzip it, and fail in unzip.
> What would happen if it points to a file that doesn't actually match the installed package?
If it's a wheel that is something different, I am not sure. Sounds like an excellent test case! Most likely it just installs the wrong thing?
> Working from an implementation detail of the installer.
Neat. Although it's probably unintended here, I'm happy to see URL joins working the way they were designed to, allowing files to be resolved from a default location or from somewhere halfway around the world, on a whim.
Thanks for the explanation. I think I understand the problem pretty well now.
My first instinct is to compare these lock files against other lock files created in the Python ecosystem, such as with `pip freeze` or `pip-compile` (from pip-tools), especially the former, as its purpose seems congruent with what's going on here.
In the case of `pip freeze`, pip stores the name and version, but not the URL or hash. `pip-compile` saves the name, version, and hash, but not the URL (IIRC). Both approaches rely on the installer and defer resolving the URL to install time. Before we go to the trouble of implementing something that locks in the URL and hash, I'd like to ask whether it might make sense to follow that model and require pyodide (or whatever installer consumes the lockfile) to use infrastructure (PyPI) to resolve the asset. Imagine, for example, that the lockfile has these fields:
```yaml
source: PyPI
name: myproject
version: 1.0.0
# or maybe
spec: myproject==1.0.0
```
Then load-package.js could honor the `source` and know to ignore `file_name` and possibly ignore the hash. It would then, at install time, query PyPI to resolve the URL and hash (possibly verifying it).
This approach would allow freezing an environment with packages installed by pip or another standards-conforming installer.
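The dispatch an installer would perform on such entries can be sketched briefly. The field names (`source`, `spec`, `file_name`) follow the hypothetical lockfile above; only the `file_name` branch reflects the behavior described earlier, and the `pypi://` scheme is a placeholder for "resolve via the index at install time":

```python
def resolve_install_url(entry: dict) -> str:
    """Decide where a locked package should be fetched from.

    Sketch only: `source`/`spec` are the proposed fields, while the
    `file_name` fallback mirrors the current resolvePath behavior.
    """
    if entry.get("source") == "PyPI":
        # Defer to the index, like pip does with a name==version pin;
        # the URL and hash would be resolved via PyPI at install time.
        return f"pypi://{entry['spec']}"  # placeholder scheme for illustration
    # Current behavior: file_name is either a bare name (joined onto
    # the CDN base URL) or a fully qualified URL used as-is.
    return entry["file_name"]


print(resolve_install_url({"source": "PyPI", "spec": "myproject==1.0.0"}))
print(resolve_install_url({"file_name": "numpy-1.26.4-cp312-cp312-pyodide_2024_0_wasm32.whl"}))
```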
It's a bit of a shame that PEP 610 specifies that direct_url.json is only for installs from a URL that weren't resolved through an index.
So as I see it, we have three main avenues to pursue:
Are there other factors to consider when choosing from these options?