pyodide / pyodide-lock

Tooling to manage the `pyodide-lock.json` file
BSD 3-Clause "New" or "Revised" License

Design CLI API #4

Open rth opened 1 year ago

rth commented 1 year ago

We need to design the CLI API for this package.

In https://github.com/pyodide/pyodide/issues/3573#issue-1580936524 @bollwyvl proposed,

$> pyodide-index path/to/wheels/folder
Wrote 200 packages to path/to/wheels/folder/repodata.json

and I agree this is the right direction. Though given the current name of this package, it would also be more logical to call it `pyodide lock` IMO.
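For illustration, a `pyodide lock` subcommand along these lines could be sketched with stdlib `argparse`. The subcommand and option names here are hypothetical, not the actual pyodide-lock CLI:

```python
# Hypothetical CLI sketch using stdlib argparse; the "lock" subcommand
# and option names are illustrative, not the real pyodide-lock interface.
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="pyodide")
    subcommands = parser.add_subparsers(dest="command", required=True)

    lock = subcommands.add_parser("lock", help="manage pyodide-lock.json")
    lock.add_argument("wheels_dir", help="folder of wheels to index")
    lock.add_argument(
        "--base-lockfile",
        help="path or URL to the original pyodide-lock.json, needed for "
        "the unvendored stdlib entries and the Python version",
    )
    return parser

args = build_parser().parse_args(["lock", "path/to/wheels/folder"])
# args.command is "lock"; args.wheels_dir is "path/to/wheels/folder"
```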

Also, we need to keep in mind that the resulting lockfile would need to include information about the unvendored stdlib modules (and the Python version). So it needs access to the original `pyodide-lock.json`, either via the Pyodide version (looking it up on the CDN) or via a path/URL provided by the user. The difference with respect to `conda index` producing the `repodata.json` is that there,

There are two use cases,

  1. Adding/updating packages, with the actual files stored on some remote CDN. In this case, extra entries in `pyodide-lock.json` don't matter, since they are only loaded if explicitly imported, and we don't necessarily need to download all the included files locally IMO. Here I was thinking of taking something like a `requirements.in` file as input (as in pip-tools), which would compute a consistent dependency graph merging the original `pyodide-lock.json` with the requirements in `requirements.in`, and combine both (not easy).
  2. Including only a subset of packages for a given application, and shipping them alongside `pyodide-lock.json` for reproducibility. This is closer to the use case of https://github.com/pyodide/pyodide-pack (BTW, I'm shifting the focus of that package away from experimental module stripping via runtime detection toward any kind of package/wheel minification). So the wheel files would be modified by that tool, but the final `pyodide-lock.json` would still be generated by this project. The challenge with this use case is that, even given a list of wheels in some folder, we still need to verify that there are no missing requirements and that the dependency graph is consistent; so we need a resolver that understands the wasm platform in order to find compatible wheels.
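The consistency check from use case 2 can be sketched with the stdlib alone: read each wheel's `METADATA`, collect `Requires-Dist` names, and report anything not satisfied within the folder. This is a hypothetical sketch that ignores environment markers and version specifiers, both of which a real resolver would need:

```python
# Hypothetical sketch: verify that every Requires-Dist of every wheel in
# a folder is satisfied by another wheel there (or by names already
# provided, e.g. from a base lockfile). Markers/specifiers are ignored.
import re
import zipfile
from pathlib import Path

def normalize(name: str) -> str:
    """PEP 503 name normalization."""
    return re.sub(r"[-_.]+", "-", name).lower()

def wheel_requirements(path: Path) -> tuple[str, set[str]]:
    """Return (package name, set of required package names) from a wheel."""
    with zipfile.ZipFile(path) as zf:
        meta_name = next(n for n in zf.namelist() if n.endswith(".dist-info/METADATA"))
        metadata = zf.read(meta_name).decode()
    name = re.search(r"^Name: (.+)$", metadata, re.M).group(1)
    requires = {
        # keep only the project name, dropping extras/specifiers/markers
        normalize(re.split(r"[ ;(<>=!~\[]", line.split(":", 1)[1].strip(), maxsplit=1)[0])
        for line in metadata.splitlines()
        if line.startswith("Requires-Dist:")
    }
    return normalize(name), requires

def missing_requirements(wheels_dir: str, provided: set[str] = frozenset()) -> dict[str, set[str]]:
    """Map each wheel name to its dependencies found neither locally nor in `provided`."""
    parsed = [wheel_requirements(p) for p in Path(wheels_dir).glob("*.whl")]
    available = {name for name, _ in parsed} | set(provided)
    return {name: reqs - available for name, reqs in parsed if reqs - available}
```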

Anyway, it's still early days and this needs more discussion. My current idea is to iterate on an implementation that works well in practice for these use cases, while only pushing alpha releases to PyPI. Any API in this package is considered unstable and can be changed completely.

Please let me know if you have any other ideas about how this should work.

@hoodmane @ryanking13

ryanking13 commented 1 year ago

Also we need to keep in mind that the resulting lockfile would need to include information about the unvendored stdlib modules (and Python version).

This is something that continues to bug me: some modules are required to be included, and because of that, creating a lockfile externally depends on the original lockfile or on Pyodide itself.

I'd prefer the second option, creating a separate lockfile from the original `pyodide-lock` file, but I don't have any concrete ideas, and I suspect that this would cause version conflicts between duplicate packages.
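The duplicate-package concern can be made concrete with a small sketch: given two lockfiles shaped like `pyodide-lock.json` (a `packages` mapping whose entries carry a `version` field), report packages that appear in both at different versions. This is a hypothetical helper, not part of pyodide-lock:

```python
# Hypothetical helper: detect version conflicts between duplicate
# packages across two pyodide-lock.json-shaped lockfiles.
def version_conflicts(base_lock: dict, extra_lock: dict) -> dict[str, tuple[str, str]]:
    """Return {name: (base_version, extra_version)} for mismatched duplicates."""
    base = base_lock.get("packages", {})
    extra = extra_lock.get("packages", {})
    return {
        name: (base[name]["version"], extra[name]["version"])
        for name in base.keys() & extra.keys()
        if base[name]["version"] != extra[name]["version"]
    }

base = {"packages": {"numpy": {"version": "1.24.0"}, "pytz": {"version": "2023.1"}}}
extra = {"packages": {"numpy": {"version": "1.26.0"}, "attrs": {"version": "23.1.0"}}}
# version_conflicts(base, extra) reports only numpy, the mismatched duplicate
```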

bollwyvl commented 1 year ago

The way the in-flight jupyterlite PR works is by:

The concrete things this solves there:

no missing requirements and that the dependency graph is consistent,

Nobody likes building another package manager, of course. This is a place where a JSON schema can't do the job, but the more declarative options are... heavy. In the above PR, I opted for a simple "is there a missing named dependency" check, but fully validating the whole smorgasbord of "semver" operators would all but certainly entail another dependency, e.g. dparse.
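To make that trade-off concrete, here is a hypothetical sketch of both levels: the cheap name-only check, plus a toy subset of version operators (`==`, `>=`, `<=`, `>`, `<`). Even this toy subset needs a parser and comparators, which is exactly why the full operator grammar tends to pull in a dependency like dparse:

```python
# Hypothetical validator for a {name: {"version", "requires"}} mapping.
# Handles only a toy subset of specifier operators; the real grammar
# (~=, !=, extras, markers, ...) is what would require a library.
import re

_OPS = {
    "==": lambda a, b: a == b,
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
}

def _parse(spec: str):
    """Split 'name>=1.2' into (name, op, version tuple); bare names have no op."""
    m = re.match(r"([A-Za-z0-9._-]+)\s*(==|>=|<=|>|<)?\s*([\d.]+)?$", spec.strip())
    name, op, ver = m.groups()
    return name, op, (tuple(int(p) for p in ver.split(".")) if ver else None)

def check_lock(packages: dict[str, dict]) -> list[str]:
    """Return human-readable problems: missing names, then unmet specifiers."""
    problems = []
    for name, info in packages.items():
        for spec in info.get("requires", []):
            dep, op, wanted = _parse(spec)
            if dep not in packages:  # the cheap name-only check
                problems.append(f"{name}: missing dependency {dep}")
            elif op is not None:     # the heavier specifier check
                have = tuple(int(p) for p in packages[dep]["version"].split("."))
                if not _OPS[op](have, wanted):
                    problems.append(f"{name}: {spec} not satisfied by {dep} "
                                    f"{packages[dep]['version']}")
    return problems
```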

rth commented 1 year ago

Thanks for the feedback @bollwyvl !

I've opened more focused follow-up issues, where each potential approach can be discussed in more detail, so we can choose which way to go,

joemarshall commented 1 year ago

As an absolutely minimal first step, it would be good to just enable dependency fixup, as micropip does now: if you have created a folder with all the wheels and dependencies you want in the JSON, the tool can add them to the JSON with the correct dependencies.

That way it would fit with the existing pyodide-build support for building modules with their dependencies included, and it gives an initial workflow for making a `pyodide-lock.json` for arbitrary modules and deps with the current tools.
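A hypothetical sketch of that dependency-fixup step: walk a folder of wheels, hash each file, and fill in `depends` from each wheel's `METADATA`. The entry fields mimic the `packages` section of `pyodide-lock.json` but are illustrative, not the pyodide-lock implementation:

```python
# Hypothetical sketch of dependency fixup: one lockfile entry per wheel,
# with `depends` derived from METADATA. Field names are illustrative.
import hashlib
import re
import zipfile
from pathlib import Path

def normalize(name: str) -> str:
    """PEP 503 name normalization."""
    return re.sub(r"[-_.]+", "-", name).lower()

def lock_entry(wheel: Path) -> dict:
    """Build one package entry from a wheel's METADATA and file contents."""
    with zipfile.ZipFile(wheel) as zf:
        meta_name = next(n for n in zf.namelist() if n.endswith(".dist-info/METADATA"))
        metadata = zf.read(meta_name).decode()
    name = re.search(r"^Name: (.+)$", metadata, re.M).group(1)
    version = re.search(r"^Version: (.+)$", metadata, re.M).group(1)
    depends = sorted({
        # keep only the project name, dropping extras/specifiers/markers
        normalize(re.split(r"[ ;(<>=!~\[]", line.split(":", 1)[1].strip(), maxsplit=1)[0])
        for line in metadata.splitlines()
        if line.startswith("Requires-Dist:")
    })
    return {
        "name": name,
        "version": version,
        "file_name": wheel.name,
        "sha256": hashlib.sha256(wheel.read_bytes()).hexdigest(),
        "depends": depends,
    }

def add_wheels(wheels_dir: str) -> dict:
    """Collect entries for every wheel in a folder, keyed by normalized name."""
    entries = (lock_entry(p) for p in sorted(Path(wheels_dir).glob("*.whl")))
    return {"packages": {normalize(e["name"]): e for e in entries}}
```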

rth commented 1 year ago

Yes, I agree we could start with that.

joemarshall commented 1 year ago

Take a look at this PR - #20

With that PR, if you build a bunch of wheels using `pyodide build --with-dependencies` and then make a lockfile with `pyodide lockfile add-wheels dist/*.whl`, you get a lockfile that should work nicely in Pyodide (with dependencies resolving correctly etc.).