pyodide / micropip

A lightweight Python package installer for Pyodide
https://micropip.pyodide.org
Mozilla Public License 2.0
68 stars 16 forks source link

`micropip.install` is mishandling wheel tags #32

Closed mentalisttraceur closed 1 year ago

mentalisttraceur commented 1 year ago

🐛 Bug

micropip.install does not reliably select the best-matching tagged wheel.

To Reproduce

  1. Open the official website's Pyodide REPL

  2. Use micropip to install a package with multiple pure Python 3 wheels. For example, my compose package has three (one for older Pythons, one for 3.5+ with async/await features added, and one for 3.8+ to use native PEP-570 positional-only arguments):

    >>> import micropip
    >>> await micropip.install('compose')
  3. Try importing acompose. If the py2.py30-tagged wheel got installed, this will fail:

    >>> from compose import acompose
    Traceback (most recent call last):
      File "<console>", line 1, in <module>
    ImportError: cannot import name 'acompose' from 'compose' (/lib/python3.10/site-packages/compose.py)

Expected behavior

The wheel tagged py38 gets installed (since Pyodide in that REPL is 3.10 at this time), import of acompose succeeds, and inspecting the signature of acompose.__call__ shows that PEP-570 positional-only syntax is being used:

>>> from compose import acompose
>>> help(acompose.__call__)
Help on function __call__ in module compose:

async __call__(self, /, *args, **kwargs)
    Call the composed function.

>>> 
rth commented 1 year ago

Thanks for the report! So what seems to be happening currently, is that we take the list of available builds from PyPi, and take the first matching filename / tag https://github.com/pyodide/micropip/blob/main/micropip/_micropip.py#L277 while what we should be doing is finding the index of the tag in the list of the compatible tags https://github.com/pypa/packaging/issues/636#issuecomment-1344054827 (https://packaging.pypa.io/en/stable/tags.html#packaging.tags.sys_tags), then taking the best tag (i.e. lower index).

Would you be interested in looking into it @mentalisttraceur ?

mentalisttraceur commented 1 year ago

Thanks! And yeah, I already started looking into it when I reported it. Now in part thanks to the links you just shared I have enough clarity to see a solution.

I am about to go to sleep but I can open a PR when I wake up. In the meantime, feel free to give initial feedback here. The least invasive change that I feel good about would be to add a new method to class WheelInfo (and if no other code is using is_compatible, we could just remove it):

def best_compatible_tag_index(self):
    for index, tag in enumerate(sys_tags()):
        if tag in self.tags:
            return index
    return float('infinity')

and then use it like this:

+    best_wheel = None
+    best_wheel_tag_index = float('infinity')
+
     for ver in candidate_versions:
         if str(ver) not in releases:
             pkg_name = metadata.get("info", {}).get("name", "UNKNOWN")
             warnings.warn(
                 f"The package '{pkg_name}' contains an invalid version: '{ver}'. This version will be skipped"
             )
             continue

         release = releases[str(ver)]
         for fileinfo in release:
             url = fileinfo["url"]
             if not url.endswith(".whl"):
                 continue
             wheel = WheelInfo.from_url(url)
-            if wheel.is_compatible():
+            best_tag_index = wheel.best_compatible_tag_index()
+            if best_tag_index < best_wheel_tag_index:
+                best_wheel = wheel
+                best_wheel_tag_index = best_tag_index
                 wheel.digests = fileinfo["digests"]
-                return wheel
+
+    if best_wheel is not None:
+        return best_wheel
rth commented 1 year ago

Thanks! Yes, I was thinking along similar lines.

Maybe one minor suggestion, to have

def best_compatible_tag_index(self) -> float | None:

and return None if it's not compatible, which would be better API wise IMO. But then yes you need to do an extra check for None in the loop.

You would also need to add a unit test for this (and a changelog entry).

mentalisttraceur commented 1 year ago

Sounds good! (Yeah I was on the fence about None vs "infinity". Infinity does a better job in the loop of saying "this sorts higher than any index" but None does a better job of saying "there is no index" in the function return value - I agree, the clarity of saying "there is no index" with None is nicer at an API boundary.)

Anyway, implemented, change-logged, and tested. There were some "it was like that when I got here" test failures, but everything I touched is passing tests, and I did try to make the tests thorough. Further comments in the PR.