pypa / packaging

Core utilities for Python packages
https://packaging.pypa.io/

Marker referencing 'platform_release' fails to evaluate on Linux systems with non-PEP 440 kernel versions #774

Open diazona opened 9 months ago

diazona commented 9 months ago

In a conversation on Mastodon, @kevinbowen777 reported an error with a failing version comparison when installing py5 with pdm. With @py5coding and myself, we tracked the cause to markers like platform_release >= "20.0" and sys_platform == "darwin" that appear in the pdm lock file. Specifically, on Linux, platform_release evaluates to the full Linux kernel version as returned by uname -r, which might be something like 6.1.0-17-amd64 (on Kevin's system) or 6.7.0-gentoo (on mine), and so when packaging applies PEP 440 version comparison logic to that version number, it raises an InvalidVersion error.

Here's a simple reproduction on my system:

>>> import packaging.markers
>>> marker = packaging.markers.Marker('platform_release >= "20.0"')
>>> packaging.markers.default_environment()
{'implementation_name': 'cpython', 'implementation_version': '3.12.1', 'os_name': 'posix', 'platform_machine': 'x86_64', 'platform_release': '6.7.0-gentoo', 'platform_system': 'Linux', 'platform_version': '#1 SMP PREEMPT_DYNAMIC Sun Jan 14 21:02:35 PST 2024', 'python_full_version': '3.12.1', 'platform_python_implementation': 'CPython', 'python_version': '3.12', 'sys_platform': 'linux'}
>>> marker.evaluate()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.12/site-packages/packaging/markers.py", line 252, in evaluate
    return _evaluate_markers(self._markers, current_environment)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/packaging/markers.py", line 158, in _evaluate_markers
    groups[-1].append(_eval_op(lhs_value, op, rhs_value))
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/packaging/markers.py", line 116, in _eval_op
    return spec.contains(lhs, prereleases=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/packaging/specifiers.py", line 568, in contains
    normalized_item = _coerce_version(item)
                      ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/packaging/specifiers.py", line 36, in _coerce_version
    version = Version(version)
              ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/packaging/version.py", line 200, in __init__
    raise InvalidVersion(f"Invalid version: '{version}'")
packaging.version.InvalidVersion: Invalid version: '6.7.0-gentoo'

I know it's not terribly common for packages to use the platform_release marker, but evidently it does happen; in this case when installing py5, something is conditionally depending on pyobjc and is using platform_release to express that condition. And I didn't find anything in the documentation indicating that this sort of thing shouldn't be allowed. So it certainly at least looks like an issue that needs to be worked around.

This is pretty similar to #678, but that issue was about the Python runtime version, where there's a bit of an argument to be made that CPython should stick to PEP 440-compatible version numbers. I'm pretty sure that argument isn't going to fly for distribution-packaged Linux kernel versions. I wasn't sure if you'd rather have a separate issue or just a comment on #678, but I figured I'd start with this and you can absorb it into that other issue if you want.


Original example with pdm and py5

```console
$ pdm init --python python3.12 --lib --backend pdm-backend --non-interactive
Creating a pyproject.toml for PDM...
Virtualenv is created successfully at /home/diazona/tmp/py5_test/.venv
Project is initialized successfully
$ pdm add --no-sync py5
Adding packages to default dependencies: py5
🔒 Lock successful
Changes are written to pyproject.toml.
$ pdm install -v
STATUS: Resolving packages from lockfile...
Traceback (most recent call last):
  File "/home/diazona/.local/bin/pdm", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/diazona/.local/pipx/venvs/pdm/lib/python3.12/site-packages/pdm/core.py", line 288, in main
    return Core().main(args or sys.argv[1:])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/diazona/.local/pipx/venvs/pdm/lib/python3.12/site-packages/pdm/core.py", line 208, in main
    raise cast(Exception, err).with_traceback(traceback) from None
  File "/home/diazona/.local/pipx/venvs/pdm/lib/python3.12/site-packages/pdm/core.py", line 203, in main
    self.handle(project, options)
  File "/home/diazona/.local/pipx/venvs/pdm/lib/python3.12/site-packages/pdm/core.py", line 157, in handle
    command.handle(project, options)
  File "/home/diazona/.local/pipx/venvs/pdm/lib/python3.12/site-packages/pdm/cli/commands/install.py", line 100, in handle
    actions.do_sync(
  File "/home/diazona/.local/pipx/venvs/pdm/lib/python3.12/site-packages/pdm/cli/actions.py", line 222, in do_sync
    candidates = resolve_candidates_from_lockfile(project, requirements, groups=list(selection))
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/diazona/.local/pipx/venvs/pdm/lib/python3.12/site-packages/pdm/cli/actions.py", line 154, in resolve_candidates_from_lockfile
    return {
           ^
  File "/home/diazona/.local/pipx/venvs/pdm/lib/python3.12/site-packages/pdm/models/repositories.py", line 605, in evaluate_candidates
    and not can.req.marker.evaluate(self.environment.marker_environment)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/diazona/.local/pipx/venvs/pdm/lib/python3.12/site-packages/pdm/models/markers.py", line 50, in evaluate
    return self.inner.evaluate(environment)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/diazona/.local/pipx/venvs/pdm/lib/python3.12/site-packages/dep_logic/markers/multi.py", line 139, in evaluate
    return all(m.evaluate(environment) for m in self.markers)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/diazona/.local/pipx/venvs/pdm/lib/python3.12/site-packages/dep_logic/markers/multi.py", line 139, in <genexpr>
    return all(m.evaluate(environment) for m in self.markers)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/diazona/.local/pipx/venvs/pdm/lib/python3.12/site-packages/dep_logic/markers/single.py", line 50, in evaluate
    return pkg_marker.evaluate(environment)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/diazona/.local/pipx/venvs/pdm/lib/python3.12/site-packages/packaging/markers.py", line 252, in evaluate
    return _evaluate_markers(self._markers, current_environment)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/diazona/.local/pipx/venvs/pdm/lib/python3.12/site-packages/packaging/markers.py", line 158, in _evaluate_markers
    groups[-1].append(_eval_op(lhs_value, op, rhs_value))
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/diazona/.local/pipx/venvs/pdm/lib/python3.12/site-packages/packaging/markers.py", line 116, in _eval_op
    return spec.contains(lhs, prereleases=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/diazona/.local/pipx/venvs/pdm/lib/python3.12/site-packages/packaging/specifiers.py", line 568, in contains
    normalized_item = _coerce_version(item)
                      ^^^^^^^^^^^^^^^^^^^^^
  File "/home/diazona/.local/pipx/venvs/pdm/lib/python3.12/site-packages/packaging/specifiers.py", line 36, in _coerce_version
    version = Version(version)
              ^^^^^^^^^^^^^^^^
  File "/home/diazona/.local/pipx/venvs/pdm/lib/python3.12/site-packages/packaging/version.py", line 200, in __init__
    raise InvalidVersion(f"Invalid version: '{version}'")
packaging.version.InvalidVersion: Invalid version: '6.7.0-gentoo'
```

It's worth noting that we haven't figured out how to reproduce this with any other package installer besides pdm. We haven't figured out why exactly that is; evidently there's something that pdm is doing and others don't which is involved in triggering the error. But, considering the simple example I posted above, this clearly doesn't _need_ pdm to be reproducible, and it didn't seem like pdm is using `Marker.evaluate()` incorrectly, which is why I'm reporting it here rather than to pdm.

sbidoul commented 4 months ago

The relevant part of PEP 508 seems to be this:

Comparisons in marker expressions are typed by the comparison operator. The <marker_op> operators that are not in <version_cmp> perform the same as they do for strings in Python. The <version_cmp> operators use the PEP 440 version comparison rules when those are defined (that is when both sides have a valid version specifier). If there is no defined PEP 440 behaviour and the operator exists in Python, then the operator falls back to the Python behaviour.

so in this case it should fall back to a string comparison?
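
To make the quoted rule concrete, here is a minimal sketch of "compare as versions when both sides are valid, otherwise fall back to the Python operator". It is not the code in packaging: `parse_simple_version` and `eval_marker_op` are hypothetical names, and the parser is a deliberately tiny stand-in that only accepts plain release numbers rather than full PEP 440 versions.

```python
import operator
import re

# Stand-in for a real PEP 440 parser (an assumption for this sketch): it only
# accepts plain release numbers such as "20.0" or "6.7.0".
_SIMPLE_VERSION = re.compile(r"\d+(?:\.\d+)*")

_OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    ">": operator.gt,
}


def parse_simple_version(text):
    """Return a tuple of ints, or None if text is not a plain release number."""
    if _SIMPLE_VERSION.fullmatch(text) is None:
        return None
    return tuple(int(part) for part in text.split("."))


def eval_marker_op(lhs, op, rhs):
    """Compare as versions when both sides parse, otherwise as strings."""
    compare = _OPERATORS[op]
    lhs_version = parse_simple_version(lhs)
    rhs_version = parse_simple_version(rhs)
    if lhs_version is not None and rhs_version is not None:
        return compare(lhs_version, rhs_version)
    # No defined version behaviour: fall back to the Python string operator.
    return compare(lhs, rhs)


print(eval_marker_op("23.1.0", ">=", "20.0"))        # both sides parse
print(eval_marker_op("6.7.0-gentoo", ">=", "20.0"))  # string fallback
```

The point of the sketch is only the control flow: the fallback branch never raises, which is what distinguishes it from the `InvalidVersion` traceback in the report.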

pradyunsg commented 4 months ago

Huh, it should indeed. A PR fixing this would be welcome!

ichard26 commented 4 months ago

I took a stab in PR #816. I hope that there aren't any devious edge cases in the details, although honestly I would not be surprised 🙃

diazona commented 4 months ago

Thanks!

I think the wording of the PEP could lead to what might be considered some odd edge cases, like the example you put where 20.0 compares as less than 6.7.0-gentoo, but if that's going to be a problem, the specification would be the place to fix it.
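
A quick way to see the edge case is to run plain Python string comparison over a few release strings. The first two values are the kernel versions from this issue; the other two are made-up examples added for contrast.

```python
# Sample values: the first two come from this issue, the rest are made up
# for illustration.
releases = ["6.7.0-gentoo", "6.1.0-17-amd64", "23.1.0", "19.6.0"]

for release in releases:
    # Python compares strings character by character, so the first differing
    # character decides the result: "6" > "2", but "1" < "2".
    print(f"{release!r} >= '20.0' -> {release >= '20.0'}")

# Releases that would satisfy `platform_release >= "20.0"` as strings.
satisfied = [release for release in releases if release >= "20.0"]
print(satisfied)
```

Under string rules both Linux kernel strings satisfy `>= "20.0"`, even though 6.x is numerically smaller than 20.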

wimglenn commented 4 months ago

The odd edge cases are also mentioned here https://github.com/astral-sh/uv/issues/3917#issuecomment-2141754917

sbidoul commented 3 months ago

In packaging<22 (pip<24.1), it seems these cases were handled with something more intuitive than a string comparison, that was not standard compliant.

Today, we are in a situation that fails explicitly in such cases which, in a way, is good, or at least better than changing behavior silently.

So if I read correctly, no packaging version has ever been standard compliant for these cases.

Since the standard will lead to a non-intuitive solution, and (pip) users did rely on the intuitive solution of packaging<22, maybe there is a case to be made to go back to the standard drawing board first, instead of merging #816. Even if that takes quite some time, affected users can stick to packaging<22 or pip<24.1.

If we implement the standard today, it will be a breaking change for folks who relied on the previous behavior, it will likely please no-one, and it will also lead to a situation that will be much harder to change if/when we want to change the standard because it would then be another breaking change.

wimglenn commented 3 months ago

I agree with @sbidoul that the spec should be updated, and updating pip/packaging to be in line with the existing spec first is a step in the wrong direction. The string comparison fallback does not seem sensible or useful.

Is that _legacy_cmpkey equivalent to a distutils LooseVersion comparison? Perhaps it would be simple enough to specify formally in an amendment to PEP 508?

diazona commented 3 months ago

Hmm okay... if the spec is to be updated, would the goal be to formalize the behavior that pip/packaging has used in the past? Or would this be an opportunity to rethink the whole thing and come up with a potentially fresh and more sensible way of handling non-PEP 440 version numbers than lexicographic comparison?

In the latter case, I'd be willing to help advance a discussion about what the spec should say, if I knew some people with more established reputations than myself - like you all - would be interested in having it. (as per this discussion on the PR) Or, even in the former case (formalize the historical behavior) I'd be happy to help too, but I think I have little value to add in that situation.

Some ideas I've been throwing around in my head:
  • Add a new environment marker that is based on `platform_release` but is guaranteed to contain a PEP 440-compatible version
  • Taking that a little further, I wonder if it'd make sense to register a notion of a marker's "type", either string or version number, and have comparison operators involving that marker work accordingly
  • If the marker value is not a PEP-440 compatible version, extract the longest prefix that is and use that for comparison (this was just an idle comment about test coverage, but if we're brainstorming new alternatives it's an option)
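
As a rough illustration of the last idea (longest compatible prefix), the sketch below pulls the leading dotted-number run off a release string and compares that. It assumes a heavily simplified notion of "PEP 440-compatible": only the release segment is recognized, and `release_prefix` is a hypothetical helper, not an existing packaging function.

```python
import re

# Simplifying assumption: only the leading dotted-number run counts as the
# compatible prefix. Full PEP 440 also allows epochs, pre/post/dev releases
# and local versions, which this sketch ignores.
_RELEASE_PREFIX = re.compile(r"\d+(?:\.\d+)*")


def release_prefix(value):
    """Return the leading release number as a tuple of ints, or None."""
    match = _RELEASE_PREFIX.match(value)
    if match is None:
        return None
    return tuple(int(part) for part in match.group().split("."))


for release in ("6.7.0-gentoo", "6.1.0-17-amd64", "23.1.0"):
    prefix = release_prefix(release)
    # Compare the numeric prefix against the marker's right-hand side, 20.0.
    print(release, prefix, prefix >= (20, 0))
```

With this approach the two Linux kernel strings compare as lower than 20.0, which matches the numeric intuition, but a value with no numeric prefix at all would still need some other rule.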

This is not to derail the issue with discussion of what a change proposal (if there is one) should be; just making the point that there's enough material to start that discussion in whatever venue is appropriate.

brettcannon commented 3 months ago

FYI the spec for environment markers can be found at https://packaging.python.org/en/latest/specifications/dependency-specifiers/#environment-markers and it mirrors what's in PEP 508, but the spec is authoritative.

Since the standard will lead to a non-intuitive solution, and (pip) users did rely on the intuitive solution of packaging<22

That means it hasn't been that way since 2021, so a good amount of users have moved on.

maybe there is a case to be made to go back to the standard drawing board first

If someone wants to bring this up on discuss.python.org and try to get consensus then please feel free to! We don't have to rush to fix this, but we shouldn't leave it in a non-compliant state forever, either.

it will likely please no-one

It will at least please me as this issue can be closed. 😁

Is that _legacy_cmpkey equivalent to a distutils LooseVersion comparison? Perhaps it would be simple enough to specify formally in an amendment to PEP 508?

The problem w/ that is it assumes that even that is lax enough to parse every Linux distro version. The string fallback has the feature of being simple and it never fails for anyone. But if we try to be clever it could still lead to complaints about why doesn't my Linux distro's version get parsed? And as the project that's going to have to field those complaints, it doesn't make me want to try and change the spec to add a potential 3rd layer to the marker comparison logic.
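
One way a "clever" parser can still fail is sketched below. The splitting rule is modelled on the component pattern that distutils' LooseVersion used (digits, lowercase letter runs, dots), written from memory, so treat it as an approximation; `loose_key` is a hypothetical name, and nothing here claims to match `_legacy_cmpkey`.

```python
import re

# Component pattern modelled on distutils' LooseVersion (an approximation):
# runs of digits, runs of lowercase letters, or a literal dot.
_COMPONENT = re.compile(r"(\d+|[a-z]+|\.)")


def loose_key(value):
    """Split a version string into int and str components, dropping dots."""
    parts = [part for part in _COMPONENT.split(value) if part and part != "."]
    return [int(part) if part.isdigit() else part for part in parts]


print(loose_key("6.7.0-gentoo"))
print(loose_key("6.7.0-gentoo") < loose_key("20.0"))

try:
    # The fourth components are "-" and 1, which Python 3 cannot order.
    loose_key("6.7.0-gentoo") < loose_key("6.7.0.1")
except TypeError as exc:
    print("not comparable:", exc)
```

The first comparison works because the leading integers already differ; the second raises because a string component meets an integer component, which is the kind of complaint a third comparison layer would invite.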

wimglenn commented 3 months ago

That means it hasn't been that way since 2021, so a good amount of users have moved on.

I disagree on this point, pip was vendoring packaging < 22 until pretty recently, and the majority of users would be using packaging via pip as an installer, not using a packaging release directly.

Personally, I noticed because uv pip behaved differently to pip.

notatallshaw commented 3 months ago

Yeah, only pip users who have moved to pip 24.1+ have been subject to these rules.

And IMO pip users most likely to struggle with these rules are the ones using OS distributed versions of pip, on LTS versions of their OS, and may not upgrade to pip 24.1+ for several years still, e.g. I work with a team currently standardized on pip 19.1, so there may be a very long tail of users hitting this problem.

brettcannon commented 3 months ago

only pip users who have moved to pip 24.1+ have been subject to these rules.

Fair enough, my mistake as I thought pip was keeping up more.

Regardless, this project is not in the business of going against standards, so someone will have to get the spec changed for us to implement something different than what's on packaging.python.org.

notatallshaw commented 3 months ago

Fair enough, my mistake as I thought pip was keeping up more.

FYI updating to packaging 22.0+ was a difficult vendoring for pip https://github.com/pypa/pip/issues/11715 / https://github.com/pypa/pip/pull/12300

edwardpeek-crown-public commented 1 month ago

Going on a slight tangent, this issue seems to be exacerbated by the markers being evaluated eagerly. A number of packages have markers ordered like (sys_platform == "darwin" and platform_release >= "23.0") which if evaluated lazily would circumvent this issue.
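
The difference between eager and lazy evaluation of an `and` chain can be sketched as follows. This is a toy model rather than packaging's evaluator: `version_compare`, `evaluate_eagerly` and `evaluate_lazily` are hypothetical names, and the strict comparison simply raises on anything that is not a dotted number.

```python
def version_compare(value, minimum):
    """Hypothetical strict comparison that rejects non-numeric versions."""
    try:
        parts = tuple(int(part) for part in value.split("."))
    except ValueError:
        raise ValueError(f"Invalid version: {value!r}") from None
    return parts >= minimum


environment = {"sys_platform": "linux", "platform_release": "6.7.0-gentoo"}

# Each clause is a zero-argument callable, so nothing runs until it is called.
clauses = [
    lambda: environment["sys_platform"] == "darwin",
    lambda: version_compare(environment["platform_release"], (23, 0)),
]


def evaluate_eagerly(clauses):
    results = [clause() for clause in clauses]  # every clause runs
    return all(results)


def evaluate_lazily(clauses):
    return all(clause() for clause in clauses)  # stops at the first False


print(evaluate_lazily(clauses))  # the release clause is never reached
try:
    evaluate_eagerly(clauses)
except ValueError as exc:
    print("eager evaluation failed:", exc)
```

Lazy evaluation only helps when the platform check comes first in an `and` chain; a marker that lists `platform_release` first, or uses it alone, would still hit the error.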

Such a change could be a good intermediate solution as updating the standard could take a while.