pypa / bandersnatch

A PyPI mirror client according to PEP 381 http://www.python.org/dev/peps/pep-0381/
Academic Free License v3.0

Failed mirroring when using `latest_release` plugin #1775

Open imjustvisiting opened 1 month ago

imjustvisiting commented 1 month ago

Certain packages never get mirrored when using the latest_release plugin due to an unhandled exception raised by the parse() function from packaging.version.

The nltk package is an example of a package that is never mirrored when the latest_release plugin is used to keep only the last "n" versions of each package. In that specific case, the nltk package has a release version "2.0.1rc2-git" that raises the following when packaging.version.parse() is called:

Traceback (most recent call last):
    print(parse("2.0.1rc2-git"))
          ^^^^^^^^^^^^^^^^^^^^^
  File "lib64/python3.12/site-packages/packaging/version.py", line 56, in parse
    return Version(version)
           ^^^^^^^^^^^^^^^^
  File "lib64/python3.12/site-packages/packaging/version.py", line 202, in __init__
    raise InvalidVersion(f"Invalid version: '{version}'")
packaging.version.InvalidVersion: Invalid version: '2.0.1rc2-git'

Because this exception is not handled, the plugin always returns an empty list of possible package releases -- resulting in nothing being mirrored.

The failing code is in bandersnatch_filter_plugins/latest_name.py:

for r in releases.keys():
    versions_pair: Iterator[tuple[Version, str]] = map(
        lambda v: (parse(v), v), releases.keys()
    )

imjustvisiting commented 1 month ago

Here are a few more version patterns from various packages that parse() rejects. Again, the unhandled exception results in no version of these packages being downloaded (even versions that do parse correctly) when the latest_release plugin is used.

Failed to parse geopandas version '0.1.0.dev-120828c'
Failed to parse pbr version '0.5.2.5.g5b3e942'
Failed to parse joblib version '0.3.2d.dev'
Failed to parse pytz version '2013d'
Failed to parse pysnmp version '4.1.16d'
Failed to parse robotframework-requests version 'devel'
Failed to parse regex version '2013-02-16'
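For anyone wanting to reproduce this outside bandersnatch, here is a minimal sketch that checks the version strings quoted in this thread against PEP 440. The helper name `is_pep440` is hypothetical, and the sketch calls the `Version` constructor directly; per the traceback above, that is what `parse()` delegates to in the reporter's copy of packaging.

```python
from packaging.version import InvalidVersion, Version

# Version strings quoted in this thread (nltk, geopandas, pbr, joblib,
# pytz, pysnmp, robotframework-requests, regex).
REPORTED = [
    "2.0.1rc2-git",
    "0.1.0.dev-120828c",
    "0.5.2.5.g5b3e942",
    "0.3.2d.dev",
    "2013d",
    "4.1.16d",
    "devel",
    "2013-02-16",
]


def is_pep440(version: str) -> bool:
    """Return True if `version` is accepted as a PEP 440 version."""
    try:
        Version(version)
    except InvalidVersion:
        return False
    return True


# Collect every reported string that packaging refuses to parse.
rejected = [v for v in REPORTED if not is_pep440(v)]
for v in rejected:
    print(f"Invalid PEP 440 version: {v!r}")
```

A single rejected string among a package's releases is enough to trigger the failure described above, because the exception escapes before any release is selected.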

cooperlees commented 1 month ago

Thanks for reporting. Happy to accept a change that ignores versions that are not valid under PEP 440 and downloads all that are. As I think you've worked out, wrapping the versions_pair assignment in a try/except InvalidVersion, plus logging that we're skipping a version because it is invalid, will be a small improvement.

That then raises the harder question: do we want to download non-standard (old) non-conformant versions at all? Or just document that such versions are not downloaded, since it's the "latest_release" plugin anyway ...
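A rough sketch of the try/except approach suggested above, written as a standalone function and not as a patch to the plugin. The function name `latest_valid_versions`, its signature, and the log message are assumptions for illustration; the real change would live inside the plugin's filter method and use bandersnatch's own logger.

```python
import logging
from collections.abc import Iterable

from packaging.version import InvalidVersion, Version

logger = logging.getLogger(__name__)


def latest_valid_versions(versions: Iterable[str], keep: int) -> list[str]:
    """Return the `keep` newest PEP 440 versions, oldest first.

    Strings that are not valid PEP 440 versions are logged and skipped
    instead of aborting the whole selection. A `keep` of 0 or less
    returns every valid version.
    """
    parsed: list[tuple[Version, str]] = []
    for raw in versions:
        try:
            parsed.append((Version(raw), raw))
        except InvalidVersion:
            logger.warning("Skipping invalid version %r", raw)

    # Sort on the parsed Version so ordering follows PEP 440, not text order.
    parsed.sort(key=lambda pair: pair[0])
    if keep > 0:
        parsed = parsed[-keep:]
    return [raw for _, raw in parsed]


if __name__ == "__main__":
    releases = ["2.0.1rc2-git", "3.7", "3.8.1", "3.9.1"]
    print(latest_valid_versions(releases, keep=2))
```

With this shape, a package such as nltk would still have its valid releases considered, and the skipped strings would show up in the log. Whether the skipped legacy versions should be mirrored at all is the open question raised in the comment above.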

miketheman commented 1 month ago

Also note the existence of this package, specifically to handle the legacy version parsing. https://pypi.org/project/packaging-legacy/

allamiro commented 4 weeks ago

This issue might be related to #1784.