pypa / bandersnatch

A PyPI mirror client according to PEP 381 http://www.python.org/dev/peps/pep-0381/
Academic Free License v3.0
448 stars 141 forks source link

'prerelease_release' plugin causes a KeyError in package metadata verify #944

Open RWoodring79 opened 3 years ago

RWoodring79 commented 3 years ago

I am unable to perform a "verify" on my package mirror if I have the prerelease_release plugin enabled in my bandersnatch.conf file. If I remove this plugin, it works fine.

(venv) root@mirror:/mnt/pypi# bandersnatch --config ./bandersnatch.conf verify
2021-06-11 08:18:55,619 INFO: Starting verify for /mnt/pypi/pypi with 3 workers (verify.py:242)
2021-06-11 08:18:55,792 INFO: Parsing wix-protos-wixerd-api-gateway-server-api (verify.py:128)
2021-06-11 08:18:55,885 INFO: Initialized prerelease plugin with [re.compile('.+rc\\d+$'), re.compile('.+a(lpha)?\\d+$'), re.compile('.+b(eta)?\\d+$'), re.compile('.+dev\\d+$')] (prerelease_name.py:33)
Traceback (most recent call last):
  File "/mnt/pypi/venv/bin/bandersnatch", line 8, in <module>
    sys.exit(main())
  File "/mnt/pypi/venv/lib/python3.8/site-packages/bandersnatch/main.py", line 213, in main
    return asyncio.run(async_main(args, config))
  File "/usr/lib/python3.8/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/mnt/pypi/venv/lib/python3.8/site-packages/bandersnatch/main.py", line 127, in async_main
    return await bandersnatch.verify.metadata_verify(config, args)
  File "/mnt/pypi/venv/lib/python3.8/site-packages/bandersnatch/verify.py", line 259, in metadata_verify
    await verify_producer(
  File "/mnt/pypi/venv/lib/python3.8/site-packages/bandersnatch/verify.py", line 222, in verify_producer
    await asyncio.gather(
  File "/mnt/pypi/venv/lib/python3.8/site-packages/bandersnatch/verify.py", line 212, in consume
    await verify(
  File "/mnt/pypi/venv/lib/python3.8/site-packages/bandersnatch/verify.py", line 155, in verify
    plugin.filter(pkg)
  File "/mnt/pypi/venv/lib/python3.8/site-packages/bandersnatch_filter_plugins/prerelease_name.py", line 39, in filter
    version = metadata["version"]
KeyError: 'version'

This is an except from my config file. If I comment out just the prerelease_release line, the verify operation starts as expected.

[plugins]
enabled =
    blocklist_project
    prerelease_release
    regex_project

This issue may be similar to https://github.com/pypa/bandersnatch/issues/505 as they are both failing because the metadata dictionary does not contain the key they are expecting.

cooperlees commented 3 years ago

Thanks for reporting. I guess we just have to check if it exists. I would have expected it to always. Is there a specific package you hit or random that cause they KeyError?

In any case, I guess adding the check and then deciding if we skip that version / part of the metadata we're in needs to be identified and correct behavior we deem to be performed (either ignore and move on or error bad metadata etc.)

RootLUG commented 3 years ago

I can confirm this bug is present and can be reproduced via the docker image, here is the log from my mirror:

2021-06-28 09:59:18,311 INFO: Starting verify for /data/mirror with 3 workers (verify.py:242)
2021-06-28 09:59:26,091 INFO: Parsing 0 (verify.py:128)
2021-06-28 09:59:26,213 INFO: Initialized latest releases plugin with keep=3 (latest_name.py:34)
2021-06-28 09:59:26,220 INFO: Initialized prerelease plugin with [re.compile('.+rc\\d+$'), re.compile('.+a(lpha)?\\d+$'), re.compile('.+b(eta)?\\d+$'), re.compile('.+dev\\d+$')] (prerelease_name.py:33)
Traceback (most recent call last):
  File "/usr/local/bin/bandersnatch", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/site-packages/bandersnatch/main.py", line 219, in main
    return asyncio.run(async_main(args, config))
  File "/usr/local/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.9/site-packages/bandersnatch/main.py", line 157, in async_main
    return await bandersnatch.verify.metadata_verify(config, args)
  File "/usr/local/lib/python3.9/site-packages/bandersnatch/verify.py", line 259, in metadata_verify
    await verify_producer(
  File "/usr/local/lib/python3.9/site-packages/bandersnatch/verify.py", line 222, in verify_producer
    await asyncio.gather(
  File "/usr/local/lib/python3.9/site-packages/bandersnatch/verify.py", line 212, in consume
    await verify(
  File "/usr/local/lib/python3.9/site-packages/bandersnatch/verify.py", line 155, in verify
    plugin.filter(pkg)
  File "/usr/local/lib/python3.9/site-packages/bandersnatch_filter_plugins/latest_name.py", line 43, in filter
    version: str = metadata["version"]
KeyError: 'version'

It appears the package called "0" is causing the problems in my case, it's the same package name each time I try to re-run the verify so it does not appear to be random

RWoodring79 commented 3 years ago

Ok, I finally have a working development environment set up. As someone who has never done python packages before, a DEVELOPERS.md file with some tips like "you have to run pip install --editable . after checking out the code" would have been really helpful. I know, you would be happy to accept a pull request... :)

It appears the PreReleaseFilter.filter is looking for a "version" entry in the metadata dictionary created from the web/json/{pkg name}.json file inthe wrong place. The filter is trying to read metadata["version"], but in my testing the version number is one layer deeper at metadata["info"]["version"]. Is it possible the format of this json file changed since the plugin was developed?

cooperlees commented 3 years ago

I doubt it has changed, but the ["version"] is a legacy field that must not be always filled. I would say we should:

Or some kind of fallback logic like that for the plugin.

As for development instructions - We should update https://bandersnatch.readthedocs.io/en/latest/CONTRIBUTING.html#development-venv Thanks for noticing.

RWoodring79 commented 3 years ago

After some more digging, my first theory about the version entry moving was wrong. Making the change I initially proposed will break the mirror functionality. When performing a mirror, process_package calls package.filter_all_releases(self.filters.filter_release_plugins()) but when performing a verify, the call to plugin.filter(pkg) does not include the package.filter_all_releases(...). That is important because filter_all_releases creates a new dictionary for each entry in the "releases" lists, and that dictionary (not the one read directly from JSON file) is the one with the "version" entry.