pypa / bandersnatch

A PyPI mirror client according to PEP 381 http://www.python.org/dev/peps/pep-0381/
Academic Free License v3.0

Enhance `bandersnatch mirror` to optionally delete packages detected as no longer found #1686

Open 89ao opened 3 months ago

89ao commented 3 months ago

Take the package tohoku-tus-iot-automation as an example. According to our logs, it was synced from the official source on March 6th. By March 7th, bandersnatch had detected that upstream had removed it (the package contained malicious information-collection backdoors and trojans), but our local mirror did not delete it. On March 18th, our operations team discovered this while troubleshooting and manually ran "bandersnatch delete tohoku-tus-iot-automation" to remove it.

2024-03-06 20:24:02,571 bandersnatch.package: INFO Fetching metadata for package: tohoku-tus-iot-automation (serial 22195024)
2024-03-06 20:24:02,932 bandersnatch.mirror: INFO Storing index page(s): tohoku-tus-iot-automation - in /repo/web/simple/tohoku-tus-iot-automation
2024-03-06 21:28:19,422 bandersnatch.package: INFO Fetching metadata for package: tohoku-tus-iot-automation (serial 22196068)
2024-03-06 21:28:20,140 bandersnatch.mirror: INFO Storing index page(s): tohoku-tus-iot-automation - in /repo/web/simple/tohoku-tus-iot-automation
2024-03-06 21:49:18,704 bandersnatch.package: INFO Fetching metadata for package: tohoku-tus-iot-automation (serial 22196135)
2024-03-06 21:49:19,244 bandersnatch.mirror: INFO Storing index page(s): tohoku-tus-iot-automation - in /repo/web/simple/tohoku-tus-iot-automation
2024-03-06 22:10:16,922 bandersnatch.package: INFO Fetching metadata for package: tohoku-tus-iot-automation (serial 22196395)
2024-03-06 22:10:17,324 bandersnatch.mirror: INFO Storing index page(s): tohoku-tus-iot-automation - in /repo/web/simple/tohoku-tus-iot-automation
2024-03-07 00:18:20,878 bandersnatch.package: INFO Fetching metadata for package: tohoku-tus-iot-automation (serial 22198726)
2024-03-07 00:18:21,125 bandersnatch.package: INFO tohoku-tus-iot-automation no longer exists on PyPI
2024-03-18 16:23:14,613 bandersnatch: INFO Deleting path: /repo/web/json/tohoku-tus-iot-automation
2024-03-18 16:23:14,614 bandersnatch: INFO Removing file: /repo/web/json/tohoku-tus-iot-automation
2024-03-18 16:23:14,614 bandersnatch: INFO Deleting path: /repo/web/pypi/tohoku-tus-iot-automation
2024-03-18 16:23:14,614 bandersnatch: INFO Forcing removal of files under /repo/web/pypi/tohoku-tus-iot-automation
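
Until deletion is automated, one stopgap is to scan the mirror log for the "no longer exists on PyPI" message and collect the affected package names. The following is a minimal sketch, assuming the log format shown above; the function name `removed_packages` and the sample lines are illustrative and not part of bandersnatch.

```python
import re

# Matches the message format seen in the log excerpt above.
REMOVED_RE = re.compile(
    r"bandersnatch\.package: INFO (?P<name>\S+) no longer exists on PyPI"
)


def removed_packages(log_lines):
    """Return package names the log reports as gone from PyPI (ordered, no duplicates)."""
    names = []
    for line in log_lines:
        match = REMOVED_RE.search(line)
        if match and match.group("name") not in names:
            names.append(match.group("name"))
    return names


# Two lines copied from the log excerpt above.
sample = [
    "2024-03-07 00:18:20,878 bandersnatch.package: INFO Fetching metadata for package: tohoku-tus-iot-automation (serial 22198726)",
    "2024-03-07 00:18:21,125 bandersnatch.package: INFO tohoku-tus-iot-automation no longer exists on PyPI",
]
print(removed_packages(sample))
```

Each collected name could then be passed to `bandersnatch delete`, as was done manually on March 18th.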

My question is: since bandersnatch can already detect that a package has been removed upstream (see https://github.com/pypa/bandersnatch/blob/main/src/bandersnatch/mirror.py#L125), why was automatic deletion (or a switch to enable it) never added? Are there other considerations or scenarios that prevent this?

cooperlees commented 3 months ago

This is a good question. Since bandersnatch already detects the removal, deleting the package at that point is a nice proposed addition.

I would accept a new config-parameter-driven deletion there (maybe delete_missing_packages) that defaults to false in default.conf and then uses the metadata to delete all of the package blobs and simple API files.
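
As a rough illustration of that proposal (not bandersnatch's actual implementation), the sketch below reads a hypothetical `delete_missing_packages` option from a `[mirror]` section and, when enabled, removes a package's entries under `web/simple`, `web/json`, and `web/pypi`, plus any blob paths supplied by the caller. The function names, the `[mirror]` placement of the option, and the `blob_paths` argument are all assumptions made for the example.

```python
import configparser
import shutil
import tempfile
from pathlib import Path


def delete_enabled(config_text):
    """Read the hypothetical opt-in switch; a missing section or option means False."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser.getboolean("mirror", "delete_missing_packages", fallback=False)


def remove_package(web_root, name, blob_paths=()):
    """Delete index and metadata entries for `name`, plus blob paths taken from its metadata."""
    web_root = Path(web_root)
    targets = [web_root / "simple" / name, web_root / "json" / name, web_root / "pypi" / name]
    targets += [web_root / p for p in blob_paths]
    removed = []
    for target in targets:
        if target.is_symlink() or target.is_file():
            target.unlink()  # plain files and symlinks
        elif target.is_dir():
            shutil.rmtree(target)  # real directories
        else:
            continue  # nothing stored at this path
        removed.append(target)
    return removed


# Demonstration against a throwaway directory tree, not a real mirror.
with tempfile.TemporaryDirectory() as tmp:
    web = Path(tmp) / "web"
    (web / "simple" / "example-pkg").mkdir(parents=True)
    (web / "simple" / "example-pkg" / "index.html").write_text("<html></html>")
    (web / "json").mkdir()
    (web / "json" / "example-pkg").write_text("{}")
    if delete_enabled("[mirror]\ndelete_missing_packages = true\n"):
        removed = remove_package(web, "example-pkg")
        print(sorted(p.relative_to(web).as_posix() for p in removed))
```

Defaulting the switch to false keeps existing mirrors unchanged unless an operator opts in.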

Thanks!

89ao commented 3 months ago

Thanks a lot @cooperlees ! Looking forward to seeing the feature implemented as soon as possible.