89ao opened this issue 1 year ago (status: Open).
Bandersnatch does not support deleting during a mirror run. There is not enough metadata to know which blobs to delete. That said, I have not dug into yanking; we might have enough metadata for those, so it might be worth looking into.
We only have bandersnatch verify --delete, which has to walk the file system and work out which files on disk are no longer part of any JSON metadata ...
Without adding more metadata to PyPI, we can't make this more efficient.
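For illustration, here is a minimal sketch of the work such a verify pass implies: gather every blob path named in the local JSON metadata, then walk the packages tree and report whatever is left over. It assumes a layout where web/json/<project> holds PyPI JSON API documents and web/packages/ mirrors the URL path after /packages/. The function names are hypothetical, this is not bandersnatch's actual code, and it only reports orphans without deleting anything.

```python
import json
import os
from pathlib import Path
from urllib.parse import urlparse


def referenced_blobs(json_dir: Path) -> set[str]:
    """Collect blob paths (relative to packages/) named in local JSON metadata."""
    referenced = set()
    for meta_file in json_dir.iterdir():
        if not meta_file.is_file():
            continue
        data = json.loads(meta_file.read_text(encoding="utf-8"))
        for release_files in data.get("releases", {}).values():
            for file_info in release_files:
                # e.g. https://files.pythonhosted.org/packages/6d/c1/<hash>/<file>
                path = urlparse(file_info["url"]).path
                _, sep, rel = path.partition("/packages/")
                if sep:
                    referenced.add(rel)
    return referenced


def orphaned_blobs(json_dir: Path, packages_dir: Path) -> list[Path]:
    """Walk packages/ and return files that no JSON metadata references anymore."""
    keep = referenced_blobs(json_dir)
    orphans = []
    for root, _dirs, files in os.walk(packages_dir):
        for name in files:
            full = Path(root) / name
            if full.relative_to(packages_dir).as_posix() not in keep:
                orphans.append(full)
    return sorted(orphans)
```

Every run re-reads all JSON files and walks the whole packages/ tree, so the cost grows with the size of the mirror. That is why this is hard to make incremental without extra metadata from PyPI.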
@cooperlees Thanks, Cooper. The problem is that risky packages may appear online someday. After PyPI officially deletes them, I'd like my mirror to stay consistent. Please consider adding this feature, thanks!
This is not an easy fix. As I said, ideally we'd need to put more metadata into Warehouse (pypi.org). If you have cycles, opening an issue on Warehouse (if we don't have one already) asking for better metadata that would let mirrors delete packages would be a good start.
... mirror if there are any packages that need to be removed ... We just copy pypi.org's layout today, as it matches the metadata. Bandersnatch will correctly generate the Simple API HTML + JSON, so the package manager (e.g. pip) won't know the deleted/yanked version exists. The artifacts/blobs are just sitting there wasting disk space. A verify running in the background could slowly reclaim space. Walking file systems is slow though, I get that :(
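The reason pip never sees a deleted file is that the simple page is regenerated from the current metadata, so only files still listed there get a link. Below is a rough sketch of that generation step, loosely modeled on the index.html shown later in this thread. The input dicts are assumed to look like file entries from PyPI's JSON API (url, filename, digests, requires_python), and render_simple_page is a hypothetical name, not bandersnatch's function.

```python
import html


def render_simple_page(project: str, files: list[dict]) -> str:
    """Build a simple-index page that links only the files still in metadata."""
    name = html.escape(project)
    lines = [
        "<!DOCTYPE html>",
        "<html>",
        "<head>",
        f"<title>Links for {name}</title>",
        "</head>",
        "<body>",
        f"<h1>Links for {name}</h1>",
    ]
    for file_info in files:
        # Relative link into the mirrored packages/ tree, plus the hash fragment
        rel = file_info["url"].split("/packages/", 1)[1]
        href = f"../../packages/{rel}#sha256={file_info['digests']['sha256']}"
        attrs = f'href="{html.escape(href)}"'
        requires = file_info.get("requires_python")
        if requires:
            attrs += f' data-requires-python="{html.escape(requires)}"'
        lines.append(f"<a {attrs}>{html.escape(file_info['filename'])}</a><br/>")
    lines += ["</body>", "</html>"]
    return "\n".join(lines)
```

A file removed from the metadata simply produces no <a> element, while its blob under packages/ stays on disk until something like verify --delete cleans it up.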
Thanks, @cooperlees. It's not only a disk-space issue. Once in a while PyPI deletes risky packages, such as "rest-framework" and "apicolors" as I mentioned, and we don't want them to remain downloadable. Can "bandersnatch verify --delete" delete these outdated packages automatically? If not, we may need to write a shell script to do this manually.
https://bandersnatch.readthedocs.io/en/latest/#bandersnatch-verify
Yes, running a verify with --delete will keep track of and delete those packages. It's not smart or incremental, and it needs to walk project by project to do so. All enhancements welcome.
pip pointed at your mirror will not consider using those versions. I would love to know how you imagine doing this via shell. It should be no easier than just enhancing bandersnatch's logic.
Maybe obtain the official package list and compare it to the local list? If a package no longer exists upstream, delete it locally?
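The comparison suggested above could be sketched roughly as follows. It normalizes project names per PEP 503, collects the names from PyPI's root simple index (https://pypi.org/simple/), collects the local project directories, and takes the set difference. It assumes the flat web/simple/<project>/ layout seen in the transcript later in this thread. All function names are hypothetical, and the result is a candidate list to review, not something to delete blindly.

```python
import os
import re
from html.parser import HTMLParser


def normalize(name: str) -> str:
    """PEP 503 name normalization, so 'Foo_Bar' and 'foo-bar' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


class _AnchorTextParser(HTMLParser):
    """Collects the text of every <a> element in a simple index page."""

    def __init__(self) -> None:
        super().__init__()
        self.names: list[str] = []
        self._in_anchor = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_anchor = False

    def handle_data(self, data):
        if self._in_anchor and data.strip():
            self.names.append(data.strip())


def upstream_projects(simple_index_html: str) -> set[str]:
    """Normalized project names listed in an upstream root simple index."""
    parser = _AnchorTextParser()
    parser.feed(simple_index_html)
    return {normalize(n) for n in parser.names}


def local_projects(simple_dir: str) -> set[str]:
    """Normalized names of the project directories under the local simple/."""
    return {
        normalize(d)
        for d in os.listdir(simple_dir)
        if os.path.isdir(os.path.join(simple_dir, d))
    }


def stale_projects(local: set[str], upstream: set[str]) -> list[str]:
    """Projects mirrored locally that PyPI no longer lists."""
    return sorted(local - upstream)
```

The root index could be downloaded with urllib.request.urlopen("https://pypi.org/simple/") and decoded as UTF-8 before feeding it in. A project missing upstream only tells you what to remove from simple/ and json/; its blobs under packages/ still have to be located from the project's metadata or simple page.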
Just as an information sync, this situation happened again: https://medium.com/checkmarx-security/py-torch-a-leading-ml-framework-was-poisoned-with-malicious-dependency-e30f88242964
torchtriton has already been deleted from PyPI (https://pypi.org/project/torchtriton/), but bandersnatch did not delete it automatically. So we deleted it manually. Looking forward to more updates, thanks!
@cooperlees Hello, I recently wrote a small tool to compare the local project list with the official project list, and I've found a bunch of projects that exist locally but no longer exist upstream, for example:
...
a-plus-b
a-simple-modu
a1g0py8128
aaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaa-lama-ze-lo-oved
aabs7calc
aaron
aashika-calculator
abaxador-de-arquivo
abc-reader
abc0123
abcmikivideo
abenity
abhishekwebcodett
abhishekwebcodett2
abhishekwebcodett3
abilityrequest
...
The problem is that when I try to delete them manually, I can't get it done. Take project "aaaaaaaaaaa" for example:
[root@VM_21_104_centos /data/home/motorao/bandersnatch]# ls -al /yum/pip/web/simple/aaaaaaaaaaa/
total 18580
drwxr-xr-x 2 root root 4096 Jun 7 2022 .
drwxr-xr-x 1 root root 19009536 Jan 5 22:48 ..
-rw-r--r-- 1 root root 452 Jun 7 2022 index.html
[root@VM_21_104_centos /data/home/motorao/bandersnatch]# cat /yum/pip/web/simple/aaaaaaaaaaa/index.html
<!DOCTYPE html>
<html>
<head>
<title>Links for aaaaaaaaaaa</title>
</head>
<body>
<h1>Links for aaaaaaaaaaa</h1>
<a href="../../packages/6d/c1/2d60ee949b1be5382703260b0bdd4345e2711abdddc2b9e2bbb46f788ac1/aaaaaaaaaaa-1.1.1-py2.py3-none-any.whl#sha256=05ff699e6eb769bdcc489f4390a51d1056332e8d16bb0bd0ef5f15709341b88f" data-requires-python=">=2">aaaaaaaaaaa-1.1.1-py2.py3-none-any.whl</a><br/>
</body>
</html>
<!--SERIAL 14055105-->
[root@VM_21_104_centos /data/home/motorao/bandersnatch]# bandersnatch delete aaaaaaaaaaa
2023-01-05 22:50:00,019 ERROR: Unable to load entry point swift_plugin = bandersnatch_storage_plugins.swift:SwiftStorage: No module named 'keystoneauth1'
2023-01-05 22:50:00,020 ERROR: /yum/pip/web/json/aaaaaaaaaaa does not exist. Pulling from PyPI
2023-01-05 22:50:00,021 INFO: Fetching https://pypi.python.org/pypi/aaaaaaaaaaa/json
2023-01-05 22:50:00,399 ERROR: /yum/pip/web/json/aaaaaaaaaaa.new does not exist - Did not get new JSON metadata
2023-01-05 22:50:00,399 ERROR: Unable to HTTP get JSON for /yum/pip/web/json/aaaaaaaaaaa
Could you help explain why this happens?
So I don't have any plans to work on this. To do this correctly we need to store packages differently, change PyPI metadata or add another API to PyPI to let us know what to delete.
In the logs I see /yum/pip/web/json/aaaaaaaaaaa. It seems it's not adding /data/home/motorao/bandersnatch to the path? I haven't read the code, but we must have a bug there.
If that's not the issue, then it's because the package is deleted, and so is its JSON metadata, so we need to use local metadata only. If that has somehow been deleted too, we're out of luck and need to delete manually.
--no-json-update to try not to pull from pypi.org.
A fix PR with a unit test covering the bug/new behavior is welcome!
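When both the upstream and the local JSON are gone, as with aaaaaaaaaaa above, the local simple page in the transcript still links to the blob. Below is a minimal sketch that resolves those links to file paths for a manual cleanup. It assumes the flat web/simple/<project>/index.html layout and the relative ../../packages/ hrefs shown in the transcript. blobs_from_simple_page is a hypothetical helper, not part of bandersnatch, and it only returns paths inside packages/ without deleting anything.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urldefrag


class _HrefParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for key, value in attrs:
                if key == "href" and value:
                    self.hrefs.append(value)


def blobs_from_simple_page(web_root: Path, project: str) -> list[Path]:
    """Map each link in simple/<project>/index.html to a file under packages/."""
    page_dir = web_root / "simple" / project
    packages = (web_root / "packages").resolve()
    parser = _HrefParser()
    parser.feed((page_dir / "index.html").read_text(encoding="utf-8"))
    blobs = []
    for href in parser.hrefs:
        rel, _fragment = urldefrag(href)  # drop the '#sha256=...' part
        blob = (page_dir / rel).resolve()
        # Only trust links that stay inside the mirror's packages/ tree
        if packages in blob.parents:
            blobs.append(blob)
    return blobs
```

For example, blobs_from_simple_page(Path("/yum/pip/web"), "aaaaaaaaaaa") would point at the aaaaaaaaaaa-1.1.1 wheel under packages/6d/c1/.... After reviewing the list, those files and the project's simple/ directory could be removed by hand.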
Yes, it is indeed. I'll learn how to make a fix PR and try later. Thanks a lot, @cooperlees!
It should just need a boolean around the code in verify.py that calls pypi.org to pull the JSON. I haven't read the code though, and I have a terrible memory :)
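The boolean described above might look something like this minimal sketch: prefer the local JSON file, and only call out to PyPI when explicitly allowed. load_metadata, allow_fetch and fetch_from_pypi are hypothetical names for illustration; verify.py's real structure may differ.

```python
import json
from pathlib import Path
from typing import Callable, Optional


def load_metadata(
    json_dir: Path,
    project: str,
    fetch_from_pypi: Callable[[str], Optional[dict]],
    allow_fetch: bool = True,
) -> Optional[dict]:
    """Return a project's JSON metadata, contacting PyPI only if allowed."""
    local_file = json_dir / project
    if local_file.is_file():
        # Local copy wins: deleting needs what we mirrored, not what PyPI has now
        return json.loads(local_file.read_text(encoding="utf-8"))
    if not allow_fetch:
        # No local copy and fetching disabled: leave it to the caller
        return None
    return fetch_from_pypi(project)
```

With allow_fetch=False, a project deleted upstream and missing locally returns None instead of triggering the failing fetch seen in the delete log above, and the caller can fall back to manual cleanup.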
Could you tell me how to automatically remove packages that have been officially removed?
For example: https://pypi.org/project/apicolors/
apicolors was deleted by pypi.org 4 days ago (Nov 9), but after my bandersnatch server synced it locally, it still exists now (Nov 11), even though my sync interval is 30 min.
Here is the bander.log:
And here is the bandersnatch.conf. I'm using bandersnatch-6.0.0 on docker-compose.