pypa / bandersnatch

A PyPI mirror client according to PEP 381 http://www.python.org/dev/peps/pep-0381/
Academic Free License v3.0
448 stars 141 forks

Any way we can filter by the total number of versions? #1771

Open davosMW opened 2 months ago

davosMW commented 2 months ago

Hi, I'm trying to reduce the mirror size because deletion now takes too long (approx. 1 day with xargs + rsync).

I've found that there are 440k packages that have only 2 versions, which are quite obviously not used by anyone but the authors themselves, and I'd like to find a way to filter them out.

Is there any way we can do this?

cooperlees commented 2 months ago

Hi,

I don't see a filter plugin that can filter based on the number of versions a project has. It should be fairly easy to calculate from the metadata and add to bandersnatch. I would accept a PR for this. Please add unit tests showing it working, though.

I would imagine it to be similar to size_project_metadata. Maybe call it versions_min_metadata and for bonus points support a versions_max_metadata (although I don't see a use case except maybe to filter out packages that get released daily).
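
As a rough starting point for such a PR, the core check could look something like the sketch below. It is standalone Python, not bandersnatch plugin code: it assumes the project metadata is a PyPI JSON-style dict whose `releases` key maps each version string to its list of files. The names `count_versions`, `keep_project`, `min_versions` and `max_versions` are hypothetical, and the wiring into bandersnatch's filter plugin interface is left out.

```python
from typing import Any, Mapping, Optional


def count_versions(metadata: Mapping[str, Any]) -> int:
    """Count release versions in a PyPI JSON-style metadata dict.

    Assumption: metadata["releases"] maps version strings to file lists.
    A missing or empty "releases" key counts as zero versions.
    """
    return len(metadata.get("releases") or {})


def keep_project(
    metadata: Mapping[str, Any],
    min_versions: Optional[int] = None,
    max_versions: Optional[int] = None,
) -> bool:
    """Return True if the project's version count is within the bounds.

    Both bounds are inclusive; None disables that bound.
    (Hypothetical helper, not part of bandersnatch.)
    """
    n = count_versions(metadata)
    if min_versions is not None and n < min_versions:
        return False
    if max_versions is not None and n > max_versions:
        return False
    return True


if __name__ == "__main__":
    # Made-up sample project with exactly two versions.
    sample = {"info": {"name": "example"}, "releases": {"0.1": [], "0.2": []}}
    print(keep_project(sample, min_versions=3))
    print(keep_project(sample, min_versions=2))
```

With `min_versions=3`, the two-version sample project is rejected, which matches the use case in the original question. The unit tests requested above could assert directly on `keep_project` with small hand-built metadata dicts like this one.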