pypa / bandersnatch

A PyPI mirror client according to PEP 381 http://www.python.org/dev/peps/pep-0381/
Academic Free License v3.0

Add subcmd to use metadata to roughly calculate the size of the local bandersnatch mirror #1305

Open leochen12-rgb opened 1 year ago

leochen12-rgb commented 1 year ago

At present, I can obtain the official size of PyPI (https://pypi.org/stats/) while I am synchronizing the PyPI directory. However, the du or duc command takes too long to count the local mirror. Is there a more convenient way to do this?

cooperlees commented 1 year ago

Howdy,

This isn't really a bandersnatch question. This is all a limitation of having lots of small files on your storage backend.

The only ideas we could possibly try:

Another hack I've generally recommended is making a dedicated partition or volume for each part of bandersnatch's storage, e.g. putting the simple and packages directories on their own filesystems, so that df -h can give quicker insight too.
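A minimal Python sketch of that approach, assuming the packages and simple directories are each mounted as their own filesystem (the `/srv/pypi/...` paths below are hypothetical): `shutil.disk_usage` returns the same total/used/free figures that `df` prints, from one filesystem-level query instead of a walk over every file.

```python
import shutil
from pathlib import Path


def filesystem_usage(path: str) -> dict[str, int]:
    """Report usage (in bytes) of the filesystem that contains *path*.

    This is the information `df` shows. On Unix it comes from a single
    statvfs call, so its cost does not depend on how many files exist.
    It only reflects bandersnatch's data if *path* is on a dedicated
    filesystem that nothing else writes to.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return {"total": usage.total, "used": usage.used, "free": usage.free}


if __name__ == "__main__":
    # Hypothetical layout: each directory mounted as its own filesystem.
    for part in ("/srv/pypi/web/packages", "/srv/pypi/web/simple"):
        if Path(part).exists():
            stats = filesystem_usage(part)
            print(f"{part}: {stats['used'] / 1024**3:.1f} GiB used")
```

The numbers describe bandersnatch's data only when nothing else shares those filesystems.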

I don't have the cycles to look into these ideas, but would take a PR adding docs or a bandersnatch du-like command that works out the sizes quicker, if possible. But I feel we'd need to use a lower-level language than Python to get true speed here. Will leave this open in case someone smarter comes along with better ideas.
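As a possible starting point for such a command, here is a minimal sketch of a du-like walk in Python. `tree_size` is a hypothetical helper, not an existing bandersnatch API. It still has to visit every file, so it illustrates the cost described above rather than avoiding it; on most Linux filesystems `os.scandir` can classify entries without an extra stat call, so only regular files pay for one.

```python
import os


def tree_size(root: str) -> tuple[int, int]:
    """Return (file_count, total_bytes) for all regular files under *root*.

    Uses an explicit stack with os.scandir so each directory is listed
    once. Sizes are apparent sizes (st_size), like `du --apparent-size`,
    not allocated blocks. Symlinks are never followed.
    """
    file_count = 0
    total_bytes = 0
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            entries = os.scandir(current)
        except OSError:
            continue  # unreadable or vanished directory: skip it
        with entries:
            for entry in entries:
                try:
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
                    elif entry.is_file(follow_symlinks=False):
                        total_bytes += entry.stat(follow_symlinks=False).st_size
                        file_count += 1
                except OSError:
                    continue  # entry removed mid-scan, e.g. during a sync
    return file_count, total_bytes
```

Because it reports apparent sizes, its total can differ from plain `du`, which counts allocated blocks.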

leochen12-rgb commented 1 year ago

Thank you for your reply, and I look forward to a du parameter being added to bandersnatch.

cooperlees commented 1 year ago

Awesome. Yeah, I’ll be surprised if it’s much faster, and it will be hard to get accurate without checking whether the files exist, which is the expensive part. It might surprise us and be much quicker than du …
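A rough sketch of the metadata-based estimate the issue title asks for, under two assumptions: the mirror keeps one PyPI JSON API document per project on disk (bandersnatch can save these when JSON saving is enabled in its config; the directory passed as `json_dir` is an assumption about your layout), and each file's URL path after `/packages/` matches its location under the local packages directory. `estimate_from_metadata` is hypothetical. With `packages_root=None` it only sums the `size` fields, which is fast; passing `packages_root` adds one existence check per file, which is exactly the expensive part mentioned above.

```python
import json
import os
from pathlib import Path
from typing import Optional, Tuple
from urllib.parse import urlparse


def estimate_from_metadata(
    json_dir: str, packages_root: Optional[str] = None
) -> Tuple[int, int]:
    """Estimate mirror size from saved PyPI JSON metadata.

    Assumes each file in *json_dir* is a PyPI JSON API document with a
    "releases" mapping of version -> list of file dicts carrying "url"
    and "size". Returns (expected_bytes, present_bytes). present_bytes
    stays 0 unless *packages_root* is given, in which case only files
    found on disk (the slow, accurate part) are counted.
    """
    expected = 0
    present = 0
    for meta_file in Path(json_dir).iterdir():
        if not meta_file.is_file():
            continue
        try:
            data = json.loads(meta_file.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # skip unreadable or malformed metadata
        for files in data.get("releases", {}).values():
            for release_file in files:
                size = int(release_file.get("size", 0))
                expected += size
                if packages_root is None:
                    continue  # fast path: trust the metadata
                # Assumed mapping: URL path after "/packages/" -> local path.
                url_path = urlparse(release_file.get("url", "")).path
                _, sep, rel = url_path.partition("/packages/")
                if sep and os.path.exists(os.path.join(packages_root, rel)):
                    present += size
    return expected, present
```

Even with the existence check, the result counts metadata sizes, not bytes actually allocated on disk.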