leochen12-rgb opened this issue 1 year ago
Howdy,

This isn't really a bandersnatch question; it's a limitation of having lots of small files on your storage backend.

The only ideas I can think of to try:

- `du` (though I'm not sure about all the operations `du` does under the covers)
- Another hack I've generally recommended is making a dedicated partition or volume for each part of bandersnatch's storage, e.g. putting the `simple` and `packages` directories in their own filesystems, so that `df -h` can give quicker insight too.
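To illustrate the per-volume idea: if each part of the mirror lives on its own filesystem, its size can be read from the filesystem's own accounting in constant time instead of walking every file. A minimal Python sketch of that check (the mount paths in the comment are hypothetical examples, not bandersnatch defaults):

```python
import shutil

def volume_usage(mount_point: str) -> tuple[int, int, int]:
    """Return (total, used, free) bytes for the filesystem at mount_point.

    Reads filesystem metadata (like `df` does), so it is O(1)
    regardless of how many small files the volume holds.
    """
    usage = shutil.disk_usage(mount_point)
    return usage.total, usage.used, usage.free

# Hypothetical dedicated mounts for a bandersnatch mirror:
# for path in ("/srv/pypi/simple", "/srv/pypi/packages"):
#     total, used, free = volume_usage(path)
#     print(f"{path}: {used / 2**30:.1f} GiB used of {total / 2**30:.1f} GiB")
```

This only gives per-volume granularity, which is exactly why the dedicated-partition layout helps.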
With `hash-index = true`, you could also create a volume/filesystem per shard to get further insight.

I don't have the cycles to look into these ideas, but I'd take a PR adding docs or a bandersnatch `du`-like command that works out the sizes quicker, if possible. But I feel we'd need a lower-level language than Python to get true speed here. I'll leave this open in case someone smarter comes along with better ideas.
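For reference, a `du`-like walk can already be made noticeably cheaper in pure Python with `os.scandir`, which returns cached `stat` information on most platforms and so avoids an extra syscall per file; whether that is fast enough for a full mirror is exactly the open question here. A rough sketch, not an existing bandersnatch command:

```python
import os

def tree_size(path: str) -> int:
    """Sum apparent file sizes under path, like `du` with apparent sizes.

    Uses an explicit stack plus os.scandir so directory entries carry
    cached stat info, cutting the per-file syscall count versus a naive
    os.walk + os.stat loop. Symlinks are not followed.
    """
    total = 0
    stack = [path]
    while stack:
        with os.scandir(stack.pop()) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    total += entry.stat(follow_symlinks=False).st_size
    return total
```

Even so, the dominant cost on a huge mirror is still touching every inode, which is why the per-volume `df` approach above scales better.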
Thank you for your reply; I look forward to adding a `du` command to bandersnatch.
Awesome. Yeah, I'll be surprised if it's much faster, and it will be hard to get an accurate number without checking that the files exist, which is the expensive part. But it might surprise us and be much quicker than `du` …
At present I can get the official size of PyPI from https://pypi.org/stats/ while I am synchronizing the mirror. However, running `du` or `duc` over the mirror directory takes too long. Is there a more convenient way to do this?