pypa / bandersnatch

A PyPI mirror client according to PEP 381 http://www.python.org/dev/peps/pep-0381/
Academic Free License v3.0

Run in cluster mode under limited network environment #1101

Open r00t1900 opened 2 years ago

r00t1900 commented 2 years ago

description

Using bandersnatch to sync with PyPI on one single server is fragile and slow. For example, on an ordinary network (rather than a CDN-grade one) with an average download speed of 5 MB/s, a full sync can take up to 30 days. If load on https://pypi.org pushes the speed down to 2 MB/s, an initial full sync takes 75 days, which is unacceptable. Something needs to be done to avoid this.
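(As a rough sanity check, both estimates are consistent with each other if they assume a mirror of roughly 13 TB:

```
5 MB/s ≈ 432 GB/day  →  ~13 TB / 432 GB/day ≈ 30 days
2 MB/s ≈ 173 GB/day  →  ~13 TB / 173 GB/day ≈ 75 days
```
)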

idea

So I am wondering whether we can run bandersnatch in cluster mode.

more

In our real case, the network is limited to 5 MB/s at most, so we are looking for a way to break that limit, which led us to the idea of a cluster. Can bandersnatch achieve this through proper configuration, or by cooperating with other software?

cooperlees commented 2 years ago

Hi there,

Maybe a simpler starting point is to ensure you're running the maximum number of workers allowed in bandersnatch.conf? If so, is that still capping you at 5 MB/s? The shipped default is only:

workers = 3
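For context, `workers` lives in the `[mirror]` section of bandersnatch.conf. A minimal sketch of the relevant part, where `directory` is a placeholder path and `master` is the default upstream:

```ini
[mirror]
; where the mirror is stored on disk (placeholder)
directory = /srv/pypi
; upstream server to mirror from (the default)
master = https://pypi.org
; parallel download workers; the shipped default is 3
workers = 6
```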

That said, another simple approach we could possibly start with is to run several bandersnatch instances, each syncing a subset of the packages. Once they are all doing partial updates, you could run a central full sync to generate the main index.html. Open to other ideas here too; feel free to share.
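A sketch of how that merge step might look, assuming each instance writes the standard bandersnatch directory layout and the partial mirrors can push their files to a central host (hostnames and paths are placeholders):

```bash
# On each partial-mirror host: copy the package files it has
# downloaded into the central mirror's package pool.
rsync -a /srv/pypi/web/packages/ central-host:/srv/pypi/web/packages/

# On the central host: a normal bandersnatch run fetches anything
# still missing and regenerates the simple index pages.
bandersnatch mirror
```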

r00t1900 commented 2 years ago

Here my network maximum is 5 MB/s, no matter how many workers I set on a single local instance. But I can get 5 MB/s at the company and another 5 MB/s at home, and I would like to make use of both.
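One way to split the work between the two sites is bandersnatch's filter plugins, e.g. allowlist_project, giving each instance a disjoint subset of projects. A minimal sketch for one site; the package names are placeholders, and in practice each site's list would be generated by splitting the full project list:

```ini
; company instance: syncs only its share of the projects
[plugins]
enabled =
    allowlist_project

[allowlist]
packages =
    numpy
    pandas
; the home instance would get the complementary list
```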

r00t1900 commented 2 years ago

> add a generate_global_index bool in config

I cannot find this in the documentation; where should I add it? I added it to the [mirror] section but nothing worked. I really need to only download, without performing the "generating global index page" step, because it errors out and takes too much time.
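For reference, this is roughly what was tried. Note that generate_global_index is the option proposed in the quote above, not a setting that current bandersnatch recognizes, which is why it has no effect (the path is a placeholder):

```ini
[mirror]
directory = /srv/pypi
master = https://pypi.org
workers = 3
; proposed option quoted above -- not an existing bandersnatch
; setting, so bandersnatch simply ignores it
generate_global_index = false
```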