wayneworkman closed this 5 years ago
Nope, not today. But I'd accept the PR.
tl;dr, from memory, the way I'd approach it:
I also think devpi might fit your needs better: https://devpi.net/docs/devpi/devpi/stable/%2Bd/index.html
It seems like once file-filtering plugins are added, we could just implement this as a plugin.
I have been tasked with setting up a PyPI mirror on a standalone network, but space is somewhat limited. Is the filter plugin idea feasible yet?
It does not exist, but you're welcome to implement it.
I'd look at either whitelisting just the packages you need OR blacklisting the top 100 space users documented here: https://pypi.org/stats/
I even have a small tool to generate a bandersnatch blacklist: https://github.com/cooperlees/pypistats
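As a rough illustration of the blacklist approach, here is a minimal Python sketch that ranks packages by disk usage, takes the top N, and renders them as a config section. It is not the pypistats tool's code: the function names are made up for this example, the sizes in the usage are hypothetical placeholders (real figures would come from https://pypi.org/stats/), and the `[blacklist]` / `packages =` layout is an assumption based on bandersnatch 3.x config files, so check your version's docs.

```python
from typing import Dict, List


def top_space_users(sizes: Dict[str, int], n: int) -> List[str]:
    """Return the names of the n packages using the most bytes, largest first."""
    # Sort by size descending; break ties by name so the output is stable.
    ranked = sorted(sizes.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:n]]


def render_blacklist(packages: List[str]) -> str:
    """Render an INI-style section with one package per indented line.

    ASSUMPTION: the "[blacklist]" section and "packages" key mirror the
    bandersnatch 3.x config layout; newer releases may use other names.
    """
    lines = ["[blacklist]", "packages ="]
    lines += [f"    {name}" for name in packages]
    return "\n".join(lines) + "\n"


def remaining_bytes(sizes: Dict[str, int], excluded: List[str]) -> int:
    """Estimate the mirror size after dropping the excluded packages."""
    skip = set(excluded)
    return sum(size for name, size in sizes.items() if name not in skip)


if __name__ == "__main__":
    # Hypothetical sizes in bytes, for illustration only.
    sizes = {"pkg-a": 900, "pkg-b": 50, "pkg-c": 400, "pkg-d": 10}
    top = top_space_users(sizes, 2)
    print(render_blacklist(top), end="")
    print(remaining_bytes(sizes, top))  # 60
```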
What does not exist, the filtering infrastructure or this specific filter?
If I understand the numbers correctly, the entire PyPI mirror would be 3.1 TB, but chopping off the top 100 would cut it down to about 1 TB. I started down this path, but I do not have enough storage space for even that on my internet-facing machine. The simple answer is to add more storage, but sometimes learning a new technology is faster than navigating the corporate bureaucracy of IT purchasing.
The logic to only keep the latest version on your mirror does not exist.
If you only maintain the latest version, you will potentially run into problems when packages hard-pin versions of their dependencies.
I'd prefer keeping maybe the X latest versions, with X configurable. Using that plus a whitelist should allow people to get pretty lean and mean mirrors.
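A minimal sketch of the "X latest versions plus a whitelist" idea, and of the hard-pin risk, assuming the third-party `packaging` library is installed. `latest`, `trim_mirror`, and `missing_pins` are hypothetical helpers working on an in-memory `{project: [versions]}` dict; they are not bandersnatch APIs, and the project names in the usage are made up.

```python
from typing import Dict, Iterable, List

from packaging.requirements import Requirement
from packaging.utils import canonicalize_name
from packaging.version import InvalidVersion, Version


def latest(versions: Iterable[str], keep: int) -> List[str]:
    """Return the `keep` greatest PEP 440 versions, newest first."""
    parsed = []
    for raw in versions:
        try:
            parsed.append((Version(raw), raw))
        except InvalidVersion:
            continue  # skip legacy strings that are not valid PEP 440 versions
    parsed.sort(reverse=True)
    return [raw for _, raw in parsed[:keep]]


def trim_mirror(index: Dict[str, List[str]], whitelist: Iterable[str], keep: int) -> Dict[str, List[str]]:
    """Keep only whitelisted projects and, for each one, its `keep` latest versions."""
    wanted = {canonicalize_name(name) for name in whitelist}
    return {
        name: latest(versions, keep)
        for name, versions in index.items()
        if canonicalize_name(name) in wanted
    }


def missing_pins(mirror: Dict[str, List[str]], requirements: Iterable[str]) -> List[str]:
    """Return the requirement strings that no version left on the mirror satisfies."""
    by_name = {canonicalize_name(name): versions for name, versions in mirror.items()}
    missing = []
    for line in requirements:
        req = Requirement(line)
        available = by_name.get(canonicalize_name(req.name), [])
        # prereleases=True so a pin such as "==2.0b1" can still be matched.
        if not any(req.specifier.contains(v, prereleases=True) for v in available):
            missing.append(line)
    return missing


if __name__ == "__main__":
    index = {"alpha": ["1.0", "1.1", "1.2"], "beta": ["0.9", "0.10", "0.11"], "gamma": ["3.0"]}
    pins = ["alpha==1.1", "beta>=0.10"]
    print(missing_pins(trim_mirror(index, ["alpha", "beta"], keep=1), pins))  # ['alpha==1.1']
    print(missing_pins(trim_mirror(index, ["alpha", "beta"], keep=2), pins))  # []
```

With keep=1 the hard pin alpha==1.1 can no longer be satisfied; keeping two versions fixes it, which is the size-versus-compatibility trade-off of making X configurable.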
That sounds amazing to me.
On Fri, Jan 25, 2019, 4:40 PM Cooper Lees <notifications@github.com> wrote:
> The logic to only keep the latest version on your mirror does not exist.
> You will potentially run into problems when packages hard pin version of their deps only maintaining the latest version.
> I'd prefer maybe X latest versions and it be configurable. Using that and whitelist should allow people to get pretty lean and mean mirrors.
I have been looking into this and have some prototype code hacked into package.py:_filter_release(). It needs more testing before I attempt to turn it into a filter (and I need a better understanding of how the filters work), but it seems feasible. @cooperlees Do you use an IDE (like PyCharm) for debugging, or are you just much better at Python than me? I have something set up and working now, but debugging code under site-packages in my venv is clunky at best.
I use Atom and pdb when I want to really debug. Just write unit tests to ensure your code is doing what you're hoping it does.
Also, just put up the PR and I can comment on that :)
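Following the unit-test advice, a small unittest module like this sketch can pin down the ordering assumptions a version filter depends on before it goes into a PR. `keep_latest` is a hypothetical helper written for the example, and the third-party `packaging` library must be installed.

```python
import unittest

from packaging.version import Version


def keep_latest(versions, keep):
    """Hypothetical helper: return the `keep` greatest versions, newest first."""
    return sorted(versions, key=Version, reverse=True)[:keep]


class KeepLatestTest(unittest.TestCase):
    def test_numeric_not_alphabetical(self):
        # As plain strings "0.9.0" > "0.10.0"; PEP 440 ordering says the opposite.
        self.assertEqual(keep_latest(["0.9.0", "0.10.0"], 1), ["0.10.0"])

    def test_prereleases_sort_before_final(self):
        self.assertEqual(keep_latest(["1.0", "1.0b1", "1.0rc1"], 3), ["1.0", "1.0rc1", "1.0b1"])

    def test_keep_more_than_available(self):
        self.assertEqual(keep_latest(["1.0"], 5), ["1.0"])


if __name__ == "__main__":
    unittest.main()
```

Run it with `python -m unittest`; to step through a failing case in pdb, put a `breakpoint()` call (Python 3.7+) inside the test method.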
When I look at the dictionary of releases (package.py: self.releases), its keys appear to be sorted alphabetically. For example, my debug log message states:
Package AGEpy contains 24 versions: ['0.1.3', '0.1.4', '0.1.5', '0.1b0', '0.1b1', '0.2.0', '0.2.1', '0.2.2', '0.2.3', '0.2.4', '0.2.5', '0.3.0', '0.3.1', '0.3.2', '0.5.0', '0.6.0', '0.6.1', '0.6.2', '0.6.3', '0.6.4', '0.6.5', '0.6.7', '0.7.0', '0.8.0']
Is alphabetical order my best option for choosing the latest?
Should I use the 'upload_time' entry in the values()?
Is there a better option?
I’d convert every str in that list to a ‘Version’ object from the ‘packaging’ module and sort them. Then you can take the X latest versions off the top of the sorted list to download, and delete the rest that exist on disk. This ensures you are PEP 440 compliant.
Documentation can be found here: https://packaging.pypa.io/en/latest/version/
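A sketch of that suggestion applied to the AGEpy list from the debug log, assuming `packaging` is installed. `plan` is a hypothetical helper, not bandersnatch code: it sorts by `packaging.version.Version`, keeps the newest `keep` entries, and reports which on-disk versions would be deleted. The on-disk set in the example is made up.

```python
from typing import Iterable, List, Set, Tuple

from packaging.version import Version

# Release list copied from the AGEpy debug log above.
AGEPY = ['0.1.3', '0.1.4', '0.1.5', '0.1b0', '0.1b1', '0.2.0', '0.2.1', '0.2.2',
         '0.2.3', '0.2.4', '0.2.5', '0.3.0', '0.3.1', '0.3.2', '0.5.0', '0.6.0',
         '0.6.1', '0.6.2', '0.6.3', '0.6.4', '0.6.5', '0.6.7', '0.7.0', '0.8.0']


def plan(all_versions: Iterable[str], on_disk: Iterable[str], keep: int) -> Tuple[List[str], Set[str]]:
    """Return (versions to keep, newest first) and (on-disk versions to delete).

    Version() raises packaging.version.InvalidVersion for non-PEP 440 strings,
    so real code would have to decide how to treat those.
    """
    newest_first = sorted(all_versions, key=Version, reverse=True)
    to_keep = newest_first[:keep]
    to_delete = set(on_disk) - set(to_keep)
    return to_keep, to_delete


if __name__ == "__main__":
    print(sorted(AGEPY)[:5])               # ['0.1.3', '0.1.4', '0.1.5', '0.1b0', '0.1b1']
    print(sorted(AGEPY, key=Version)[:5])  # ['0.1b0', '0.1b1', '0.1.3', '0.1.4', '0.1.5']
    print(plan(AGEPY, on_disk=["0.6.5", "0.7.0", "0.8.0"], keep=2))
    # (['0.8.0', '0.7.0'], {'0.6.5'})
```

For AGEpy the newest release happens to be the same in both orders, but string order places the 0.1b0/0.1b1 pre-releases after 0.1.5 and would also rank '0.9.0' above '0.10.0'. Sorting by upload_time is another option, but it can disagree with version order when a maintainer uploads a fix for an older release line after a newer release.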
Hello,
I have added the option to keep only the N latest versions. More precisely, it keeps the N greatest versions according to packaging.version.Version order.
I have also added a plugin to filter out unwanted platform-specific binaries, in order to save disk space and bandwidth.
I'm not ready to propose a PR yet... I have to write some tests first.
https://github.com/rene-d/bandersnatch/tree/dev
Regards
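The platform-binary plugin's code is not shown in this thread, so this is only a hypothetical sketch of the idea: read the platform tag at the end of a wheel filename (PEP 427 naming, `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`) and drop wheels built only for unwanted platforms. The prefix list, function names, and example filenames are made up; sdists and pure-Python `any` wheels are always kept.

```python
from typing import Iterable, List

# Hypothetical example prefixes; a real plugin's options may differ.
UNWANTED = ("win32", "win_amd64", "macosx")


def platform_tags(filename: str) -> List[str]:
    """Return the platform tags of a wheel filename.

    The platform tag is the last dash-separated field before ".whl"; a
    compressed tag set such as "manylinux1_x86_64.manylinux2010_x86_64"
    joins several tags with ".".
    """
    stem = filename[: -len(".whl")]
    return stem.split("-")[-1].split(".")


def is_unwanted(filename: str, unwanted: Iterable[str] = UNWANTED) -> bool:
    """True if the file is a wheel whose platform tags all match an unwanted prefix."""
    if not filename.endswith(".whl"):
        return False  # sdists (.tar.gz, .zip) are platform-neutral, so keep them
    prefixes = tuple(unwanted)
    return all(tag.startswith(prefixes) for tag in platform_tags(filename))


def keep_files(filenames: Iterable[str], unwanted: Iterable[str] = UNWANTED) -> List[str]:
    """Filter a release's file list, preserving order."""
    prefixes = tuple(unwanted)
    return [name for name in filenames if not is_unwanted(name, prefixes)]


if __name__ == "__main__":
    files = [
        "demo-1.0-cp37-cp37m-win_amd64.whl",
        "demo-1.0-cp37-cp37m-manylinux1_x86_64.whl",
        "demo-1.0-cp37-cp37m-macosx_10_9_x86_64.whl",
        "demo-1.0-py3-none-any.whl",
        "demo-1.0.tar.gz",
    ]
    print(keep_files(files))
```

Requiring every tag to be unwanted (rather than any) keeps multi-platform wheels that are still usable on a platform you do want.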
This will go out in 3.3.0 today.
Is there any way to configure bandersnatch to only download the latest version of packages? I don't need or want 600 GB+ of every released version of every package. I'm not running an official mirror; I just want a local copy of the latest packages.