pypa / bandersnatch

A PyPI mirror client according to PEP 381 http://www.python.org/dev/peps/pep-0381/
Academic Free License v3.0

Mix of plugins causing packages to not download #1190

Open sabman3 opened 2 years ago

sabman3 commented 2 years ago

More of a "how-to" than an issue. Under /web/simple, I'm getting index.html files, but no packages. Also nothing downloaded to /web/packages. I'm new to mirroring, so I may be lost, but shouldn't I be getting packages?
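A quick way to confirm the symptom is to count what actually landed on disk. The sketch below assumes the mirror root contains `web/simple` and `web/packages` (the two paths mentioned above); the helper names and the example path are made up for illustration.

```python
from pathlib import Path


def count_files(folder, pattern):
    """Count regular files matching `pattern` anywhere below `folder`."""
    if not folder.is_dir():
        return 0
    return sum(1 for path in folder.rglob(pattern) if path.is_file())


def summarize_mirror(mirror_root):
    """Count index pages and release files under a mirror directory.

    Assumes the layout described in this thread: <root>/web/simple holds
    the index.html pages and <root>/web/packages holds the release files.
    """
    web = Path(mirror_root) / "web"
    return {
        "index_pages": count_files(web / "simple", "index.html"),
        "release_files": count_files(web / "packages", "*"),
    }


if __name__ == "__main__":
    # Hypothetical path; use the `directory` value from your bandersnatch.conf.
    print(summarize_mirror("/var/www/pypi"))
```

If `index_pages` is non-zero while `release_files` stays at zero after a sync, that matches the behavior reported here.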

cooperlees commented 2 years ago

Yes, depending on your config. Please share your bandersnatch.conf here and I can take a look and try to reproduce the behavior.

sabman3 commented 2 years ago

Here you go. Right now just trying with two modules... Thanks for the help.

[mirror]
proxy = http://localhost:3128
download-mirror = https://pypi.org

; The directory where the mirror data will be stored.
directory = /var/www/pypi
; Save JSON metadata into the web tree:
; URL/pypi/PKG_NAME/json (Symlink) -> URL/json/PKG_NAME
json = true

; Save package release files
release-files = true

; Cleanup legacy non PEP 503 normalized named simple directories
cleanup = false

; The PyPI server which will be mirrored.
; master = https://test.python.org
; scheme for PyPI server MUST be https
master = https://pypi.org

; The network socket timeout to use for all connections. This is set to a
; somewhat aggressively low value: rather fail quickly temporarily and re-run
; the client soon instead of having a process hang infinitely and have TCP not
; catching up for ages.
timeout = 10

; The global-timeout sets aiohttp total timeout for its coroutines
; This is set incredibly high by default as aiohttp coroutines need to be
; equipped to handle mirroring large PyPI packages on slow connections.
global-timeout = 1800

; Number of worker threads to use for parallel downloads.
; Recommendations for worker thread setting:
; - leave the default of 3 to avoid overloading the pypi master
; - official servers located in data centers could run 10 workers
; - anything beyond 10 is probably unreasonable and avoided by bandersnatch
workers = 3

; Whether to hash package indexes
; Note that package index directory hashing is incompatible with pip, and so
; this should only be used in an environment where it is behind an application
; that can translate URIs to filesystem locations. For example, with the
; following Apache RewriteRule:
;     RewriteRule ^([^/])([^/]*)/$ /mirror/pypi/web/simple/$1/$1$2/
;     RewriteRule ^([^/])([^/]*)/([^/]+)$/ /mirror/pypi/web/simple/$1/$1$2/$3
; OR
; following nginx rewrite rules:
;     rewrite ^/simple/([^/])([^/]*)/$ /simple/$1/$1$2/ last;
;     rewrite ^/simple/([^/])([^/]*)/([^/]+)$/ /simple/$1/$1$2/$3 last;
; Setting this to true would put the package 'abc' index in simple/a/abc.
; Recommended setting: the default of false for full pip/pypi compatibility.
hash-index = false

; Whether to stop a sync quickly after an error is found or whether to continue
; syncing but not marking the sync as successful. Value should be "true" or
; "false".
stop-on-error = false

; The storage backend that will be used to save data and metadata while
; mirroring packages. By default, use the filesystem backend. Other options
; currently include: 'swift'
storage-backend = filesystem

; Advanced logging configuration. Uncomment and set to the location of a
; python logging format logging config file.
; log-config = /etc/bandersnatch-log.conf

; Generate index pages with absolute urls rather than relative links. This is
; generally not necessary, but was added for the official internal PyPI mirror,
; which requires serving packages from https://files.pythonhosted.org
; root_uri = https://example.com

; Number of consumers which verify metadata
verifiers = 3

; Number of prior simple index.html to store. Used as a safeguard against
; upstream changes generating blank index.html files. Prior versions are
; stored under as "versions/index.html" and the current
; index.html will be a symlink to the latest version.
; If set to 0 no prior versions are stored and index.html is the latest version.
; If unset defaults to 0.
; keep_index_versions = 0

; Configure an option to compare whether a file is identical. By default the
; "hash" method is used which reads local file content and computes hashes,
; which is slow but more reliable; when "stat" method is used, file size and
; change time are used to compare, which is useful to reduce IO workload when
; verifying a lot of files frequently.
; Possible values are: hash (default), stat
compare-method = hash

; Configure to download packages from an alternative mirror.
; By default bandersnatch downloads packages from the server in the "url"
; value of json response from master server. This option asks bandersnatch
; to try to download from the configured PyPI mirror first, and fallback to
; "url" value if it was not successful (unable to get content or checksum
; mismatch). It is useful to sync most of the files from an existing, nearby
; mirror, for example when setting up a new server sitting next to an existing
; one for the purpose of load sharing.
; Downloading only from the mirror site without fallback is also possible,
; but be aware this could lead to more failures than expected and is not
; recommended for most scenarios.
; download-mirror = https://pypi-mirror.example.com/
; download-mirror-no-fallback = False

; vim: set ft=cfg:

; Configure a file to write out the list of files downloaded during the mirror.
; This is useful for situations when mirroring to offline systems where a process
; is required to only sync new files to the upstream mirror.
; The file will be named as set in the diff-file, and overwritten unless the
; diff-append-epoch setting is set to true. If this is true, the epoch date will
; be appended to the filename (i.e. /path/to/diff-1568129735)
; diff-file = /srv/pypi/mirrored-files
; diff-append-epoch = true

[plugins]
enabled = all

[blocklist]
platforms = windows macos py2.4 py2.5 py2.6 py3.1 py3.2 py3.3 py3.4 py3.5

[allowlist]
packages = ansible elasticsearch6
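For anyone checking their own file: bandersnatch.conf is INI-style, so Python's standard `configparser` can show how the `[plugins]`, `[blocklist]` and `[allowlist]` values are read. This is a minimal sketch, not bandersnatch's own loader. The `list_values` helper is hypothetical, and the sample uses the one-value-per-indented-line layout shown in the bandersnatch documentation, which is an assumption about how the file above looked before the line breaks were lost.

```python
import configparser

# Shortened sample; the layout (one value per indented line) is an assumption.
SAMPLE = """\
[plugins]
enabled = all

[blocklist]
platforms =
    windows
    macos
    py2.4

[allowlist]
packages =
    ansible
    elasticsearch6
"""


def list_values(config, section, option):
    """Return a multi-line option as a list of non-empty, stripped entries."""
    raw = config.get(section, option, fallback="")
    return [line.strip() for line in raw.splitlines() if line.strip()]


config = configparser.ConfigParser()
config.read_string(SAMPLE)

enabled = list_values(config, "plugins", "enabled")
platforms = list_values(config, "blocklist", "platforms")
packages = list_values(config, "allowlist", "packages")

print("enabled plugins:", enabled)
print("blocked platforms:", platforms)
print("allowed packages:", packages)
```

Printing the parsed lists makes it easy to spot a value that ended up as one long string instead of separate entries.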

sabman3 commented 2 years ago

@cooperlees Still totally lost on why no packages are being downloaded to the packages directory. Any help would be appreciated. I hope I'm not off base by expecting bandersnatch to download the actual files and not just create index pages. I know there are ways to use devpi to get packages, but not 100% sure how to do that either.

sabman3 commented 2 years ago

Update: I was able to get this working by enabling specific plugins instead of enabling `all`. I'll need to circle back and see which plugin is causing my headaches.
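To narrow down which plugin is responsible, one option is to replace `enabled = all` with an explicit list and grow it one plugin at a time, running a sync after each step. The sketch below only generates the `[plugins]` sections to try. The names `allowlist_project` and `exclude_platform` are the plugins the bandersnatch docs appear to associate with `[allowlist] packages` and `[blocklist] platforms`; treat them, and the `plugin_sections` helper, as assumptions to check against your installed version.

```python
def plugin_sections(candidates):
    """Yield (plugins, ini_text) pairs, adding one more plugin at each step."""
    for count in range(1, len(candidates) + 1):
        chosen = candidates[:count]
        lines = ["[plugins]", "enabled ="]
        lines += [f"    {name}" for name in chosen]
        yield chosen, "\n".join(lines) + "\n"


# Assumed plugin names; verify against the documentation for your version.
CANDIDATES = ["allowlist_project", "exclude_platform"]

for chosen, ini_text in plugin_sections(CANDIDATES):
    print(f"# step with {len(chosen)} plugin(s) enabled")
    print(ini_text)
```

The first step at which packages stop downloading points at the plugin that was added last.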

cooperlees commented 2 years ago

Sorry, I've been busy. This is always going to be plugins. We do not have good interop testing. Any PRs to help with this will be appreciated. There are some unit tests, but none that test interop by enabling all the plugins and ensuring they all work together.

I really only test the allowlist in our CI today.
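As a starting point for the interop testing mentioned above, a test matrix could be built by enumerating plugin combinations and running a small sync against each one. The sketch below covers only the enumeration step; it does not call bandersnatch, and the plugin names are placeholders rather than real identifiers.

```python
from itertools import combinations


def plugin_combinations(plugins, min_size=2):
    """Return every combination of at least `min_size` plugins, smallest first."""
    combos = []
    for size in range(min_size, len(plugins) + 1):
        combos.extend(combinations(plugins, size))
    return combos


# Placeholder names, not real bandersnatch plugin identifiers.
PLUGINS = ["plugin_a", "plugin_b", "plugin_c"]

for combo in plugin_combinations(PLUGINS):
    print(", ".join(combo))
```

The number of combinations grows quickly with the plugin count, so a real suite would probably cap the combination size or test pairs only.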