Closed by mbeno 1 year ago.
Howdy,
First off, https://pypi.org/simple/pip/23.0.1/json does not load for me. Did you mean https://pypi.org/pypi/pip/23.0.1/json? I'm going to assume so and work from that.
Has pip moved to using the JSON Simple API as per PEP 691? If so, your mirror also needs to generate PEP 691 JSON and return it when the client (in this case pip) requests it. There is also a high chance of bugs here, as I have never run (and do not run) a bandersnatch mirror with PEP 691 support. I have changed roles and no longer run a bandersnatch mirror anywhere.
Bandersnatch uses the base package JSON API for packages (e.g. for pip, https://pypi.org/pypi/pip/json). We loop through the digests and add them only to the Simple API JSON, per PEP 691. The code is here: https://github.com/pypa/bandersnatch/blob/f99542f17adcc5c35ddfbb62b03a84d43122793a/src/bandersnatch/simple.py#L192-L195
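A minimal sketch of that copying step, assuming the shape of a file entry in the PyPI JSON API's "urls" list. The helper name simple_file_entry is hypothetical and this is not bandersnatch's actual code; it only illustrates why every upstream digest name, blake2b_256 included, ends up in the PEP 691 "hashes" dict.

```python
def simple_file_entry(release_file: dict, url: str) -> dict:
    """Build a PEP 691 file entry from one PyPI JSON API "urls" element.

    Hypothetical helper: every key of the upstream "digests" dict is copied
    verbatim into "hashes", so a name like "blake2b_256" is carried over too.
    """
    return {
        "filename": release_file["filename"],
        "hashes": dict(release_file["digests"]),  # unfiltered copy of upstream digests
        "requires-python": release_file.get("requires_python"),
        "url": url,  # relative path into the mirror's packages/ tree
        "yanked": release_file.get("yanked", False),
    }
```

Because nothing filters the names, whatever PyPI adds to "digests" is published to clients as-is.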
I created a venv, installed bandersnatch, and used the CI config to build a small mirror locally:
python3 -m venv /tmp/tb
/tmp/tb/bin/pip install bandersnatch
mkdir /tmp/pypi
/tmp/tb/bin/bandersnatch -c src/bandersnatch/tests/ci.conf --debug mirror |& tee /tmp/bandersnatch.log
I got both HTML and JSON output:
/tmp/pypi/web/simple/
/tmp/pypi/web/simple/index.v1_json
/tmp/pypi/web/simple/index.v1_html
/tmp/pypi/web/simple/index.html
/tmp/pypi/web/simple/b
/tmp/pypi/web/simple/b/black
/tmp/pypi/web/simple/b/black/index.v1_json
/tmp/pypi/web/simple/b/black/index.v1_html
/tmp/pypi/web/simple/b/black/index.html
/tmp/pypi/web/simple/b/black/versions
/tmp/pypi/web/simple/b/black/versions/index_17485572_2023-04-26T214110.171879Z.v1_json
/tmp/pypi/web/simple/b/black/versions/index_17485572_2023-04-26T214110.171879Z.v1_html
/tmp/pypi/web/simple/b/black/versions/index_17485572_2023-04-26T214110.171879Z.html
/tmp/pypi/web/simple/p
/tmp/pypi/web/simple/p/pyaib
/tmp/pypi/web/simple/p/pyaib/index.v1_json
/tmp/pypi/web/simple/p/pyaib/index.v1_html
/tmp/pypi/web/simple/p/pyaib/index.html
/tmp/pypi/web/simple/p/pyaib/versions
/tmp/pypi/web/simple/p/pyaib/versions/index_2328239_2023-04-26T213908.980858Z.v1_json
/tmp/pypi/web/simple/p/pyaib/versions/index_2328239_2023-04-26T213908.980858Z.v1_html
/tmp/pypi/web/simple/p/pyaib/versions/index_2328239_2023-04-26T213908.980858Z.html
/tmp/pypi/web/simple/a
/tmp/pypi/web/simple/a/acmplus
/tmp/pypi/web/simple/a/acmplus/index.v1_json
/tmp/pypi/web/simple/a/acmplus/index.v1_html
/tmp/pypi/web/simple/a/acmplus/index.html
/tmp/pypi/web/simple/a/acmplus/versions
/tmp/pypi/web/simple/a/acmplus/versions/index_5103287_2023-04-26T213907.943124Z.v1_json
/tmp/pypi/web/simple/a/acmplus/versions/index_5103287_2023-04-26T213907.943124Z.v1_html
/tmp/pypi/web/simple/a/acmplus/versions/index_5103287_2023-04-26T213907.943124Z.html
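To check a mirror for the same layout, here is a small sketch that walks simple/<letter>/<project>/ and reports which of the three index variants listed above are missing. The function name and the assumption of the hashed single-letter layout are mine, not bandersnatch's.

```python
from pathlib import Path

# The three per-project index files seen in the listing above.
INDEX_VARIANTS = ("index.html", "index.v1_html", "index.v1_json")


def missing_index_variants(simple_root: Path) -> dict[str, list[str]]:
    """Map each project under simple_root to the index files it lacks.

    Assumes the hashed layout simple/<first letter>/<project>/index.*;
    the versions/ subdirectories are deeper and are not visited.
    """
    missing: dict[str, list[str]] = {}
    for project_dir in sorted(simple_root.glob("*/*")):
        if not project_dir.is_dir():
            continue
        absent = [name for name in INDEX_VARIANTS if not (project_dir / name).exists()]
        if absent:
            missing[project_dir.name] = absent
    return missing
```

For example, missing_index_variants(Path("/tmp/pypi/web/simple")) should return an empty dict for the mirror above.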
These include the blake2b_256 hash:
cat /tmp/pypi/web/simple/p/pyaib/index.v1_json | jq
...
{
"filename": "pyaib-2.1.0.tar.gz",
"hashes": {
"blake2b_256": "0caf0389466685844d95c6f1f857008d4931d14c7937ac8dba689639ccf0cc54",
"md5": "5a348b49d53cee26925e7204632721b7",
"sha256": "b6114554fb312f9b0bdeaf6a7498f7da05fc17b9250c0449ed796fac9ab663e2"
},
"requires-python": null,
"url": "../../packages/0c/af/0389466685844d95c6f1f857008d4931d14c7937ac8dba689639ccf0cc54/pyaib-2.1.0.tar.gz",
"yanked": false
}
...
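The same check can be done without jq. This sketch (the function name hash_names_by_file is hypothetical) parses a PEP 691 project page and lists the hash names published for each file, which makes a non-hashlib name like blake2b_256 easy to spot.

```python
import json


def hash_names_by_file(index_json: str) -> dict[str, list[str]]:
    """Return the sorted hash names listed for each file in a PEP 691 project page."""
    page = json.loads(index_json)
    return {entry["filename"]: sorted(entry["hashes"]) for entry in page["files"]}
```

Usage: hash_names_by_file(Path("/tmp/pypi/web/simple/p/pyaib/index.v1_json").read_text()), with Path imported from pathlib.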
Are you running a PEP 691-compatible mirror? It requires a more elaborate nginx (or whatever web server you're using) configuration that respects the Accept header in the HTTP request and returns the matching Content-Type.
Our reference banderx uses NGINX to do so: https://bandersnatch.readthedocs.io/en/latest/serving.html
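One way to see what a mirror negotiates is to send the PEP 691 media type in the Accept header and look at the Content-Type that comes back. A minimal sketch using only the standard library; the function names and any mirror URL you pass in are placeholders.

```python
import urllib.request

PEP691_JSON = "application/vnd.pypi.simple.v1+json"


def build_simple_request(project_url: str) -> urllib.request.Request:
    """Ask for PEP 691 JSON, with plain HTML as a low-priority fallback."""
    return urllib.request.Request(
        project_url,
        headers={"Accept": f"{PEP691_JSON}, text/html;q=0.01"},
    )


def fetch_content_type(project_url: str) -> str:
    """Return the Content-Type the server chose (performs a network request)."""
    with urllib.request.urlopen(build_simple_request(project_url)) as resp:
        return resp.headers.get("Content-Type", "")
```

A PEP 691-aware mirror should answer fetch_content_type("https://<mirror>/simple/pip/") with application/vnd.pypi.simple.v1+json; a plain static setup typically answers text/html.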
EDIT: I have asked the pip maintainers on Discord for thoughts ...
Hello, and thank you for your reply.
Yes, I meant https://pypi.org/pypi/pip/23.0.1/json
This is our nginx config for the mirror:
map $http_accept $mirror_suffix {
    default ".html";
    "~*application/vnd\.pypi\.simple\.v1\+json" ".v1_json";
    "~*application/vnd\.pypi\.simple\.v1\+html" ".v1_html";
    "~*text/html" ".html";
}
server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;
    server_name <name>;
    root /pypi/srv/web;
    autoindex on;
    charset utf-8;
    keepalive_timeout 70;
    location /simple/ {
        index index$mirror_suffix;
        types {
            application/vnd.pypi.simple.v1+json v1_json;
            application/vnd.pypi.simple.v1+html v1_html;
            text/html html;
        }
        try_files $uri$mirror_suffix $uri $uri/ =404;
    }
    # configure default MIME type for JSON data paths
    location /json/ {
        default_type application/json;
    }
    location /pypi/ {
        default_type application/json;
    }
}
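For reasoning about which file a given client receives, the map block can be emulated in a few lines. This is only an illustration (mirror_suffix is a hypothetical name): it uses the same regexes in the same order, case-insensitively like nginx's ~*, with the first match winning and ".html" as the default.

```python
import re

# Same patterns and order as the nginx map; re.IGNORECASE mirrors "~*".
ACCEPT_SUFFIX_RULES = [
    (re.compile(r"application/vnd\.pypi\.simple\.v1\+json", re.IGNORECASE), ".v1_json"),
    (re.compile(r"application/vnd\.pypi\.simple\.v1\+html", re.IGNORECASE), ".v1_html"),
    (re.compile(r"text/html", re.IGNORECASE), ".html"),
]


def mirror_suffix(accept_header: str) -> str:
    """Pick the index suffix the way the map does: the first matching regex wins."""
    for pattern, suffix in ACCEPT_SUFFIX_RULES:
        if pattern.search(accept_header):
            return suffix
    return ".html"  # the map's default value
```

Like the nginx map, this ignores q-values: any Accept header that mentions the v1+json type anywhere gets the .v1_json file. That includes a header shaped like the one recent pip versions send, so pip receives the JSON index and its hashes dict.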
From what I understand (though I am not very familiar with pip or PyPI mirrors in general), to adhere to PEP 691 the keys in the hashes dict should be names that can be passed to hashlib.new(). This is where I run into my issue with pip.
From https://peps.python.org/pep-0691/:
By default, any hash algorithm available via hashlib (specifically any that can be passed to hashlib.new() and do not require additional parameters) can be used as a key for the hashes dictionary. At least one secure algorithm from hashlib.algorithms_guaranteed SHOULD always be included. At the time of this PEP, sha256 specifically is recommended.
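That rule can be checked directly. In this sketch (is_pep691_hash_name is a hypothetical name), a name qualifies if hashlib.new() accepts it and the digest can be produced without extra parameters. "blake2b_256" fails because hashlib only knows "blake2b", and getting a 256-bit BLAKE2b digest requires the extra digest_size=32 argument.

```python
import hashlib


def is_pep691_hash_name(name: str) -> bool:
    """True if hashlib.new(name) works and needs no additional parameters."""
    try:
        hasher = hashlib.new(name)  # unknown names raise ValueError
        hasher.hexdigest()  # SHAKE digests need a length argument and raise TypeError
    except (ValueError, TypeError):
        return False
    return True
```

With this check, sha256 and md5 pass, while blake2b_256 (from PyPI's digests) does not.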
This is where pip raises the unknown hash name exception: https://github.com/pypa/pip/blob/main/src/pip/_internal/utils/hashes.py#L77-L83
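To make the failure concrete, here is a simplified sketch loosely modeled on the linked pip code. It is not pip's implementation, and verify_file_hashes is a hypothetical name. It checks downloaded bytes against every entry of a PEP 691 hashes dict and fails outright on a name that hashlib cannot construct, which is what happens with blake2b_256.

```python
import hashlib


def verify_file_hashes(data: bytes, hashes: dict[str, str]) -> bool:
    """Check data against every entry of a PEP 691 hashes dict.

    Raises ValueError for a hash name hashlib cannot construct; a single bad
    name aborts the whole check even if a valid sha256 is also present.
    """
    for name, expected in hashes.items():
        try:
            hasher = hashlib.new(name)
        except ValueError as exc:
            raise ValueError(f"Unknown hash name: {name}") from exc
        hasher.update(data)
        if hasher.hexdigest() != expected:
            return False
    return True
```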
Issue resolved in https://github.com/pypa/bandersnatch/pull/1442
blake2b_256 has been added to the digests dict in the package JSON, and pip does not know how to handle this hash name. This causes an exception when installing packages with pip 23 or later.
If you compare https://pypi.org/simple/pip with https://pypi.org/pypi/pip/23.0.1/json, you will see that the digests dict does not contain the same data as the hashes dict.
This is caused by how bandersnatch discovers the hashes that should be applied to /simple/: https://github.com/pypa/bandersnatch/blob/f99542f17adcc5c35ddfbb62b03a84d43122793a/src/bandersnatch/simple.py#L172-L205
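One possible shape for a fix is to filter the upstream digests before they are written to the hashes dict. This sketch keeps only names that hashlib guarantees on every platform and that need no extra parameters. pep691_hashes is a hypothetical name, and this is not necessarily what the linked pull request changed.

```python
import hashlib


def pep691_hashes(digests: dict[str, str]) -> dict[str, str]:
    """Keep only digest names usable with hashlib.new() and no extra parameters.

    Drops names such as "blake2b_256" that are not hashlib algorithm names,
    and SHAKE variants, whose digests need a length argument.
    """
    return {
        name: value
        for name, value in digests.items()
        if name in hashlib.algorithms_guaranteed and not name.startswith("shake_")
    }
```

Applied to PyPI's current digests (md5, sha256, blake2b_256), this leaves md5 and sha256, which matches what https://pypi.org/simple/ itself publishes only in part (sha256).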