pypa / bandersnatch

A PyPI mirror client according to PEP 381 http://www.python.org/dev/peps/pep-0381/
Academic Free License v3.0

package json digest dict mapped to simple json hashes dict causes pip >23 to fail #1440

Closed mbeno closed 1 year ago

mbeno commented 1 year ago

blake2b_256 has been added to the digest dict in the package JSON, and pip does not know how to handle this hash name. This causes an exception when trying to install packages with pip 23 or later.

If you check https://pypi.org/simple/pip against https://pypi.org/pypi/pip/23.0.1/json the digest dict does not contain the same data as the hashes dict.

$ pip install --force pip
Looking in indexes: https://<pypi mirror>/simple/
Collecting pip
  Using cached https://<pypi mirror>/packages/07/51/2c0959c5adf988c44d9e1e0d940f5b074516ecc87e96b1af25f59de9ba38/pip-23.0.1-py3-none-any.whl (2.1 MB)
ERROR: Unknown hash name: blake2b_256

This is caused by how bandersnatch discovers the hashes that should be applied to /simple/: https://github.com/pypa/bandersnatch/blob/f99542f17adcc5c35ddfbb62b03a84d43122793a/src/bandersnatch/simple.py#L172-L205

cooperlees commented 1 year ago

Howdy,

First off - https://pypi.org/simple/pip/23.0.1/json does not load for me - Did you mean https://pypi.org/pypi/pip/23.0.1/json? I'm going to guess so and work off that.

Has pip moved to using the JSON Simple API as per PEP691? If so, your mirror also needs to generate PEP691 JSON and return it when the client (in this case pip) requests it. There is also a huge chance of bugs here, as I have never run (and do not currently run) a bandersnatch mirror with PEP691 support. I have changed roles and no longer run a bandersnatch mirror anywhere.

Bandersnatch uses the base package JSON API for packages (e.g. for pip https://pypi.org/pypi/pip/json). We loop through the digests and add them to simple API JSON only, per PEP691. Code can be seen here: https://github.com/pypa/bandersnatch/blob/f99542f17adcc5c35ddfbb62b03a84d43122793a/src/bandersnatch/simple.py#L192-L195
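The loop described above can be sketched roughly as follows. This is a simplified illustration and not the code in simple.py: the function name `release_file_to_simple_entry` and the shape of `sample` are assumptions made for the example, with digest values taken from the pyaib output shown further down.

```python
from typing import Any


def release_file_to_simple_entry(release_file: dict[str, Any]) -> dict[str, Any]:
    """Build a PEP 691 style file entry from one package-JSON release file.

    Hypothetical helper for illustration only; not the bandersnatch API.
    """
    hashes: dict[str, str] = {}
    # Every digest is copied verbatim, so any new name that appears in the
    # package JSON (such as blake2b_256) also appears in the simple JSON.
    for name, value in release_file["digests"].items():
        hashes[name] = value
    return {
        "filename": release_file["filename"],
        "hashes": hashes,
        "requires-python": release_file.get("requires_python"),
        "yanked": release_file.get("yanked", False),
    }


# Assumed input shape, modelled on the package JSON "digests" dict.
sample = {
    "filename": "pyaib-2.1.0.tar.gz",
    "digests": {
        "blake2b_256": "0caf0389466685844d95c6f1f857008d4931d14c7937ac8dba689639ccf0cc54",
        "md5": "5a348b49d53cee26925e7204632721b7",
        "sha256": "b6114554fb312f9b0bdeaf6a7498f7da05fc17b9250c0449ed796fac9ab663e2",
    },
    "requires_python": None,
    "yanked": False,
}

entry = release_file_to_simple_entry(sample)
print(sorted(entry["hashes"]))
```

With an unfiltered copy like this, the keys of `hashes` are whatever the upstream digest dict contains, which is how blake2b_256 would reach the client.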

Local Test

I created a venv + installed bandersnatch and used the CI config to get a small mirror locally.

python3 -m venv /tmp/tb
/tmp/tb/bin/pip install bandersnatch
mkdir /tmp/pypi
/tmp/tb/bin/bandersnatch -c src/bandersnatch/tests/ci.conf --debug mirror |& tee /tmp/bandersnatch.log

I got both HTML + JSON output:

/tmp/pypi/web/simple/
/tmp/pypi/web/simple/index.v1_json
/tmp/pypi/web/simple/index.v1_html
/tmp/pypi/web/simple/index.html
/tmp/pypi/web/simple/b
/tmp/pypi/web/simple/b/black
/tmp/pypi/web/simple/b/black/index.v1_json
/tmp/pypi/web/simple/b/black/index.v1_html
/tmp/pypi/web/simple/b/black/index.html
/tmp/pypi/web/simple/b/black/versions
/tmp/pypi/web/simple/b/black/versions/index_17485572_2023-04-26T214110.171879Z.v1_json
/tmp/pypi/web/simple/b/black/versions/index_17485572_2023-04-26T214110.171879Z.v1_html
/tmp/pypi/web/simple/b/black/versions/index_17485572_2023-04-26T214110.171879Z.html
/tmp/pypi/web/simple/p
/tmp/pypi/web/simple/p/pyaib
/tmp/pypi/web/simple/p/pyaib/index.v1_json
/tmp/pypi/web/simple/p/pyaib/index.v1_html
/tmp/pypi/web/simple/p/pyaib/index.html
/tmp/pypi/web/simple/p/pyaib/versions
/tmp/pypi/web/simple/p/pyaib/versions/index_2328239_2023-04-26T213908.980858Z.v1_json
/tmp/pypi/web/simple/p/pyaib/versions/index_2328239_2023-04-26T213908.980858Z.v1_html
/tmp/pypi/web/simple/p/pyaib/versions/index_2328239_2023-04-26T213908.980858Z.html
/tmp/pypi/web/simple/a
/tmp/pypi/web/simple/a/acmplus
/tmp/pypi/web/simple/a/acmplus/index.v1_json
/tmp/pypi/web/simple/a/acmplus/index.v1_html
/tmp/pypi/web/simple/a/acmplus/index.html
/tmp/pypi/web/simple/a/acmplus/versions
/tmp/pypi/web/simple/a/acmplus/versions/index_5103287_2023-04-26T213907.943124Z.v1_json
/tmp/pypi/web/simple/a/acmplus/versions/index_5103287_2023-04-26T213907.943124Z.v1_html
/tmp/pypi/web/simple/a/acmplus/versions/index_5103287_2023-04-26T213907.943124Z.html

Which include the blake2b_256 hash:

cat /tmp/pypi/web/simple/p/pyaib/index.v1_json | jq
...
    {
      "filename": "pyaib-2.1.0.tar.gz",
      "hashes": {
        "blake2b_256": "0caf0389466685844d95c6f1f857008d4931d14c7937ac8dba689639ccf0cc54",
        "md5": "5a348b49d53cee26925e7204632721b7",
        "sha256": "b6114554fb312f9b0bdeaf6a7498f7da05fc17b9250c0449ed796fac9ab663e2"
      },
      "requires-python": null,
      "url": "../../packages/0c/af/0389466685844d95c6f1f857008d4931d14c7937ac8dba689639ccf0cc54/pyaib-2.1.0.tar.gz",
      "yanked": false
    }
...

Are you running a PEP691 compatible mirror? It requires fancier configuration of nginx, or whatever you're using, to respect the Accept header in the HTTP request.

Our reference banderx uses NGINX to do so: https://bandersnatch.readthedocs.io/en/latest/serving.html

EDIT: I have asked pip maintainers on discord for thoughts ...

mbeno commented 1 year ago

Hello, and thank you for your reply.

Yes I meant https://pypi.org/pypi/pip/23.0.1/json

This is our nginx config for the mirror

map $http_accept $mirror_suffix {
    default ".html";

    "~*application/vnd\.pypi\.simple\.v1\+json" ".v1_json";
    "~*application/vnd\.pypi\.simple\.v1\+html" ".v1_html";
    "~*text/html" ".html";
}

server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;
    server_name <name>;
    root /pypi/srv/web;
    autoindex on;
    charset utf-8;

    keepalive_timeout 70;

    location /simple/ {
        index index$mirror_suffix;

        types {
            application/vnd.pypi.simple.v1+json v1_json;
            application/vnd.pypi.simple.v1+html v1_html;
            text/html html;
        }

        try_files $uri$mirror_suffix $uri $uri/ =404;
    }

    # configure default MIME type for JSON data paths
    location /json/ {
        default_type        application/json;
    }
    location /pypi/ {
        default_type        application/json;
    }

}
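For readers less used to nginx, the `map` block above can be read as the following selection logic. This is only a sketch in Python to show the intent: the names `SUFFIX_RULES` and `mirror_suffix` are made up for the example, and the Accept header in the usage line is an assumed example of what a PEP 691 client might send, not a captured request.

```python
import re

# Same order as the nginx map: the first pattern that matches wins.
SUFFIX_RULES = [
    (re.compile(r"application/vnd\.pypi\.simple\.v1\+json", re.IGNORECASE), ".v1_json"),
    (re.compile(r"application/vnd\.pypi\.simple\.v1\+html", re.IGNORECASE), ".v1_html"),
    (re.compile(r"text/html", re.IGNORECASE), ".html"),
]
DEFAULT_SUFFIX = ".html"


def mirror_suffix(accept_header: str) -> str:
    """Pick the index file suffix for a request's Accept header."""
    for pattern, suffix in SUFFIX_RULES:
        if pattern.search(accept_header):
            return suffix
    # Nothing matched: fall back to the plain HTML index.
    return DEFAULT_SUFFIX


# Assumed example header listing JSON first with HTML fallbacks.
accept = (
    "application/vnd.pypi.simple.v1+json, "
    "application/vnd.pypi.simple.v1+html; q=0.1, text/html; q=0.01"
)
print("index" + mirror_suffix(accept))
```

Like the nginx map, this ignores q-values and simply prefers JSON whenever the JSON media type appears anywhere in the header, so a client that asks for JSON is served index.v1_json and sees the hashes dict from that file.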

From what I understand, though I am not very familiar with pip or PyPI mirrors in general, to adhere to PEP 691 the keys in the hashes dict should be names that can be passed to hashlib.new(). This is where I run into my issue using pip.

From https://peps.python.org/pep-0691/

By default, any hash algorithm available via hashlib (specifically any that can be passed to hashlib.new() and do not require additional parameters) can be used as a key for the hashes dictionary. At least one secure algorithm from hashlib.algorithms_guaranteed SHOULD always be included. At the time of this PEP, sha256 specifically is recommended.
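One way to read that rule in code is to try each key with hashlib.new() and keep only the ones it accepts. This is a minimal sketch with a hypothetical helper name (`usable_hash_names`); it is not necessarily the change made in bandersnatch or the check pip performs.

```python
import hashlib


def usable_hash_names(hashes: dict[str, str]) -> dict[str, str]:
    """Keep only entries whose key is accepted by hashlib.new().

    Hypothetical helper for illustration only.
    """
    usable: dict[str, str] = {}
    for name, value in hashes.items():
        try:
            hashlib.new(name)
        except (ValueError, TypeError):
            # hashlib does not know this name, so a client could not use it.
            continue
        usable[name] = value
    return usable


# Hash values from the mirrored pyaib entry shown earlier in this thread.
mirrored = {
    "blake2b_256": "0caf0389466685844d95c6f1f857008d4931d14c7937ac8dba689639ccf0cc54",
    "sha256": "b6114554fb312f9b0bdeaf6a7498f7da05fc17b9250c0449ed796fac9ab663e2",
}

print(sorted(usable_hash_names(mirrored)))
```

Here blake2b_256 is dropped because hashlib has no algorithm registered under that name, while sha256 is kept.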

This is where pip raises the unknown hash exception: https://github.com/pypa/pip/blob/main/src/pip/_internal/utils/hashes.py#L77-L83

mbeno commented 1 year ago

Issue resolved in https://github.com/pypa/bandersnatch/pull/1442