peterbe / hashin

Helping you write hashed entries for packages in your requirements.txt
https://www.peterbe.com/plog/hashin
MIT License
101 stars, 26 forks

get sha256 hash from /simple (PEP503) endpoint #120

Open graingert opened 4 years ago

graingert commented 4 years ago

What's the problem this feature will solve?

When using devpi or other non-pypi.org servers, the hashing falls back to downloading the asset and hashing it locally.
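The local fallback described above amounts to streaming the downloaded file through hashlib. A minimal sketch follows; `hash_stream` is a hypothetical helper name used for illustration, not hashin's actual code, and the in-memory stream stands in for a real HTTP download.

```python
import hashlib
import io


def hash_stream(stream, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a binary file-like object, read in chunks."""
    digest = hashlib.new(algorithm)
    # iter() with a sentinel keeps reading until read() returns b"" (end of stream).
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# In practice the stream would be the HTTP response body of the release file.
fake_download = io.BytesIO(b"pretend this is a wheel")
print(hash_stream(fake_download))
```

Every release file has to be transferred in full for this to work, which is the cost the issue wants to avoid.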

Describe the solution you'd like

Use the sha256 hash from the /simple endpoint. pypi.org and devpi both provide sha256 hashes as a fragment in their href.

It's optional and may not include the user's preferred hash function, so pip-compile should still fall back on the JSON API or on downloading assets:

The URL SHOULD include a hash in the form of a URL fragment with the following syntax: #<hashname>=<hashvalue>, where <hashname> is the lowercase name of the hash function (such as sha256) and <hashvalue> is the hex encoded digest. Repositories SHOULD choose a hash function from one of the ones guaranteed to be available via the hashlib module in the Python standard library (currently md5, sha1, sha224, sha256, sha384, sha512). The current recommendation is to use sha256.

For example, Artifactory's PyPI implementation only puts md5 in the fragment of its simple href: https://www.jfrog.com/jira/browse/RTFACT-18495
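As a rough illustration of the proposal, the fragment can be read with nothing but the standard library. This is a sketch under assumptions, not hashin's implementation: `SimpleIndexParser` and `sha256_hashes` are hypothetical names, and the sample page and its digests are made up. Files without a sha256 fragment map to None so that the caller can fall back to the JSON API or a download.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class SimpleIndexParser(HTMLParser):
    """Collect (file_url, hash_name, hex_digest) from the anchors of a /simple page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.files = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative hrefs, then split off the "#<hashname>=<hashvalue>" fragment.
        url, fragment = urldefrag(urljoin(self.base_url, href))
        hash_name, sep, digest = fragment.partition("=")
        if not sep:
            hash_name, digest = None, None
        self.files.append((url, hash_name, digest))


def sha256_hashes(html, base_url):
    """Map file URL -> sha256 hex digest, or None where a fallback is needed."""
    parser = SimpleIndexParser(base_url)
    parser.feed(html)
    return {
        url: digest if name == "sha256" else None
        for url, name, digest in parser.files
    }


# Made-up page: one sha256 link and one md5-only link (the Artifactory case).
page = """
<a href="https://files.example.invalid/demo-1.0.tar.gz#sha256=0123abcd">demo-1.0.tar.gz</a>
<a href="../../files/demo-1.1.tar.gz#md5=89abcdef">demo-1.1.tar.gz</a>
"""
print(sha256_hashes(page, "https://index.example.invalid/simple/demo/"))
```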

Alternative Solutions

https://github.com/devpi/devpi/issues/801#issuecomment-623510074

Additional context

/cc @fschulze
https://github.com/jazzband/pip-tools/pull/1109
Compare view-source on https://m.devpi.net/root/pypi/+simple/devpi-server/ with view-source on https://pypi.org/simple/devpi-server/

graingert commented 4 years ago

Another option, which would be standardized across HTTP hosts, is the Want-Digest header: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Want-Digest
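For illustration, a client taking this route would send `Want-Digest: sha-256` and read the `Digest` response header, which carries the sha-256 value base64-encoded rather than hex-encoded. The sketch below only covers the parsing side and assumes the RFC 3230 style `name=value` format; `sha256_from_digest_header` is a hypothetical helper, and whether a given index server honors the header at all is not something this thread establishes.

```python
import base64
import hashlib

# Header a client could add to a HEAD request for a release file.
request_headers = {"Want-Digest": "sha-256"}


def sha256_from_digest_header(value):
    """Extract a hex sha256 from a Digest header value, or None if absent."""
    for item in value.split(","):
        # Split on the first "=" only; base64 padding may add more "=" at the end.
        name, sep, encoded = item.strip().partition("=")
        if sep and name.lower() == "sha-256":
            return base64.b64decode(encoded).hex()
    return None


# Build a header value from known bytes to show the round trip.
payload = b"hello"
header = "sha-256=" + base64.b64encode(hashlib.sha256(payload).digest()).decode("ascii")
print(sha256_from_digest_header(header))
```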

peterbe commented 4 years ago

Pardon my ignorance, but what does this mean in simple terms? At the moment hashin uses $index_url/pypi/$package_name/json, and the default is index_url = https://pypi.org/. So what would it be in your suggestion?

graingert commented 4 years ago

@peterbe see the page source of https://m.devpi.net/root/pypi/+simple/devpi-server/

Each of the URLs has a #sha256= fragment.

peterbe commented 4 years ago

Yeah, or https://pypi.org/simple/hashin/

But that's not JSON. That would require parsing the HTML, no?

graingert commented 4 years ago

The simple index API is an HTML subset, designed to be amenable to simple processing:

https://github.com/pypa/pip/blob/0b5ad47cbfe986335790e728b787c580b0b3c8b1/src/pip/_vendor/distlib/locators.py#L821-L822

fschulze commented 4 years ago

Another thing that would help is if get_package_data didn't use an absolute path. Then the index URL could start with the path components required by devpi (/user/index/...), and devpi (or a plugin for it) could provide the JSON hashin needs. Currently this isn't possible, because regardless of the path in the index_url it will always go to /pypi/%s/json and lose the path from index_url.
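The path loss is easy to reproduce if the URL is built with something like `urllib.parse.urljoin`, which is an assumption here about how the join is done. The sketch below uses a made-up devpi host, and `json_url` is a hypothetical helper, not hashin's `get_package_data`.

```python
from urllib.parse import urljoin


def json_url(index_url, package, keep_index_path):
    """Build the JSON URL with a relative or an absolute path component."""
    path = "pypi/%s/json" % package
    if not keep_index_path:
        # A leading slash makes the path absolute, replacing everything after the host.
        path = "/" + path
    if not index_url.endswith("/"):
        # Without a trailing slash urljoin would drop the last path segment.
        index_url += "/"
    return urljoin(index_url, path)


index = "https://devpi.example.invalid/user/index"
print(json_url(index, "devpi-server", keep_index_path=False))
# https://devpi.example.invalid/pypi/devpi-server/json
print(json_url(index, "devpi-server", keep_index_path=True))
# https://devpi.example.invalid/user/index/pypi/devpi-server/json
```

With the default index the two variants produce the same URL, so switching to a relative path would not change behavior for pypi.org users.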

peterbe commented 4 years ago

> Another thing that would help is if get_package_data didn't use an absolute path. Then the index URL could start with the path components required by devpi (/user/index/...), and devpi (or a plugin for it) could provide the JSON hashin needs. Currently this isn't possible, because regardless of the path in the index_url it will always go to /pypi/%s/json and lose the path from index_url.

True. I think that'd need to be part of the patch that "scrapes" HTML instead of using JSON. Or is this worthwhile to have even if you're not using an index URL that requires HTML scraping?

fschulze commented 4 years ago

It is worthwhile, because then we could add the necessary json support on the devpi side and you don't have to change anything else. Scraping wouldn't be required anymore.