package-url / purl-spec

A minimal specification for purl aka. a package "mostly universal" URL, join the discussion at https://gitter.im/package-url/Lobby
https://github.com/package-url/purl-spec

Standard hash algorithm names? #246

Open matt-phylum opened 1 year ago

matt-phylum commented 1 year ago

Everybody knows about the MD5, SHA-1, SHA-256, and SHA-512 algorithms, and these are almost universally written as md5, sha1, sha256, and sha512. Unfortunately, the PURL spec only says the key for the checksum qualifier is "lowercase_algorithm", and its example then turns "SHA-1" and "SHA-256" into "sha1" and "sha256", which drops the hyphen in addition to lowercasing. For these algorithms, the description and example are probably okay. That implies that the rest of the SHA-2 family (SHA-224, SHA-384, SHA-512) becomes sha224, sha384, sha512.

However, it's unclear what the key should be for other algorithms like SHA3-256. It's probably sha3-256? sha3256 definitely doesn't seem right.
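To make the ambiguity concrete, here is a minimal sketch of the two readings of "lowercase_algorithm". Both helper names (`lowercase_only`, `lowercase_and_strip_hyphens`) are hypothetical and are not part of the PURL spec or any purl library; they only illustrate what each interpretation would produce.

```python
def lowercase_only(name: str) -> str:
    """Literal reading of "lowercase_algorithm": change case, keep punctuation."""
    return name.lower()


def lowercase_and_strip_hyphens(name: str) -> str:
    """Reading implied by the spec's sha1/sha256 example: hyphens are dropped too."""
    return name.lower().replace("-", "")


if __name__ == "__main__":
    for name in ["MD5", "SHA-1", "SHA-256", "SHA-512", "SHA3-256"]:
        print(f"{name:10} {lowercase_only(name):10} {lowercase_and_strip_hyphens(name)}")
```

Neither rule gives the conventional spelling for every name: the first yields sha-1 instead of sha1, and the second yields sha3256 instead of sha3-256. That is why an explicit list of keys seems more useful than a transformation rule.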

I've also noticed that some algorithms that shouldn't be ambiguous are made ambiguous by library implementations. For example, Python uses the ID shake_256 for the SHAKE256 algorithm and the ID ripemd160 for the RIPEMD-160 algorithm.
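As a rough illustration of the library side, the sketch below checks the identifiers that Python's `hashlib` exposes and maps them to candidate checksum keys. The `HASHLIB_TO_CANDIDATE_KEY` table and the `candidate_key` helper are assumptions made up for this example; the keys on the right are guesses at what a standard list might contain, not values defined by the spec.

```python
import hashlib

# Hypothetical mapping: Python hashlib identifier -> candidate purl checksum key.
# The right-hand side is a guess, not something the PURL spec defines today.
HASHLIB_TO_CANDIDATE_KEY = {
    "md5": "md5",
    "sha1": "sha1",
    "sha256": "sha256",
    "sha3_256": "sha3-256",
    "shake_256": "shake256",
    "ripemd160": "ripemd160",
}


def candidate_key(hashlib_name: str) -> str:
    """Translate a hashlib identifier into the candidate checksum key."""
    return HASHLIB_TO_CANDIDATE_KEY[hashlib_name]


if __name__ == "__main__":
    # sha3_256 and shake_256 are always available; ripemd160 depends on the
    # OpenSSL build, so it is only listed in algorithms_available on some systems.
    for name in sorted(HASHLIB_TO_CANDIDATE_KEY):
        guaranteed = name in hashlib.algorithms_guaranteed
        print(f"{name:10} -> {candidate_key(name):10} guaranteed={guaranteed}")
```

A tool that passes the library's identifier straight through would emit shake_256 or sha3_256, which is a third spelling on top of the two discussed above.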

It would be helpful to have a list of standard hash algorithms with their correct keys.

pombredanne commented 7 months ago

@matt-phylum what would be your recommendation?

matt-phylum commented 7 months ago

blake3 and shake256 are variable-length output algorithms, so the algorithm name alone does not fix the length of the digest.
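A small sketch of what variable-length output means in practice, using `hashlib.shake_256` from the Python standard library (BLAKE3 is not in the standard library, so it is not shown). The input bytes and the `shake256_hex` helper are placeholders for this example.

```python
import hashlib


def shake256_hex(data: bytes, length_bytes: int) -> str:
    """Return a SHAKE256 digest of the requested length as lowercase hex."""
    return hashlib.shake_256(data).hexdigest(length_bytes)


if __name__ == "__main__":
    data = b"example package contents"  # placeholder input
    print(shake256_hex(data, 16))  # 32 hex characters
    print(shake256_hex(data, 32))  # 64 hex characters, starts with the line above
```

Both values are valid SHAKE256 digests of the same input, so a consumer comparing checksums would need to know which output length the producer chose, either from the key or from the length of the value.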

Uncommon but similar algorithms like blake2b-512 or blake2s-256 may not need to be mentioned.

This is probably another topic, but NuGet packages and Go modules have their own specialized hashes, and those should probably be documented or warned about somehow. For NuGet packages, sometimes the sha512 hash includes the attached signature file and sometimes it does not (the package identity must not change when the registry adds its own signature to this file). For Go, packages are conceptually distributed as directory hierarchies instead of flat files, so they have something called an h1 hash, which is a SHA-256 hash computed over a sorted listing of the per-file SHA-256 hashes of the package content, effectively a two-level Merkle tree.
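For reference, a minimal Python sketch of the Go h1 computation, following my understanding of `Hash1` in `golang.org/x/mod/sumdb/dirhash`, which uses SHA-256 at both levels: hash each file, write one `<hex digest>  <path>` line per file in sorted order, hash that summary, and base64-encode the result. The function name `go_h1` and the example file names are made up for illustration, and this has not been checked against real `go.sum` entries.

```python
import base64
import hashlib
from typing import Mapping


def go_h1(files: Mapping[str, bytes]) -> str:
    """Compute an h1-style hash over a mapping of file path -> file content."""
    summary = hashlib.sha256()
    for name in sorted(files):
        if "\n" in name:
            raise ValueError("file names containing newlines are not supported")
        file_digest = hashlib.sha256(files[name]).hexdigest()
        # Same line shape as `sha256sum` output: hex digest, two spaces, path.
        summary.update(f"{file_digest}  {name}\n".encode("utf-8"))
    return "h1:" + base64.b64encode(summary.digest()).decode("ascii")


if __name__ == "__main__":
    # Hypothetical module contents; real paths are prefixed with module@version.
    example = {
        "example.com/mod@v1.0.0/go.mod": b"module example.com/mod\n",
        "example.com/mod@v1.0.0/mod.go": b"package mod\n",
    }
    print(go_h1(example))
```

The result is base64 rather than hex and covers a file listing rather than a single archive, so it does not fit the `algorithm:hex_value` shape of the checksum qualifier without some extra convention.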

matt-phylum commented 6 months ago

Related: #277