php / doc-en

English PHP documentation

hash_algos() docs should clarify which algos are cryptographic #3616

Open cmb69 opened 3 months ago

cmb69 commented 3 months ago

Triggered by https://news-web.php.net/php.internals/124613. Thanks, @IMSoP!

hash_hmac() has a respective changelog entry:

https://github.com/php/doc-en/blob/feab22a6798fbb9137f9bbdb2b94ae0182cb950e/reference/hash/functions/hash-hmac.xml#L100

I think it's a good idea to also state that in the hash_algos() docs.

cmb69 commented 3 months ago

Maybe it is sufficient to clarify that hash_hmac_algos() lists these.
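For illustration, a minimal sketch of how the two lists differ. It assumes PHP 7.2 or later, where hash_hmac_algos() exists; the variable names are only illustrative, and the exact output depends on the PHP build.

```php
<?php
// All algorithms accepted by hash().
$all = hash_algos();

// Algorithms accepted by hash_hmac(); non-cryptographic ones are excluded.
$hmac = hash_hmac_algos();

// Whatever is left over cannot be used with hash_hmac().
$nonCrypto = array_values(array_diff($all, $hmac));

echo "Not usable with hash_hmac():\n";
echo implode(', ', $nonCrypto), "\n";
```

Note that being listed by hash_hmac_algos() says nothing about whether an algorithm is still considered secure.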

IMSoP commented 3 months ago

This would only solve a tiny portion of the problem I was pointing out.

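What's really needed is:

  • An explanation of different hashing use cases, and terms like "cryptographic hash"
  • An explanation of when to use hash(), hash_hmac(), or password_hash()
  • A list or table with the available algorithms, giving more than just their names
  • Guidance on which algorithms to avoid (here's where you can talk about the weaknesses of MD5 and SHA1!)
  • Some kind of recommendation of what algorithm users should pick for common use cases, if they're not constrained by compatibility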

claudepache commented 3 months ago

hash_algos() docs should clarify which algos are cryptographic

I’m not sure it’s actually useful information; at least, it’s largely insufficient. For instance, md4 is “cryptographic”, but you shouldn’t use it for anything cryptography-related unless someone holds a gun to your head.

jimwins commented 3 months ago

A common theme in the user-contributed notes for hash() was performance benchmarks, so it's probably worth adding some discussion of that (including why you may not even want the fastest algo). Also, if we're going to have a table of algo information in the documentation, the expected/maximum output size of each would be a good data point to add.
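As a rough sketch of where the output sizes could come from: the digest length can be measured at runtime by hashing an empty string with raw output enabled. The helper name digestLength() is hypothetical and not part of the hash extension.

```php
<?php
// Hypothetical helper: digest length in bytes for a given algorithm,
// measured by hashing an empty string with raw (binary) output.
function digestLength(string $algo): int
{
    return strlen(hash($algo, '', true));
}

// Print one row per algorithm available in this PHP build.
foreach (hash_algos() as $algo) {
    $bytes = digestLength($algo);
    printf("%-12s %3d bytes (%d hex chars)\n", $algo, $bytes, $bytes * 2);
}
```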

hash() and hash_hmac() should definitely have a common paragraph about their possible use in password situations with reference to password_hash().
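A minimal sketch of how the three functions differ in use. The choice of sha256, the sample strings, and the variable names are assumptions made for illustration, not recommendations from this thread.

```php
<?php
// Integrity check or fingerprint: a fast, unkeyed hash.
$checksum = hash('sha256', 'file contents');

// Message authentication: a keyed hash, compared in constant time.
$key = random_bytes(32);
$tag = hash_hmac('sha256', 'message', $key);
$ok  = hash_equals($tag, hash_hmac('sha256', 'message', $key));

// Password storage: deliberately slow and salted; neither hash() nor
// hash_hmac() is designed for this.
$stored = password_hash('correct horse', PASSWORD_DEFAULT);
$valid  = password_verify('correct horse', $stored);

var_dump($ok, $valid);
```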

(I deleted a bunch of the notes on hash(), there were quite a few that were just benchmarks from 5-10 years ago.)

damianwadley commented 3 months ago

I just want to be sure of something here: is the goal of this documentation to talk about the PHP functions and how they work, or is the goal to teach developers about how to implement their own version of cryptography?

IMSoP commented 3 months ago

@damianwadley What do you mean by "implement their own version"? I don't think anyone's expecting users to come up with new, novel, hashing algorithms.

What I am hoping for is some description beyond a name for the 60 different algorithms currently supported by hash(), with some explanation of why a user might want to use them, or why they should avoid them.

cmb69 commented 3 months ago

While I agree that the current documentation is somewhat insufficient, I wouldn't go too much into the details; perhaps we can find some good article(s) to link to, instead.

  • An explanation of different hashing use cases, and terms like "cryptographic hash"

A short explanation might be in order, but certainly not a thorough treatment like on https://en.wikipedia.org/wiki/Hash_function or https://en.wikipedia.org/wiki/Cryptographic_hash_function.

  • An explanation of when to use hash(), hash_hmac(), or password_hash()

ACK

  • A list or table with the available algorithms, giving more than just their names

Hmm, maybe some rough categorization might be in order, but detailed explanation about every single algorithm seems out of scope of the PHP manual. Besides, it's already not easy to keep the simple list up to date.

  • Guidance on which algorithms to avoid (here's where you can talk about the weaknesses of MD5 and SHA1!)

That's difficult. Depending on the use case, MD5 and SHA1 might still be fine (and sometimes just necessary for interoperability with already existing hashes). See https://en.wikipedia.org/wiki/Cryptographic_hash_function#Properties for details.

  • Some kind of recommendation of what algorithm users should pick for common use cases, if they're not constrained by compatibility

That's difficult, again. Maybe we could attempt some rough categorization of the available algorithms.

A common theme in the user-contributed notes for hash() was performance benchmarks, so it's probably worth adding some discussion of that (including why you may not even want the fastest algo).

A rough explanation of the performance might make sense, but these benchmarks are pretty useless, in my opinion. After all, some of the algorithms may be implemented with SIMD instructions (with a fallback if these instructions are not available), a few might even have hardware support (e.g. https://github.com/php/php-src/pull/4108), and the implementations may change over time.

IMSoP commented 3 months ago

Hmm, maybe some rough categorization might be in order, but detailed explanation about every single algorithm seems out of scope of the PHP manual. Besides, it's already not easy to keep the simple list up to date.

I didn't say "detailed explanation", I said "some description beyond a name". The context being that multiple people are claiming that users should be using the hash() function, and choosing the right algorithm; and they don't seem keen on simply adding a function for sha256(), or whatever the "best" algorithm is. So I'm assuming there is more to say about the strengths and weaknesses of different algorithms, in which case we need to present that to users.

Maybe there are some algorithms that can just be labelled "rarely used, included for compatibility with other systems", but right now we don't even have that.

cmb69 commented 3 months ago

I'm not an expert on hash functions, so take the following with a huge grain of salt (and please correct me, if I'm wrong). As I see it, there are roughly three categories of hash functions:

So "usually" this boils down to:

jimwins commented 3 months ago

@cmb69, I thought that was a good starting point for beefing up the introduction to the documentation for the hash extension! PR is just a draft, feel free to suggest changes and additions and maybe we can address some of the other areas that @IMSoP identified.

cmb69 commented 3 months ago

Quick note to not forget about it: maybe link to https://csrc.nist.gov/projects/hash-functions (see https://news-web.php.net/php.internals/124678).