Closed funny-falcon closed 8 years ago
Sounds like it would only take a two-line change to test the perf impact of this change. @Gankro, did you have some hashtable benchmark scripts lying around?
That'd be interesting. Sub.
You can fork http://cglab.ca/~abeinges/blah/hash-rs/ if you want to do some ok-ish comparisons.
@bluss is my goto expert on siphash.
Note that direct consumers of SipHash may break if they decided to manually seed it and store the results. So this isn't a totally free change.
Aumasson is the cryptographer, maybe we can ask him what he thinks about Rust going for SipHash-1-3 specifically? I'm curious what people have learned about SipHash in general now that it's been out for several years.
If we weaken it so much that it doesn't prevent hash flooding anymore, then we lose the reason to use SipHash and could use a faster hash function.
I know I tried long ago and noted that SipHash-1-2 didn't even pass the Smhasher testsuite (which is probably bad). (SipHash-N-M is for N hash rounds per 8 bytes of input and M fixed rounds as finalization).
Siphash-1-3 passes Smhasher. It is smallest configuration, which has good avalanche.
Siphash is protected from "seed independent collisions" with any configuration with good avalanche.
Siphash24 is recomended for MAC cause when you do MAC you put result in public, so there are other vectors of attack: plain text, chosen plain text, differential, etc. As cryptographers, Siphash authors simply recommend more rounds than minimal required to protect against future investigations.
Within hash-table result of hash-function is not published, so there is no way to recover seed. Then protection from "seed independent collisions" is just enough.
Note that direct consumers of SipHash may break if they decided to manually seed it and store the results. So this isn't a totally free change.
Agree. There should be separate SipHash13 for internal/hash-table usage, and SipHash (24) remains for direct/external usage.
Aumasson is the cryptographer, maybe we can ask him what he thinks about Rust going for SipHash-1-3 specifically?
I asked him year ago about using SipHash13 for hash tables. Answer is in issue text.
I think, we can ask him again.
If @veorq feels its fine then it should be fine. The worst case is there's a hashDoS against Siphash13, in which case you can upgrade to a stronger hash function.
@Gankro, how likely do you think this is to break code in the wild? I was under the impression that we considered the precise underlying default hash function to be an implementation detail (with the caveat that you can count on it to be cryptographically secure).
@bstrie from a security perspective, I'd say you definitely should not couple to the hash function in any way. Java made that mistake (the hashing algorithm was included in the Java Language Specification) and that made it so they were not able to adequately respond to hashDoS.
@tarcieri That's actually tied in with what I'm implying here. Supposing that we did switch to 1-3, and that an attack on 1-3 was later discovered, we'd want to reserve the right to bump it back up to 2-4 (or, you know, whatever the state of the art is by then) without risking massive breakage in the ecosystem. If changing the default hash function should prove to be disruptive at the current point in time then that's a separate problem that we need to consider, hence why I'm curious as to the true magnitude of such a change.
Furthermore, I get the impression that the breakage that Gankro's envisioning wouldn't be detectable by Crater.
Using 1-3 doesn't fix the fundamental slowness of our hashing, but it will instead speed up the long data case. The short data case remains a problem, due to (I think):
Of course we should still go for it, if possible, but I don't think it's a substitute for allowing faster alternatives to be used (application may choose to sacrifice hash flooding defence, deciding it's not useful for them).
@bstrie it seems when Java was last affected by hashDoS they ultimately ended up making a breaking change to the language specification:
http://mail.openjdk.java.net/pipermail/core-libs-dev/2012-May/010238.html
Did anybody experiment adding padding to 8 bytes in SipHash?
In particular, if we padded whatever is input to 8 bytes, it would be possible to remove the ntail
handling code here https://github.com/rust-lang/rust/blob/e3cd87241898cd88e4348e9c6142a03d8909c4e0/src/libcore/hash/sip.rs#L151-L167
and https://github.com/rust-lang/rust/blob/e3cd87241898cd88e4348e9c6142a03d8909c4e0/src/libcore/hash/sip.rs#L185-L186
SipHash designer here, haven't changed my opinion about SipHash-1-3 :-)
A warning, though: SipHash-1-3 leaves 4 rounds between the last attacker-controlled input and the output. There's a "distinguisher" on 4 rounds in https://eprint.iacr.org/2014/722.pdf, or in simplest terms a statistical bias that shows up given a specific difference pattern in the input of the 4-round sequence. But you can't inject that pattern in SipHash-1-3 because you don't control all the state. And even if you could inject that pattern the bias wouldn't be exploitable anyway.
@bluss yeah, sooner of later that will need to be addressed as well.
@veorq, thanks for your input!
@bluss, indeed I don't believe anybody here is suggesting this would obviate the desired ability to provide an alternative hashing algorithm.
@ranma42, would you be so kind as to open an issue for that? It likely deserves its own discussion and shouldn't get tangled up with this one.
@tarcieri, do you know of any experience reports from the fallout of that change in the broader Java ecosystem?
I benchmarked it with @Gankro machinery, it definitely improves the speed. See https://i.imgur.com/5dKecOW.jpg
As we got confirmation it's secure enough it looks like a worthy improvement.
Yeah, looks good. Only worry is people "inappropriately" depending on the algorithm being stable.
Can't we include this variant as SipHasher13 and set it as the default hasher? I don't see how this can break anybodys code. Am I missing something?
I don't think we should change std::hash::SipHash,
since it's stable and documented to be SipHash-2-4. We can keep the new hasher private for now.
@bluss I can imagine support for an RFC to deprecate std::hash::SipHash
and replace it with std::hash::SipHash24
and std::hash::SipHash13
.
@Gankro, I'm leaning towards saying that this warrants an RFC, unless you think breakage will be essentially nanoscule. (Did we really document that our SipHash variant is specifically 2-4, and did we make it sound like that's guaranteed? If so, that's worth changing too.)
I think our HashMap is completely free to pick its hash function. Changing it should not be a problem.
SipHash 2-4 is, as far as I can tell, "the" SipHash, so it's totally sound for it to be exposed as SipHash. Having WeakSipHash or whatevs should be fine.
I'm fine with someone just submitting a PR with this change and letting the reviewers duke it out, but it's still of sufficient magnitude that we'll want a special note in the release notes/announcement.
:+1: The current hashing seem to be quite slow.
This is an easy improvement. Does anybody have a SipHash1-3 implementation? Or links to what that means and how to implement it? I don't see any here.
@brson I can implement one, if necessary.
It's simple to adapt the current implementation, 1-3 are just the hashing round counts after each 8 byte block and when finishing respectively. So we just need to change the number of compress!()
statements in the current code.
So a start to this would be:
I noticed that on the SipHash website, lots of languages use SipHash2-4, but there is not even any mention of SipHash1-3. I don't know what to make of that - are we just the first to catch on that 1-3 is sufficient?
@brson sounds like that's the case. There's no date on the @veorq quote from @funny-falcon, but I'm guessing it's pretty recent.
These are my results of generating the test vectors for siphash-1-3 (from the reference implementation) @ https://github.com/veorq/SipHash/commit/1b85a33b71f0fdd49942037a503b6798d67ef765
@brson the original SipHash work is basically entirely "use SipHash 2-4". It provides the maximum guarantees at the minimum cost, and all the analysis subsequently focuses on 2-4. However our use-case doesn't require the maximum guarantees.
The authors did mention -4-8 as the conservative choice in their original paper. So it's not exactly the maximum.
There is no maximum for SipHash. SipHash-100-1000 is very-very-very secure option. But this use case will be satisfied with SipHash-1-3, and SipHash author confirms this.
SipHash-100-1000 is very-very-very secure option.
No, that's not how cryptography works. Higher numbers do not necessarily give better security.
Number of rounds were always considered in investigation: first reduced number of rounds were cracked, then full round algorithm.
So you both right and not right: algorithm matters more, but for same good algorithm number of rounds matters much.
But when we consider this use-case we ought to consider that no hashsum result is exposed to attacker, so it is much harder to attack even slightly reduced algo.
Other alternatives include just making the current SipHash implementation faster or switching to another algorithm entirely. I recommend checking out https://github.com/google/highwayhash, where 10x performance improvements over already-optimized implementations are claimed. However, based on IRC conversation, it isn't clear if the 10x numbers are for cases that would be common in Rust. Also, there was some question about whether having AVX2-optimized or other use of non-universally-available instructions might cause too much overhead just in the dispatching logic.
Or switching to something better like Murmur hashing, and dropping Robin-hood probing in HashMap
....
FWIW, Highwayhash isn't backed by any security analysis, and at first glance it doesn't look as secure as SipHash (the multiplications may be abused, message chunks are injected only once, etc.).
On Fri, Mar 4, 2016 at 12:13 AM Brian Smith notifications@github.com wrote:
Other alternatives include just making the current SipHash implementation faster or switching to another algorithm entirely. I recommend checking out https://github.com/google/highwayhash, where 10x performance improvements over already-optimized implementations are claimed. However, based on IRC conversation, it isn't clear if the 10x numbers are for cases that would be common in Rust. Also, there was some question about whether having AVX2-optimized or other use of non-universally-available instructions might cause too much overhead just in the dispatching logic.
— Reply to this email directly or view it on GitHub https://github.com/rust-lang/rust/issues/29754#issuecomment-192012561.
Why is security of hash tables such a big deal? 99.5% of the time, you don't need any security.
@Ticki MurmurHash is a non-cryptographic hash function whose various iterations have been repeatedly broken. Here was one of the latest incarnations of a MurmurHash attack (2012):
https://www.youtube.com/watch?v=wGYj8fhhUVA#t=18m51s
SipHash looks like the most secure option, although HighwayHash looks interesting as well (but probably could use some more analysis)
Why is security of hash tables such a big deal? 99.5% of the time, you don't need any security.
Well, I guess that explains it. You're not familiar with the entire concept of hashDoS. If you have any sort of protocol that eagerly parses attacker-controlled data into HashMap
-style data structures, there's a potential computational attack which can chew up an entire thread for indefinite periods of time:
https://events.ccc.de/congress/2011/Fahrplan/events/4680.en.html
FWIW, Highwayhash isn't backed by any security analysis, and at first glance it doesn't look as secure as SipHash (the multiplications may be abused, message chunks are injected only once, etc.).
Even still, they also have an ultra-fast implementation of standard SipHash and an even-ultra-faster implementation of a variant of Siphash in the highwayhash repo.
Why is security of hash tables such a big deal? 99.5% of the time, you don't need any security.
It's about secure defaults. Let's say you have 200 hash tables on your server. Then with 99.5% security, you have at least one vulnerability that could have been avoided with a more secure hashing function. However, there's no good reason to have a slow implementation of a secure hash function. Security tools need to be effective and fast so that it isn't tempting to bypass them.
I still don't get the rationale behind choosing such a slow hash function. Recently, I have been working on a ZFS implementation for Redox. By replacing the default hash function (SipHasher) with DJB2, I got a ~1.5-2 speed-up of the directory map.
If you need a secure hash function, you should replace your hash function. Default should reflect the majority of usecases, not some edge cases. Furthermore, the rounds of SipHasher Rust is using can hardly be called "secure".
Don't pay for what you don't use is C and C++ biggest advantage.
Title changed to reflect that there are now multiple options to consider for replacing the default algorithm.
@Ticki Rust tries to be both secure by default and fast by default, which is a tension that C and C++ don't attempt to balance. This means that the statement "if you need a secure hash function, you should replace your hash function" is not intrinsically more valid than "if you need a fast hash function, you should replace your hash function".
Jean-Philippe Aumasson answered on "can Siphash13 used instead of Siphash24 for hash function of hash tables":
So that, there is no reason to burn more cpu power on redundant rounds.