Change Siphash to use one of the faster variants of the algorithm (Siphash13, Highwayhash)

funny-falcon commented 9 years ago

Jean-Philippe Aumasson answered on "can Siphash13 used instead of Siphash24 for hash function of hash tables":

we proposed SipHash-2-4 as a (strong) PRF/MAC, and so far no attack whatsoever has been found, although many competent people tried to break it. However, fewer rounds may be sufficient and I would be very surprised if SipHash-1-3 introduced weaknesses for hash tables.

So that, there is no reason to burn more cpu power on redundant rounds.

bstrie commented 9 years ago

Sounds like it would only take a two-line change to test the perf impact of this change. @Gankro, did you have some hashtable benchmark scripts lying around?

arthurprs commented 9 years ago

That'd be interesting. Sub.

Gankra commented 9 years ago

You can fork http://cglab.ca/~abeinges/blah/hash-rs/ if you want to do some ok-ish comparisons.

@bluss is my goto expert on siphash.

Gankra commented 9 years ago

Note that direct consumers of SipHash may break if they decided to manually seed it and store the results. So this isn't a totally free change.

bluss commented 9 years ago

Aumasson is the cryptographer, maybe we can ask him what he thinks about Rust going for SipHash-1-3 specifically? I'm curious what people have learned about SipHash in general now that it's been out for several years.

If we weaken it so much that it doesn't prevent hash flooding anymore, then we lose the reason to use SipHash and could use a faster hash function.

I know I tried long ago and noted that SipHash-1-2 didn't even pass the Smhasher testsuite (which is probably bad). (SipHash-N-M is for N hash rounds per 8 bytes of input and M fixed rounds as finalization).

funny-falcon commented 9 years ago

Siphash-1-3 passes Smhasher. It is smallest configuration, which has good avalanche.

Siphash is protected from "seed independent collisions" with any configuration with good avalanche.

Siphash24 is recomended for MAC cause when you do MAC you put result in public, so there are other vectors of attack: plain text, chosen plain text, differential, etc. As cryptographers, Siphash authors simply recommend more rounds than minimal required to protect against future investigations.

Within hash-table result of hash-function is not published, so there is no way to recover seed. Then protection from "seed independent collisions" is just enough.

Note that direct consumers of SipHash may break if they decided to manually seed it and store the results. So this isn't a totally free change.

Agree. There should be separate SipHash13 for internal/hash-table usage, and SipHash (24) remains for direct/external usage.

funny-falcon commented 9 years ago

Aumasson is the cryptographer, maybe we can ask him what he thinks about Rust going for SipHash-1-3 specifically?

I asked him year ago about using SipHash13 for hash tables. Answer is in issue text.

I think, we can ask him again.

tarcieri commented 9 years ago

If @veorq feels its fine then it should be fine. The worst case is there's a hashDoS against Siphash13, in which case you can upgrade to a stronger hash function.

bstrie commented 9 years ago

@Gankro, how likely do you think this is to break code in the wild? I was under the impression that we considered the precise underlying default hash function to be an implementation detail (with the caveat that you can count on it to be cryptographically secure).

tarcieri commented 9 years ago

@bstrie from a security perspective, I'd say you definitely should not couple to the hash function in any way. Java made that mistake (the hashing algorithm was included in the Java Language Specification) and that made it so they were not able to adequately respond to hashDoS.

bstrie commented 9 years ago

@tarcieri That's actually tied in with what I'm implying here. Supposing that we did switch to 1-3, and that an attack on 1-3 was later discovered, we'd want to reserve the right to bump it back up to 2-4 (or, you know, whatever the state of the art is by then) without risking massive breakage in the ecosystem. If changing the default hash function should prove to be disruptive at the current point in time then that's a separate problem that we need to consider, hence why I'm curious as to the true magnitude of such a change.

Furthermore, I get the impression that the breakage that Gankro's envisioning wouldn't be detectable by Crater.

bluss commented 9 years ago

Using 1-3 doesn't fix the fundamental slowness of our hashing, but it will instead speed up the long data case. The short data case remains a problem, due to (I think):

Lack of one-shot hashing, means the code visits branches to read and or update the partially written tail
Strings and slices write both data and length / delimiter, which is slow since it enters SipHash::write twice, hitting the code in the first point.

Of course we should still go for it, if possible, but I don't think it's a substitute for allowing faster alternatives to be used (application may choose to sacrifice hash flooding defence, deciding it's not useful for them).

tarcieri commented 9 years ago

@bstrie it seems when Java was last affected by hashDoS they ultimately ended up making a breaking change to the language specification:

http://mail.openjdk.java.net/pipermail/core-libs-dev/2012-May/010238.html

ranma42 commented 9 years ago

Did anybody experiment adding padding to 8 bytes in SipHash? In particular, if we padded whatever is input to 8 bytes, it would be possible to remove the ntail handling code here https://github.com/rust-lang/rust/blob/e3cd87241898cd88e4348e9c6142a03d8909c4e0/src/libcore/hash/sip.rs#L151-L167 and https://github.com/rust-lang/rust/blob/e3cd87241898cd88e4348e9c6142a03d8909c4e0/src/libcore/hash/sip.rs#L185-L186

veorq commented 9 years ago

SipHash designer here, haven't changed my opinion about SipHash-1-3 :-)

A warning, though: SipHash-1-3 leaves 4 rounds between the last attacker-controlled input and the output. There's a "distinguisher" on 4 rounds in https://eprint.iacr.org/2014/722.pdf, or in simplest terms a statistical bias that shows up given a specific difference pattern in the input of the 4-round sequence. But you can't inject that pattern in SipHash-1-3 because you don't control all the state. And even if you could inject that pattern the bias wouldn't be exploitable anyway.

arthurprs commented 9 years ago

@bluss yeah, sooner of later that will need to be addressed as well.

bstrie commented 9 years ago

@veorq, thanks for your input!

@bluss, indeed I don't believe anybody here is suggesting this would obviate the desired ability to provide an alternative hashing algorithm.

bstrie commented 9 years ago

@ranma42, would you be so kind as to open an issue for that? It likely deserves its own discussion and shouldn't get tangled up with this one.

@tarcieri, do you know of any experience reports from the fallout of that change in the broader Java ecosystem?

arthurprs commented 9 years ago

I benchmarked it with @Gankro machinery, it definitely improves the speed. See https://i.imgur.com/5dKecOW.jpg

As we got confirmation it's secure enough it looks like a worthy improvement.

Gankra commented 9 years ago

Yeah, looks good. Only worry is people "inappropriately" depending on the algorithm being stable.

arthurprs commented 9 years ago

Can't we include this variant as SipHasher13 and set it as the default hasher? I don't see how this can break anybodys code. Am I missing something?

bluss commented 9 years ago

I don't think we should change std::hash::SipHash, since it's stable and documented to be SipHash-2-4. We can keep the new hasher private for now.

bstrie commented 9 years ago

@bluss I can imagine support for an RFC to deprecate std::hash::SipHash and replace it with std::hash::SipHash24 and std::hash::SipHash13.

bstrie commented 9 years ago

@Gankro, I'm leaning towards saying that this warrants an RFC, unless you think breakage will be essentially nanoscule. (Did we really document that our SipHash variant is specifically 2-4, and did we make it sound like that's guaranteed? If so, that's worth changing too.)

bluss commented 9 years ago

I think our HashMap is completely free to pick its hash function. Changing it should not be a problem.

Gankra commented 9 years ago

SipHash 2-4 is, as far as I can tell, "the" SipHash, so it's totally sound for it to be exposed as SipHash. Having WeakSipHash or whatevs should be fine.

bstrie commented 9 years ago

I'm fine with someone just submitting a PR with this change and letting the reviewers duke it out, but it's still of sufficient magnitude that we'll want a special note in the release notes/announcement.

ticki commented 8 years ago

:+1: The current hashing seem to be quite slow.

brson commented 8 years ago

This is an easy improvement. Does anybody have a SipHash1-3 implementation? Or links to what that means and how to implement it? I don't see any here.

ticki commented 8 years ago

@brson I can implement one, if necessary.

bluss commented 8 years ago

It's simple to adapt the current implementation, 1-3 are just the hashing round counts after each 8 byte block and when finishing respectively. So we just need to change the number of compress!() statements in the current code.

brson commented 8 years ago

So a start to this would be:

Create the private SipHash13 implementation
Replace the default hasher without breaking any public APIs
Benchmark some things

brson commented 8 years ago

I noticed that on the SipHash website, lots of languages use SipHash2-4, but there is not even any mention of SipHash1-3. I don't know what to make of that - are we just the first to catch on that 1-3 is sufficient?

tarcieri commented 8 years ago

@brson sounds like that's the case. There's no date on the @veorq quote from @funny-falcon, but I'm guessing it's pretty recent.

bluss commented 8 years ago

These are my results of generating the test vectors for siphash-1-3 (from the reference implementation) @ https://github.com/veorq/SipHash/commit/1b85a33b71f0fdd49942037a503b6798d67ef765

Gankra commented 8 years ago

@brson the original SipHash work is basically entirely "use SipHash 2-4". It provides the maximum guarantees at the minimum cost, and all the analysis subsequently focuses on 2-4. However our use-case doesn't require the maximum guarantees.

bluss commented 8 years ago

The authors did mention -4-8 as the conservative choice in their original paper. So it's not exactly the maximum.

funny-falcon commented 8 years ago

There is no maximum for SipHash. SipHash-100-1000 is very-very-very secure option. But this use case will be satisfied with SipHash-1-3, and SipHash author confirms this.

ticki commented 8 years ago

SipHash-100-1000 is very-very-very secure option.

No, that's not how cryptography works. Higher numbers do not necessarily give better security.

funny-falcon commented 8 years ago

Number of rounds were always considered in investigation: first reduced number of rounds were cracked, then full round algorithm.

So you both right and not right: algorithm matters more, but for same good algorithm number of rounds matters much.

But when we consider this use-case we ought to consider that no hashsum result is exposed to attacker, so it is much harder to attack even slightly reduced algo.

briansmith commented 8 years ago

Other alternatives include just making the current SipHash implementation faster or switching to another algorithm entirely. I recommend checking out https://github.com/google/highwayhash, where 10x performance improvements over already-optimized implementations are claimed. However, based on IRC conversation, it isn't clear if the 10x numbers are for cases that would be common in Rust. Also, there was some question about whether having AVX2-optimized or other use of non-universally-available instructions might cause too much overhead just in the dispatching logic.

ticki commented 8 years ago

Or switching to something better like Murmur hashing, and dropping Robin-hood probing in HashMap....

veorq commented 8 years ago

FWIW, Highwayhash isn't backed by any security analysis, and at first glance it doesn't look as secure as SipHash (the multiplications may be abused, message chunks are injected only once, etc.).

On Fri, Mar 4, 2016 at 12:13 AM Brian Smith notifications@github.com wrote:

Other alternatives include just making the current SipHash implementation faster or switching to another algorithm entirely. I recommend checking out https://github.com/google/highwayhash, where 10x performance improvements over already-optimized implementations are claimed. However, based on IRC conversation, it isn't clear if the 10x numbers are for cases that would be common in Rust. Also, there was some question about whether having AVX2-optimized or other use of non-universally-available instructions might cause too much overhead just in the dispatching logic.

— Reply to this email directly or view it on GitHub https://github.com/rust-lang/rust/issues/29754#issuecomment-192012561.

ticki commented 8 years ago

Why is security of hash tables such a big deal? 99.5% of the time, you don't need any security.

tarcieri commented 8 years ago

@Ticki MurmurHash is a non-cryptographic hash function whose various iterations have been repeatedly broken. Here was one of the latest incarnations of a MurmurHash attack (2012):

https://www.youtube.com/watch?v=wGYj8fhhUVA#t=18m51s

SipHash looks like the most secure option, although HighwayHash looks interesting as well (but probably could use some more analysis)

Why is security of hash tables such a big deal? 99.5% of the time, you don't need any security.

Well, I guess that explains it. You're not familiar with the entire concept of hashDoS. If you have any sort of protocol that eagerly parses attacker-controlled data into HashMap-style data structures, there's a potential computational attack which can chew up an entire thread for indefinite periods of time:

https://events.ccc.de/congress/2011/Fahrplan/events/4680.en.html

briansmith commented 8 years ago

FWIW, Highwayhash isn't backed by any security analysis, and at first glance it doesn't look as secure as SipHash (the multiplications may be abused, message chunks are injected only once, etc.).

Even still, they also have an ultra-fast implementation of standard SipHash and an even-ultra-faster implementation of a variant of Siphash in the highwayhash repo.

briansmith commented 8 years ago

Why is security of hash tables such a big deal? 99.5% of the time, you don't need any security.

It's about secure defaults. Let's say you have 200 hash tables on your server. Then with 99.5% security, you have at least one vulnerability that could have been avoided with a more secure hashing function. However, there's no good reason to have a slow implementation of a secure hash function. Security tools need to be effective and fast so that it isn't tempting to bypass them.

ticki commented 8 years ago

I still don't get the rationale behind choosing such a slow hash function. Recently, I have been working on a ZFS implementation for Redox. By replacing the default hash function (SipHasher) with DJB2, I got a ~1.5-2 speed-up of the directory map.

If you need a secure hash function, you should replace your hash function. Default should reflect the majority of usecases, not some edge cases. Furthermore, the rounds of SipHasher Rust is using can hardly be called "secure".

Don't pay for what you don't use is C and C++ biggest advantage.

bstrie commented 8 years ago

Title changed to reflect that there are now multiple options to consider for replacing the default algorithm.

bstrie commented 8 years ago

@Ticki Rust tries to be both secure by default and fast by default, which is a tension that C and C++ don't attempt to balance. This means that the statement "if you need a secure hash function, you should replace your hash function" is not intrinsically more valid than "if you need a fast hash function, you should replace your hash function".

rust-lang / rust

Change Siphash to use one of the faster variants of the algorithm (Siphash13, Highwayhash) #29754