rust-lang / rfcs

RFCs for changes to Rust
https://rust-lang.github.io/rfcs/
Apache License 2.0

Implement a fast HashMap / hash function #631

Closed steveklabnik closed 6 years ago

steveklabnik commented 9 years ago

Issue by brson Saturday Jan 25, 2014 at 00:54 GMT

For earlier discussion, see https://github.com/rust-lang/rust/issues/11783

This issue was labelled with: A-an-interesting-project, A-libs, I-compiletime, I-slow, I-wishlist in the Rust repository


Our current HashMap uses cryptographically secure hashing but it is slow. Sometimes this is a big problem. Implement another hash map that is very fast.

ilyaraz commented 9 years ago

The current version of HashMap is still around 4 times slower than C++'s std::unordered_map on int keys. It is quite unfortunate.

divyekapoor commented 8 years ago

Hash maps with integer keys are quite common and they usually operate on trusted data. A DoS resistant hash map should be the exception, not the rule (fast hashmaps should be the default). Is there any intent to change this behavior?

Luthaf commented 8 years ago

Is that still true when using BTreeMap or the FNV hasher? (see the first code example). I have not checked this myself, but I believe the situation has improved since this issue was created.

steveklabnik commented 8 years ago

Yes, @Luthaf is correct.

A DoS resistant hash map should be the exception, not the rule (fast hashmaps should be the default).

This is not the current position of the libs team, in my understanding, but regardless, we cannot change this now; the ship has sailed. /cc @rust-lang/libs

ticki commented 8 years ago

but regardless, we cannot change this now; the ship has sailed.

It actually can. Changing it would be a non-breaking change.

seanmonstar commented 8 years ago

@Ticki

It actually can. Changing it would be a non-breaking change.

It'd be worse: it would change the behavior and its guarantee without a user knowing. Suddenly anyone depending on the strength of HashMap is susceptible to DOS attacks.

ticki commented 8 years ago

It'd be worse: it would change the behavior and its guarantee without a user knowing. Suddenly anyone depending on the strength of HashMap is susceptible to DOS attacks.

HashMap guarantees no such thing.

ilyaraz commented 8 years ago

In 99.9% of use cases, hash tables prone to DoS attacks are perfectly adequate. In the remaining 0.1%, the person in charge of the corresponding part of the code should be expert enough to choose the right implementation of a hash table, or implement it themselves.

That's why the default option that uses collision-resistant hash functions is a bit stupid.

apasel422 commented 8 years ago

In 99.9% of use cases, hash tables prone to DoS attacks are perfectly adequate. In the remaining 0.1%, the person in charge of the corresponding part of the code should be expert enough to choose the right implementation of a hash table, or implement it themselves.

That's why the default option that uses collision-resistant hash functions is a bit stupid.

The same argument could be made in favor of having HashMap be secure by default. If someone really needs to squeeze some more performance out of a map, after profiling to determine that it is indeed a bottleneck, they can opt for a map that is prone to DOS.

ticki commented 8 years ago

The same argument could be made in favor of having HashMap be secure by default. If someone really needs to squeeze some more performance out of a map, after profiling to determine that it is indeed a bottleneck, they can opt for a map that is prone to DOS.

But the point is that in the majority of cases, DoS is not a problem.

apasel422 commented 8 years ago

But the point is that in the majority of cases, DoS is not a problem.

That may be true, but is map performance a problem in the majority of cases?

ticki commented 8 years ago

That may be true, but is map performance a problem in the majority of cases?

A lot, actually. Let me give you an example:

Recently, I changed our ZFS implementation in Redox from SipHasher to DJB2, and it gave a massive speed-up (2-4x)!

apasel422 commented 8 years ago

A lot, actually. Let me give you an example:

Recently, I changed our ZFS implementation in Redox from SipHasher to DJB2, and it gave a massive speed-up (2-4x)!

Right. You profiled your application and determined that you could gain increased performance by switching. There are certainly cases in which DOS is not a concern, but it should be a conscious choice to trade security for speed.

ilyaraz commented 8 years ago

That may be true, but is map performance a problem in the majority of cases?

If it's not, one should use binary search trees or lists or whatnot. Every data structure has certain benefits and drawbacks, and hash tables have a benefit of being extremely fast. One should not sacrifice it for 'security'.

divyekapoor commented 8 years ago

To trust or not to trust, that is the question. Hash tables hit pathological behavior only under crafted inputs (which means, in essence, most inputs are perfectly fine). Choosing a "secure" hash table trades off the default case for the esoteric. Imagine the binary is "unzip": we're trading off common-case performance for resilience in the uncommon case. The costs add up in large applications.


apasel422 commented 8 years ago

If it's not, one should use binary search trees or lists or whatnot. Every data structure has certain benefits and drawbacks, and hash tables have the benefit of being extremely fast. One should not sacrifice it for 'security'.

It doesn't matter if you pick an asymptotically slower data structure like a linked list over a hash map if it has no effect on your application's overall performance. If a hash map with DoS-resistant hashing is the application's bottleneck, and you determine that the tradeoff in security is acceptable, you can switch to a faster hash function. I'm not arguing against the existence of non-DoS-resistant hash functions, just noting that the default is only a default and can be overridden. There are tradeoffs in both cases, whether it's secure-by-default or fast-by-default.

The costs add up in large applications.

I agree, and in this scenario the application can opt to use something non-default.

In any case, I agree with @steveklabnik that it can't be changed now. Despite it possibly not being a breaking change at the API level, I think the docs are pretty clear:

The hashes are all keyed by the thread-local random number generator on creation by default. This means that the ordering of the keys is randomized, but makes the tables more resistant to denial-of-service attacks (Hash DoS). This behavior can be overridden with one of the constructors.

One can get the improved performance with a type alias. Note that HashMap's third parameter must implement BuildHasher, so the hasher type is wrapped in BuildHasherDefault:

type HashMap<K, V> = std::collections::HashMap<K, V, BuildHasherDefault<FnvHasher>>;
Centril commented 6 years ago

I think this is not actionable and there has been no movement during the last 2 years so I'm closing this.