Closed steveklabnik closed 6 years ago
The current version of HashMap is still around 4 times slower than C++ unordered map on int keys. It is quite unfortunate.
Hash maps with integer keys are quite common and they usually operate on trusted data. A DoS resistant hash map should be the exception, not the rule (fast hashmaps should be the default). Is there any intent to change this behavior?
Is that still true using BTreeMap and the FNV hasher? (see the first code example). I did not do the checks by myself, but I believe the situation has improved since the creation of this issue.
Yes, @Luthaf is correct.
A DoS resistant hash map should be the exception, not the rule (fast hashmaps should be the default).
This is not the current position of the libs team, in my understanding, but regardless, we cannot change this now; the ship has sailed. /cc @rust-lang/libs
but regardless, we cannot change this now; the ship has sailed.
It actually can. Changing it would be a non-breaking change.
@Ticki
It actually can. Changing it would be a non-breaking change.
It'd be worse: it would change the behavior and its guarantee without a user knowing. Suddenly anyone depending on the strength of HashMap
is susceptible to DOS attacks.
It'd be worse: it would change the behavior and its guarantee without a user knowing. Suddenly anyone depending on the strength of HashMap is susceptible to DOS attacks.
The HashMap guarantee no such thing.
In 99.9% of the use cases hash tables prone to DOS attacks are perfectly adequate. In the remaining 0.1% percent, a person in charge of the corresponding part of code should be expert enough to be able to choose a right implementation of a hash table, or implement it themselves.
That's why the default option that uses collision-resistant hash functions is a bit stupid.
In 99.9% of the use cases hash tables prone to DOS attacks are perfectly adequate. In the remaining 0.1% percent, a person in charge of the corresponding part of code should be expert enough to be able to choose a right implementation of a hash table, or implement it themselves.
That's why the default option that uses collision-resistant hash functions is a bit stupid.
The same argument could be made in favor of having HashMap
be secure by default. If someone really needs to squeeze some more performance out of a map, after profiling to determine that it is indeed a bottleneck, they can opt for a map that is prone to DOS.
The same argument could be made in favor of having HashMap be secure by default. If someone really needs to squeeze some more performance out of a map, after profiling to determine that it is indeed a bottleneck, they can opt for a map that is prone to DOS.
But the point is that the majority of cases DOS is not a problem.
But the point is that the majority of cases DOS is not a problem.
That may be true, but is map performance a problem in the majority of cases?
That may be true, but is map performance a problem in the majority of cases?
A lot actually. Let me give you an example:
Recently, I did a changed our ZFS implementation of Redox from using SipHasher to DJB2, this gave a massive speed-up (x2-4!)
A lot actually. Let me give you an example:
Recently, I did a changed our ZFS implementation of Redox from using SipHasher to DJB2, this gave a massive speed-up (x2-4!)
Right. You profiled your application and determined that you could gain increased performance by switching. There are certainly cases in which DOS is not a concern, but it should be a conscious choice to trade security for speed.
That may be true, but is map performance a problem in the majority of cases?
If it's not, one should use binary search trees or lists or whatnot. Every data structure has certain benefits and drawbacks, and hash tables have a benefit of being extremely fast. One should not sacrifice it for 'security'.
To trust or not to trust, that is the question. Hash tables hit pathological behavior only under crafted inputs (which means, in essence, most inputs are perfectly fine). Choosing a "secure" hash table trades off the default case for the esoteric. Imagine, if the binary is "unzip", we're trading off the common case performance for resilience in the uncommon case. The costs add up in large applications.
Divye On Tue, Mar 22, 2016 at 12:40 PM Andrew Paseltiner notifications@github.com wrote:
A lot actually. Let me give you an example:
Recently, I did a changed our ZFS implementation of Redox from using SipHasher to DJB2, this gave a massive speed-up (x2-4!)
Right. You profiled your application and determined that you could gain increased performance by switching. There are certainly cases in which DOS is not a concern, but it should be a conscious choice to trade security for speed.
— You are receiving this because you commented. Reply to this email directly or view it on GitHub https://github.com/rust-lang/rfcs/issues/631#issuecomment-199981823
If it's not, one should use binary search trees or lists or whatnot. Every data structure has certain benefits and drawbacks, and hash tables has a benefit of being extremely fast. One should not sacrifice it for 'security'.
It doesn't matter if you pick an asymptotically slower data structure like a linked list over a hash map if it has no effect on your application's overall performance. If a hash map with DOS-resistant hashing is the application's bottleneck, and you determine that the tradeoff in security is acceptable, you can switch to a faster hash function. I'm not arguing against the existence of a non-DOS-resistant hash function, just that the default behavior is a default and can be changed. There are tradeoffs in both cases, whether it's secure-by-default or fast-by-default.
The costs add up in large applications.
I agree, and in this scenario the application can opt to use something non-default.
In any case, I agree with @steveklabnik that it can't be changed now. Despite it possibly not being a breaking change at the API level, I think the docs are pretty clear:
The hashes are all keyed by the thread-local random number generator on creation by default. This means that the ordering of the keys is randomized, but makes the tables more resistant to denial-of-service attacks (Hash DoS). This behavior can be overridden with one of the constructors.
One can get the improved performance by importing the type as
type HashMap<K, V> = std::collections::HashMap<K, V, FnvHasher>;
I think this is not actionable and there has been no movement during the last 2 years so I'm closing this.
Issue by brson Saturday Jan 25, 2014 at 00:54 GMT
For earlier discussion, see https://github.com/rust-lang/rust/issues/11783
This issue was labelled with: A-an-interesting-project, A-libs, I-compiletime, I-slow, I-wishlist in the Rust repository
Our current HashMap uses cryptographically secure hashing but it is slow. Sometimes this is a big problem. Implement another hash map that is very fast.