Open sjakobi opened 4 years ago
In order to allow users to specify custom salts, HashMap
would need to be extended with an Int
field to store the salt. We could then add constructors like
emptyWithSalt :: Int -> HashMap k v
…to allow users to pass in a custom salt.
@tibbe mentioned in an email:
a new salt is unfortunately not enough if you don't have a strong hash function. You can generate collisions without knowing the salt. SipHash with a per-process random salt is the only solution I know for strings that don't perform terribly. Bryan O'Sullivan and I tried that but we couldn't get it to work with OK performance (i.e. not worse than Data.Map). Other language ecosystems seem to have come to the same conclusion about SipHash, at least when I looked years ago.
To me it seems that custom salts might not make it impossible to artificially produce collisions, but they should make it quite a bit harder.
Other implementations also do this. For example Rust's std::collections::HashMap
:
By default, HashMap uses a hashing algorithm selected to provide resistance against HashDoS attacks. The algorithm is randomly seeded, and a reasonable best-effort is made to generate this seed from a high quality, secure source of randomness provided by the host without blocking the program.
The "randomly seeded" part here is important. There are really two things that go together:
Yes, the seed has to be unpredictable to an attacker, and the hash function itself (taking in the seed) has to be strong enough so the attacker can't just brute-force the unpredictability. So Hashable probably won't cut it, but also using SipHash itself with a predictable seed won't be sufficient either. What I wrote in another ticket:
[..] since
unordered-containers
is pure, to support a protection against this type of attack, the container constructor would have to take in an additional key parameter for seeding the hash function with, which could be generated by the caller viaIO
. For convenience, there could be a default seed in a top-level binding that is generated from runtime entropy viaunsafePerformIO
, but IMO the API should retain the possibility for the user to pass an explicit seed of their choice.
the hash function itself (taking in the seed) has to be strong enough so the attacker can't just brute-force the unpredictability
Does this imply that enabling HashMap
to use different/random seeds would be useless, unless we also switch to a stronger hash function than what Hashable
currently provides? (Which hash function does hashable
currently use BTW? – I can't find any documentation on this)
My impression was that susceptibility to HashDoS is a somewhat gradual affair where some attacks could already be prevented by just using an unpredictable seed. Is that wrong or possibly just a bad line of thought?
Yes, the random seed is useless, from a security perspective, without a stronger hash function. It's still needed to chain hashes to e.g. hash a record.
We currently use FNV1a for byte string and text I think. One could imagine having a newtype
to use another one (but this doesn't give security by default for users).
A more random seed really doesn't help without a strong hash function.
I currently see two main approaches for making the hash salt less predictable for an attacker:
Allow setting a different salt per map, as proposed in #45. #321 is one possible implementation of this. A different variant would look like somewhat like this:
data Hashmap = HashMap
{ hm :: !HashMap' -- the old HashMap type
, salt :: !Int
}
Use a global salt that is randomly generated on start-up, similar to hashable
's random-init-seed
flag.
For (1.) I see two big disadvantages:
Therefore (2.) currently seems more attractive to me. However, might (2.) be less secure than (1.)? How does leaking a salt work, and will (2.) be more prone to that than (1.)?
The
Hashable
class that this package relies on has ahashWithSalt
method that can be used to hash values with a custom salt. Currently this package doesn't use it, relying on thehash
method instead, which uses a hard-coded default salt.This means that it is currently rather easy to artificially produce hash collisions that trigger the performance degration reported in #121.