haskell-unordered-containers / unordered-containers

Efficient hashing-based container types
BSD 3-Clause "New" or "Revised" License
221 stars 99 forks source link

All HashMaps use the same default salt to hash keys #265

Open sjakobi opened 4 years ago

sjakobi commented 4 years ago

The Hashable class that this package relies on has a hashWithSalt method that can be used to hash values with a custom salt. Currently this package doesn't use it, relying on the hash method instead, which uses a hard-coded default salt.

This means that it is currently rather easy to artificially produce hash collisions that trigger the performance degration reported in #121.

sjakobi commented 4 years ago

In order to allow users to specify custom salts, HashMap would need to be extended with an Int field to store the salt. We could then add constructors like

emptyWithSalt :: Int -> HashMap k v

…to allow users to pass in a custom salt.

sjakobi commented 4 years ago

@tibbe mentioned in an email:

a new salt is unfortunately not enough if you don't have a strong hash function. You can generate collisions without knowing the salt. SipHash with a per-process random salt is the only solution I know for strings that don't perform terribly. Bryan O'Sullivan and I tried that but we couldn't get it to work with OK performance (i.e. not worse than Data.Map). Other language ecosystems seem to have come to the same conclusion about SipHash, at least when I looked years ago.

To me it seems that custom salts might not make it impossible to artificially produce collisions, but they should make it quite a bit harder.

Other implementations also do this. For example Rust's std::collections::HashMap:

By default, HashMap uses a hashing algorithm selected to provide resistance against HashDoS attacks. The algorithm is randomly seeded, and a reasonable best-effort is made to generate this seed from a high quality, secure source of randomness provided by the host without blocking the program.

tibbe commented 4 years ago

The "randomly seeded" part here is important. There are really two things that go together:

infinity0 commented 4 years ago

Yes, the seed has to be unpredictable to an attacker, and the hash function itself (taking in the seed) has to be strong enough so the attacker can't just brute-force the unpredictability. So Hashable probably won't cut it, but also using SipHash itself with a predictable seed won't be sufficient either. What I wrote in another ticket:

[..] since unordered-containers is pure, to support a protection against this type of attack, the container constructor would have to take in an additional key parameter for seeding the hash function with, which could be generated by the caller via IO. For convenience, there could be a default seed in a top-level binding that is generated from runtime entropy via unsafePerformIO, but IMO the API should retain the possibility for the user to pass an explicit seed of their choice.

sjakobi commented 4 years ago

the hash function itself (taking in the seed) has to be strong enough so the attacker can't just brute-force the unpredictability

Does this imply that enabling HashMap to use different/random seeds would be useless, unless we also switch to a stronger hash function than what Hashable currently provides? (Which hash function does hashable currently use BTW? – I can't find any documentation on this)

My impression was that susceptibility to HashDoS is a somewhat gradual affair where some attacks could already be prevented by just using an unpredictable seed. Is that wrong or possibly just a bad line of thought?

tibbe commented 4 years ago

Yes, the random seed is useless, from a security perspective, without a stronger hash function. It's still needed to chain hashes to e.g. hash a record.

We currently use FNV1a for byte string and text I think. One could imagine having a newtype to use another one (but this doesn't give security by default for users).

A more random seed really doesn't help without a strong hash function.

sjakobi commented 3 years ago

I currently see two main approaches for making the hash salt less predictable for an attacker:

  1. Allow setting a different salt per map, as proposed in #45. #321 is one possible implementation of this. A different variant would look like somewhat like this:

    data Hashmap = HashMap
     { hm :: !HashMap' -- the old HashMap type
     , salt :: !Int
     }
  2. Use a global salt that is randomly generated on start-up, similar to hashable's random-init-seed flag.

For (1.) I see two big disadvantages:

Therefore (2.) currently seems more attractive to me. However, might (2.) be less secure than (1.)? How does leaking a salt work, and will (2.) be more prone to that than (1.)?