wangyi-fudan / wyhash

The FASTEST QUALITY hash function, random number generators (PRNG) and hash map.
The Unlicense
970 stars 73 forks source link

The output for empty inputs doesn't depend on the seed. #36

Closed ivte-ms closed 4 years ago

ivte-ms commented 4 years ago

There are multiple scenarios where the missing dependency from the seed can be a problem.

For example, chaining the hashes. Imagine a case where the hash of the first input is used as a seed for the second input, etc.

h0 = hash(data0, size0, 0) h1 = hash(data1, size1, h0) ... h100 = hash(data100, size100, h99)

If, for example, size97 was 0, the final h100 will never depend on data10 .. data96 (because the seed for h97 will be ignored). And if size100 is 0, the result will always be 0, regardless of all the other inputs.

Chaining hashes like in the example above is a common pattern, and people do it quite often:

uint64_t getUserHash(const User& user) {
  auto h0 = hash(user.firstName.data(), user.firstName.size(), 0);
  auto h1 = hash(user.lastName.data(), user.lastName.size(), h0);
  auto h2 = hash(user.company.data(), user.company.size(), h1);
  // notes are almost always empty
  return hash(user.notes.data(), user.notes.size(), h2);
}

If they will decide to replace their old hash with wyhash without reading the implementation, they will silently break their code (because getUserHash() will start returning 0 almost all the time).

The suggested solution is to either return the seed, or a simple hash of the seed. For example, something like:

return _wymum(_wymum(seed^_wyp0, _wyp1), _wyp2);

This is just a suggestion, I didn't run tests for this.

Ideally the seed should be hashed equally well with normal inputs, so that the quality of hash(&foo64, 8, 0) is the same as the quality of hash(0, 0, foo64).

PS: thank you for creating wyhash. It is by far the best quick hash function I tried so far, I really hope it will become more popular.

wangyi-fudan commented 4 years ago

I have been worried about this previously (https://github.com/rurban/smhasher/issues/67), but guys says that is OK to ignore zero length problem. Now thank you very much to raise this issue. I will fixs it soon.

wangyi-fudan commented 4 years ago

The problem is solved, I give double _wymum to protect the secret of seed well.