Closed ivte-ms closed 4 years ago
I was worried about this previously (https://github.com/rurban/smhasher/issues/67), but people said it was OK to ignore the zero-length problem. Thank you very much for raising this issue. I will fix it soon.
The problem is solved. I now apply _wymum twice so that the secret of the seed is well protected.
There are multiple scenarios where the missing dependency from the seed can be a problem.
For example, chaining the hashes. Imagine a case where the hash of the first input is used as a seed for the second input, etc.
    h0 = hash(data0, size0, 0)
    h1 = hash(data1, size1, h0)
    ...
    h100 = hash(data100, size100, h99)
If, for example, size97 were 0, the final h100 would not depend on data0 .. data96 (because the seed for h97 would be ignored, so h97 is the same no matter what h96 was). And if size100 is 0, the result will always be 0, regardless of all the other inputs.
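The effect can be reproduced with a small sketch. Note that `toy_hash` below is a hypothetical stand-in (FNV-1a style mixing), not wyhash; the only property it shares with the behaviour described here is the assumed shortcut of returning 0 for zero-length input without looking at the seed. `chain_hash` is likewise an illustrative helper name.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy hash, NOT wyhash: FNV-1a style byte mixing, plus the
   problematic shortcut of returning 0 for empty input. */
static uint64_t toy_hash(const void *data, size_t len, uint64_t seed) {
    if (len == 0) return 0;               /* the seed is ignored here */
    const unsigned char *p = data;
    uint64_t h = seed ^ 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Chain the hash over n inputs, feeding each result in as the next seed. */
static uint64_t chain_hash(const char *const *items, const size_t *sizes,
                           size_t n) {
    uint64_t h = 0;
    for (size_t i = 0; i < n; i++)
        h = toy_hash(items[i], sizes[i], h);
    return h;
}
```

With this toy hash, the chains {"a", "", "x"} and {"b", "", "x"} produce the same final value, because the empty middle element resets the chain, and any chain ending in an empty element produces 0.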
Chaining hashes like in the example above is a common pattern, and people do it quite often:
If they decide to replace their old hash with wyhash without reading the implementation, they will silently break their code (because getUserHash() will start returning 0 almost all the time).

The suggested solution is to either return the seed, or a simple hash of the seed. For example, something like:
This is just a suggestion, I didn't run tests for this.
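As a rough illustration of the "simple hash of the seed" option, here is a hypothetical sketch; it is not the snippet from this issue and not the fix that went into wyhash. It assumes the SplitMix64 finalizer as the seed mixer (the maintainer's fix, per the reply in this thread, uses _wymum instead), and `mix_seed` and `seeded_hash` are made-up names with a toy FNV-1a style body for non-empty input.

```c
#include <stddef.h>
#include <stdint.h>

/* SplitMix64 finalizer, used here as a stand-in seed mixer (an assumption,
   not wyhash's own mixing). It is a bijection on 64-bit values, so
   distinct seeds always give distinct results. */
static uint64_t mix_seed(uint64_t x) {
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

/* Hypothetical toy hash, NOT wyhash: for empty input it returns a hash of
   the seed instead of a constant, so chained results keep their history. */
static uint64_t seeded_hash(const void *data, size_t len, uint64_t seed) {
    if (len == 0) return mix_seed(seed);
    const unsigned char *p = data;
    uint64_t h = seed ^ 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}
```

With this change, `seeded_hash(NULL, 0, h)` still depends on `h`, so a zero-length element in the middle or at the end of a chain no longer discards the earlier inputs.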
Ideally the seed should be hashed equally well with normal inputs, so that the quality of hash(&foo64, 8, 0) is the same as the quality of hash(0, 0, foo64).

PS: thank you for creating wyhash. It is by far the best quick hash function I tried so far, I really hope it will become more popular.