Closed broofa closed 3 years ago
There is a simple test in the repository and claims in the readme is only limited to the existing test. Right after saying
Returns a number between 0 and 18446744073709552000
it also says
Repository includes a test for hashing 65 million strings with zero collisions. That's 400k+ less than what you get with djb2.
Because when I run the same test in the repository with djb2 I get 400k+ collusions. There is no claim of 18446744073709552000 hashes without collisions. It already starts colliding quickly after 65 million hashes. Also, the test in the repository is using stringified numbers from 1 to 65 million, thus there's no guarantee that you won't get collisions with other kinds of strings. Needs testing with dictionaries etc.
My concern here is that by saying "64 bit integer hashes", you're setting certain expectations which, frankly, your code doesn't meet.
"64 bit integer" implies... no, directly states that you're generating hashes with 64-bits of entropy. I.e. that there are 2^^64 values in the hash space. And implicit in that is that the hashes are reasonably well distributed in that space. That's not the case.
The way you combine hash1
and hash2
in (this line) only produces 43-bit integer values that may occasionally rollover into the 44th bit.
[Edits: To remove a bunch of incorrect/poorly worded verbiage. My apologies. I've been mixing up PRNG and hash algorithm stuff lately. :-p]
Thank you for going into details of this, I will add disclaimers about 44bits limit tops and not tested entropy. But if it's doing well within a smaller set [0-9], I expect it to do better when the set is extended. It just collides less than djb2 in tested scope for virtually the same performance. That's all I can claim. If you get to test it more, feel free to add detailed benchmarks.
Wish it exhibited a more diverse output.
meow 8555689732215
woof 8553169777750
^^^
Closing out old issues that haven't been acted on.
Can I politely suggest that your README should contain a disclaimer indicating that there are some important caveats to this module?
JSNumbers only have 52-bits (+ 1 sign bit) of resolution. Any number >
Number.MAX_SAFE_INTEGER
(9007199254740991) will see (often quite large) rounding errors.Also, I suspect the distribution of values in this algorithm is quite distorted, meaning the chance of collisions is actually much higher than your "64 bit" claim might lead people to expect. At a minimum, I would say the algorithm is of, "untested" quality.