Open axdg opened 8 years ago
Well, I copied it from here:
https://github.com/mattbierner/hamt/blob/master/lib/hamt.js#L31
And he copied it from here:
http://stackoverflow.com/questions/7616461/generate-a-hash-from-string-in-javascript-jquery
Who copied it from here:
http://werxltd.com/wp/2010/05/13/javascript-implementation-of-javas-string-hashcode-method/
Who claims it is a js implementation of the standard java string hashCode
function.
The hash |= 0
doesn't contribute at all to the result, it just forces JS to turn it into a 32-bit integer. So the meat of the algorithm is just hash = ((hash << 5) - hash) + c
. Or (h*2^5 - h) + c
So, it's defined by the recurrence relation:
H(n+1) = (H(n)*2^5 - H(n)) + c
Which is really:
H(n+1) = H(n)*31 + c
With bit truncation at 32-bits. Which makes this a Linear Congruential Generator that is being reseeded by the current character at each step, with the modulus m
set to 2^32.
I haven't actually tested its output, but it seems theoretically sound, and it is Java's solution to the problem, so I think its probably pretty good? It's probably worth testing though. I can say that in mini-hamt I have a collision generator to test hash collisions, and it takes quite a while to find them!
Wow, I didn't expect you to track down the implementation like that. Thanks.
At some point I'll take a look at the collision frequency. If it's relatively low, it might be a good second hash (together with FNV) for implementing a bloom filter in JS. Although - it does seems like it would have a tendency to produce lower values for the most significant bits (like all PRNGs).
mini-hamt - Nice!
Ya let me know about your results if you decide to test it out. I'd be curious. I should really write some tests and put them in the repo. Would be nice to have a set of libraries for doing these kinds of key distribution tests.
Also, in theory, I think, you should be able to just change the prime from 31 to any other prime and get a brand new independent hash function. I'm not certain about that though.
EDIT: Oh, and yes, I think it definitely has some clustering for short strings. That fact actually uncovered a bug in mini-hamt while I was testing it - it broke on short strings because they tended to generate negative keys for which there was a bug.
@ashaffer, out of curiosity, what is the algorithm that you're using here. It is indeed fast, how well distributed is the output?