micro-js / hash-str

0 stars 0 forks source link

Algorithm? #1

Open axdg opened 8 years ago

axdg commented 8 years ago

@ashaffer, out of curiosity, what is the algorithm that you're using here. It is indeed fast, how well distributed is the output?

ashaffer commented 8 years ago

Well, I copied it from here:

https://github.com/mattbierner/hamt/blob/master/lib/hamt.js#L31

And he copied it from here:

http://stackoverflow.com/questions/7616461/generate-a-hash-from-string-in-javascript-jquery

Who copied it from here:

http://werxltd.com/wp/2010/05/13/javascript-implementation-of-javas-string-hashcode-method/

Who claims it is a js implementation of the standard java string hashCode function.

The hash |= 0 doesn't contribute at all to the result, it just forces JS to turn it into a 32-bit integer. So the meat of the algorithm is just hash = ((hash << 5) - hash) + c. Or (h*2^5 - h) + c

So, it's defined by the recurrence relation:

H(n+1) = (H(n)*2^5 - H(n)) + c

Which is really:

H(n+1) = H(n)*31 + c

With bit truncation at 32-bits. Which makes this a Linear Congruential Generator that is being reseeded by the current character at each step, with the modulus m set to 2^32.

I haven't actually tested its output, but it seems theoretically sound, and it is Java's solution to the problem, so I think its probably pretty good? It's probably worth testing though. I can say that in mini-hamt I have a collision generator to test hash collisions, and it takes quite a while to find them!

axdg commented 8 years ago

Wow, I didn't expect you to track down the implementation like that. Thanks.

At some point I'll take a look at the collision frequency. If it's relatively low, it might be a good second hash (together with FNV) for implementing a bloom filter in JS. Although - it does seems like it would have a tendency to produce lower values for the most significant bits (like all PRNGs).

mini-hamt - Nice!

ashaffer commented 8 years ago

Ya let me know about your results if you decide to test it out. I'd be curious. I should really write some tests and put them in the repo. Would be nice to have a set of libraries for doing these kinds of key distribution tests.

Also, in theory, I think, you should be able to just change the prime from 31 to any other prime and get a brand new independent hash function. I'm not certain about that though.

EDIT: Oh, and yes, I think it definitely has some clustering for short strings. That fact actually uncovered a bug in mini-hamt while I was testing it - it broke on short strings because they tended to generate negative keys for which there was a bug.