Open cimi opened 6 years ago
I'm fixing this in https://github.com/karanlyons/murmurHash3.js/pull/6 and once I'm certain I haven't messed anything up will throw it up as an npm package so that we can hopefully deprecate and consolidate all the downstream variants to a modern implementation. Please take a look and let me know if there's anything strange looking!
👋 Hi there!
First of all, thank you for your work! I've been successfully using this library in production for a couple of years and it's been very useful.
Recently we've started using MurmurHash3 on other platforms - we need the results to match and noticed discrepancies between the output of the JS version and the other platforms when the input had characters that were not regular ASCII (i.e.
charCodeAt
is not between 0 and 127).This is because in some places the code does
key.charCodeAt(i) & 0xff
and in other places justkey.charCodeAt(i)
. The byte representation for regular ASCII characters is identical with the character code so for e.g. alphanumeric input this doesn't matter. If the input characters are outside this range, the results start to diverge with the reference implementation.All the three variants have this problem. For example, here's the output for the x86 32bit version:
The string was utf-8 encoded before being passed in to the C++ reference as that expects bytes. I think it's fair to expect people using other implementations that ask for bytes will do this.
I decided to change the signature of the function to make it expect bytes. I checked my implementation along with a few others against the reference C++ implementation. You can read more about it and try out an interactive version of the comparison here.
Since I needed a quick release and the new signature is a major/breaking change compared to this implementation, I published my own version of the library as
murmurhash3js-revisited
. I tried to keep all attribution, but if you have any concerns please let me know!