Output matches C++ reference implementation only for regular ASCII inputs

👋 Hi there!

First of all, thank you for your work! I've been successfully using this library in production for a couple of years and it's been very useful.

Recently we've started using MurmurHash3 on other platforms - we need the results to match and noticed discrepancies between the output of the JS version and the other platforms when the input had characters that were not regular ASCII (i.e. charCodeAt is not between 0 and 127).

This is because in some places the code does key.charCodeAt(i) & 0xff and in other places just key.charCodeAt(i). The byte representation for regular ASCII characters is identical with the character code so for e.g. alphanumeric input this doesn't matter. If the input characters are outside this range, the results start to diverge with the reference implementation.

All the three variants have this problem. For example, here's the output for the x86 32bit version:

Input	C++ result (reference)	murmurhash3js result
'My hovercraft is full of eels.'	2953494853	2953494853
'My 🚀 is full of 🦎.'	1818098979	2899624186
'吉星高照'	3435142074	4163339522

The string was utf-8 encoded before being passed in to the C++ reference as that expects bytes. I think it's fair to expect people using other implementations that ask for bytes will do this.

I decided to change the signature of the function to make it expect bytes. I checked my implementation along with a few others against the reference C++ implementation. You can read more about it and try out an interactive version of the comparison here.

Since I needed a quick release and the new signature is a major/breaking change compared to this implementation, I published my own version of the library as murmurhash3js-revisited. I tried to keep all attribution, but if you have any concerns please let me know!

pid / murmurHash3js

Output matches C++ reference implementation only for regular ASCII inputs #3