pid / murmurHash3js

A JavaScript implementation of MurmurHash3's x86 hashing algorithms - npm - nodejs module
http://pid.github.io/murmurHash3js/
MIT License
49 stars 7 forks source link

Output matches C++ reference implementation only for regular ASCII inputs #3

Open cimi opened 6 years ago

cimi commented 6 years ago

👋 Hi there!

First of all, thank you for your work! I've been successfully using this library in production for a couple of years and it's been very useful.

Recently we've started using MurmurHash3 on other platforms - we need the results to match and noticed discrepancies between the output of the JS version and the other platforms when the input had characters that were not regular ASCII (i.e. charCodeAt is not between 0 and 127).

This is because in some places the code does key.charCodeAt(i) & 0xff and in other places just key.charCodeAt(i). The byte representation for regular ASCII characters is identical with the character code so for e.g. alphanumeric input this doesn't matter. If the input characters are outside this range, the results start to diverge with the reference implementation.

All the three variants have this problem. For example, here's the output for the x86 32bit version:

Input C++ result (reference) murmurhash3js result
'My hovercraft is full of eels.' 2953494853 2953494853
'My 🚀 is full of 🦎.' 1818098979 2899624186
'吉 星 高 照' 3435142074 4163339522

The string was utf-8 encoded before being passed in to the C++ reference as that expects bytes. I think it's fair to expect people using other implementations that ask for bytes will do this.

I decided to change the signature of the function to make it expect bytes. I checked my implementation along with a few others against the reference C++ implementation. You can read more about it and try out an interactive version of the comparison here.

Since I needed a quick release and the new signature is a major/breaking change compared to this implementation, I published my own version of the library as murmurhash3js-revisited. I tried to keep all attribution, but if you have any concerns please let me know!

karanlyons commented 5 years ago

I'm fixing this in https://github.com/karanlyons/murmurHash3.js/pull/6 and once I'm certain I haven't messed anything up will throw it up as an npm package so that we can hopefully deprecate and consolidate all the downstream variants to a modern implementation. Please take a look and let me know if there's anything strange looking!