gpakosz / smhasher

Forked from code.google.com/p/smhasher
0 stars 0 forks source link

MurmurHash3_x86_32 Algorithm Does Not match the one in Wikipedia #25

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Your MurmurHash3 implementation uses the following tail code:

  switch(len & 3)
  {
  case 3: k1 ^= tail[2] << 16;
  case 2: k1 ^= tail[1] << 8;
  case 1: k1 ^= tail[0];
          k1 *= c1; k1 = ROTL32(k1,15); k1 *= c2; h1 ^= k1;
  };

But this does match the (otherwise identical) MurmurHash algorithm in Wikipedia:

    with any remainingBytesInKey
        remainingBytes <- SwapEndianOrderOf(remainingBytesInKey)
        // Note: Endian swapping is only necessary on big-endian machines.
        //       The purpose is to place the meaningful digits towards the low end of the value,
        //       so that these digits have the greatest potential to affect the low range digits
        //       in the subsequent multiplication.  Consider that locating the meaningful digits
        //       in the high range would produce a greater effect upon the high digits of the
        //       multiplication, and notably, that such high digits are likely to be discarded
        //       by the modulo arithmetic under overflow.  We don't want that.

        remainingBytes <- remainingBytes * c1
        remainingBytes <- (remainingBytes << r1) OR (remainingBytes >> (32 - r1))
        remainingBytes <- remainingBytes * c2

        hash <- hash XOR remainingBytes

To make the two versions match, you'd have to change the bitwise-xor in your 
implementation to bitwise-or, as follows:

  switch(len & 3)
  {
  case 3: k1 |= tail[2] << 16;
  case 2: k1 |= tail[1] << 8;
  case 1: k1 |= tail[0];
          k1 *= c1; k1 = ROTL32(k1,15); k1 *= c2; h1 ^= k1;
  };

I'm just wondering which implementation is correct.  I can't figure out where 
the canonical implementation of the algorithm is.

Original issue reported on code.google.com by Matthew....@gmail.com on 17 Aug 2013 at 7:07

GoogleCodeExporter commented 9 years ago
Here's the Wikipedia link:
http://en.wikipedia.org/wiki/MurmurHash

Original comment by Matthew....@gmail.com on 17 Aug 2013 at 7:08

GoogleCodeExporter commented 9 years ago
The one here is the canonical one. Wikipedia is always a copy of something 
else, and often out of date and/or incorrect.

Original comment by frank...@gmail.com on 3 Dec 2013 at 6:37

GoogleCodeExporter commented 9 years ago
Since k1 starts of as 0, there is no difference in the result between XOR and 
OR.

To go even further, instead of OR-ing you can reverse the cases (1-2-3 instead 
of 3-2-1), and make each of them shift 8 to the right. All this switch does is 
make sure that the left-aligned bytes are shifted to the right and zero padded.

Original comment by ruud.alt...@gmail.com on 3 Feb 2015 at 2:07