hugowan / maatkit

Automatically exported from code.google.com/p/maatkit
0 stars 0 forks source link

Add MURMUR_HASH() hash function as a UDF #19

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Add MurmurHash, described at http://murmurhash.googlepages.com as another
hash function. This should be even faster than FNV_64() for table
checksums and syncing.

Note: the author sent Baron some code for a 64-bit version that should be
what we need to duplicate the FNV_64() hash function.

Original issue reported on code.google.com by baron.schwartz on 6 Aug 2008 at 1:33

GoogleCodeExporter commented 9 years ago
Bulk update -- these are issues that are looking for either a well-tested
patch, or a sponsor to pay for development.

Original comment by baron.schwartz on 25 Dec 2008 at 4:41

GoogleCodeExporter commented 9 years ago
Changing to Enhancement

Original comment by baron.schwartz on 25 Dec 2008 at 10:06

GoogleCodeExporter commented 9 years ago
MurmurHash2 code integrated in r3259.  Built and tested.

Original comment by randomfoox@gmail.com on 23 Mar 2009 at 5:43

GoogleCodeExporter commented 9 years ago
Great!  I have built, installed and run the function, but not really looked 
over the
code.  (I am far from a C/C++ expert, in fact I am the last person who should 
give an
opinion on it.)  I forwarded a conversation with MurmurHash's author to the 
mailing
list.  I'll also attach it here in case it's useful:

From: Austin Appleby
Date: Thu, Apr 24, 2008 at 11:24 AM
Subject: Re: Hash algorithms

I just put a 64-bit version together a few days ago, haven't posted it
yet. Code is attached - there's one version for 32-bit platforms and
one version for 64-bit platforms. They do not produce the same hashes,
but they're at least as fast as MurmurHash2.

On Thu, Apr 24, 2008 at 7:53 AM, Baron Schwartz wrote:
>
> Austin,
>
> I'm not very good with math.  Have you designed a 64-bit version of
> Murmur Hash?  The only ones I see return an unsigned int.
>
> I'd extend it to 64 bits, but see the above comment about me and math.
>
> Thanks,
> Baron

Original comment by baron.schwartz on 23 Mar 2009 at 1:28

Attachments:

GoogleCodeExporter commented 9 years ago
Thanks for posting the 64-bit version.  I was using the 32 bit version and 
implicitly
type casting the result to ulonglong, which is not what we want.  I opted for 
using
the 32 bit (code not output) version of MurmurHash64 as it's the lowest common
denominator.  I had initially thought about including both versions in the 
code, and
in the murmur_hash_init function choosing which one to use based on 
sizeof(long). 
Unfortunately the two versions to not produce the same hash, so running
mk-table-checksum on mixed 32/64 bit databases with the murmur hash would not 
work. 
I built, installed and tested this version on both 32-bit and 64-bit databases 
and
verified that they return the same hash on both platforms.

This change was made in r3273.

Original comment by randomfoox@gmail.com on 25 Mar 2009 at 3:03

GoogleCodeExporter commented 9 years ago
Hm, I doubt many people are running a mixed environment.  And there's real 
benefit (a
lot lower chance of collision) in a 64-bit hash.  We can check back with the 
author
of the code and see if he has anything new.  What about a 64-bit version on 
32-bit
platforms?

With a 32-bit hash we should expect a fair chance of a collision, even if it's a
perfect hash, in less than 100k values.  I've actually seen real-life cases of 
32-bit
hash functions causing collisions, just a couple weeks ago.

Original comment by baron.schwartz on 25 Mar 2009 at 3:11

GoogleCodeExporter commented 9 years ago
Sorry if I wasn't clear.  I updated the code to the version of MurmurHash that
returns a 64-bit value, but the code itself is designed to run on a 32-bit 
machine. 
Austin provided two versions, one that was designed to run on a 32-bit machine, 
and
one that runs well on a 64-bit machine.  Both versions return 64-bit values, 
albeit
not the same value.  The downside to choosing the 32-bit version is that it will
probably not run as fast as it might otherwise on a 64-bit machine.

Original comment by randomfoox@gmail.com on 25 Mar 2009 at 3:26

GoogleCodeExporter commented 9 years ago
Austin Appleby emailed me on April 17th to say this:

"All you folks are have e-mailed me at some point about MurmurHash, so I
thought I'd let you know that I've finally gotten around to publishing the
64-bit versions at http://murmurhash.googlepages.com

There was an early version floating around that was missing one mix step at
the end of MurmurHash64B - the version on the site now is correct and passes
my test suite.

The 64-bit variants are also about 10% faster than MurmurHash2 on large
blocks due to the loop unrolling - might be useful for some applications."

I am going to go ahead and package the code as-is with Maatkit, and if the
improvements Austin mentions get added at some point, that's great.  (I won't 
close
this issue report yet, but I don't want to delay this release.)

Original comment by baron.schwartz on 4 May 2009 at 2:09

GoogleCodeExporter commented 9 years ago

Original comment by dan...@percona.com on 3 Jun 2009 at 7:19

GoogleCodeExporter commented 9 years ago
On the documentation page for mk-table-checksum
(http://www.maatkit.org/doc/mk-table-checksum.html), it says that MURMUR_HASH is
faster then the FNV_64. However I am seeing the opposite. I was curious if 
anyone
else is experiencing the same thing. Tried with MySQL 5.0 and 5.1 on 64bit 
servers.

Original comment by joegrasse on 22 Jun 2009 at 7:49

GoogleCodeExporter commented 9 years ago
I have not benchmarked it myself.  I should have :(  I was just trusting Austin
Appleby's benchmarks.  However, we might also be seeing a slow implementation 
of fast
code, I don't know.

Original comment by baron.schwartz on 1 Jul 2009 at 1:37

GoogleCodeExporter commented 9 years ago
Getting rid of "SponsorIt" priority for a bunch of issues.

Original comment by baron.schwartz on 30 Jan 2010 at 7:46

GoogleCodeExporter commented 9 years ago

Original comment by baron.schwartz on 1 Feb 2010 at 1:37

GoogleCodeExporter commented 9 years ago

Original comment by dan...@percona.com on 23 Mar 2010 at 5:55

GoogleCodeExporter commented 9 years ago
Maybe the documentation should be changed to give preference back to FNV1A_64? 
In my testing, I've found it to be over twice the speed of MURMUR_HASH (and the 
gap seems to widen as chunk sizes increase).

Original comment by simon.k...@gmail.com on 20 Jul 2011 at 6:45