Open GoogleCodeExporter opened 9 years ago
Bulk update -- these are issues that are looking for either a well-tested
patch, or a sponsor to pay for development.
Original comment by baron.schwartz
on 25 Dec 2008 at 4:41
Changing to Enhancement
Original comment by baron.schwartz
on 25 Dec 2008 at 10:06
MurmurHash2 code integrated in r3259. Built and tested.
Original comment by randomfoox@gmail.com
on 23 Mar 2009 at 5:43
Great! I have built, installed and run the function, but not really looked
over the
code. (I am far from a C/C++ expert, in fact I am the last person who should
give an
opinion on it.) I forwarded a conversation with MurmurHash's author to the
mailing
list. I'll also attach it here in case it's useful:
From: Austin Appleby
Date: Thu, Apr 24, 2008 at 11:24 AM
Subject: Re: Hash algorithms
I just put a 64-bit version together a few days ago, haven't posted it
yet. Code is attached - there's one version for 32-bit platforms and
one version for 64-bit platforms. They do not produce the same hashes,
but they're at least as fast as MurmurHash2.
On Thu, Apr 24, 2008 at 7:53 AM, Baron Schwartz wrote:
>
> Austin,
>
> I'm not very good with math. Have you designed a 64-bit version of
> Murmur Hash? The only ones I see return an unsigned int.
>
> I'd extend it to 64 bits, but see the above comment about me and math.
>
> Thanks,
> Baron
Original comment by baron.schwartz
on 23 Mar 2009 at 1:28
Attachments:
Thanks for posting the 64-bit version. I was using the 32 bit version and
implicitly
type casting the result to ulonglong, which is not what we want. I opted for
using
the 32 bit (code not output) version of MurmurHash64 as it's the lowest common
denominator. I had initially thought about including both versions in the
code, and
in the murmur_hash_init function choosing which one to use based on
sizeof(long).
Unfortunately the two versions to not produce the same hash, so running
mk-table-checksum on mixed 32/64 bit databases with the murmur hash would not
work.
I built, installed and tested this version on both 32-bit and 64-bit databases
and
verified that they return the same hash on both platforms.
This change was made in r3273.
Original comment by randomfoox@gmail.com
on 25 Mar 2009 at 3:03
Hm, I doubt many people are running a mixed environment. And there's real
benefit (a
lot lower chance of collision) in a 64-bit hash. We can check back with the
author
of the code and see if he has anything new. What about a 64-bit version on
32-bit
platforms?
With a 32-bit hash we should expect a fair chance of a collision, even if it's a
perfect hash, in less than 100k values. I've actually seen real-life cases of
32-bit
hash functions causing collisions, just a couple weeks ago.
Original comment by baron.schwartz
on 25 Mar 2009 at 3:11
Sorry if I wasn't clear. I updated the code to the version of MurmurHash that
returns a 64-bit value, but the code itself is designed to run on a 32-bit
machine.
Austin provided two versions, one that was designed to run on a 32-bit machine,
and
one that runs well on a 64-bit machine. Both versions return 64-bit values,
albeit
not the same value. The downside to choosing the 32-bit version is that it will
probably not run as fast as it might otherwise on a 64-bit machine.
Original comment by randomfoox@gmail.com
on 25 Mar 2009 at 3:26
Austin Appleby emailed me on April 17th to say this:
"All you folks are have e-mailed me at some point about MurmurHash, so I
thought I'd let you know that I've finally gotten around to publishing the
64-bit versions at http://murmurhash.googlepages.com
There was an early version floating around that was missing one mix step at
the end of MurmurHash64B - the version on the site now is correct and passes
my test suite.
The 64-bit variants are also about 10% faster than MurmurHash2 on large
blocks due to the loop unrolling - might be useful for some applications."
I am going to go ahead and package the code as-is with Maatkit, and if the
improvements Austin mentions get added at some point, that's great. (I won't
close
this issue report yet, but I don't want to delay this release.)
Original comment by baron.schwartz
on 4 May 2009 at 2:09
Original comment by dan...@percona.com
on 3 Jun 2009 at 7:19
On the documentation page for mk-table-checksum
(http://www.maatkit.org/doc/mk-table-checksum.html), it says that MURMUR_HASH is
faster then the FNV_64. However I am seeing the opposite. I was curious if
anyone
else is experiencing the same thing. Tried with MySQL 5.0 and 5.1 on 64bit
servers.
Original comment by joegrasse
on 22 Jun 2009 at 7:49
I have not benchmarked it myself. I should have :( I was just trusting Austin
Appleby's benchmarks. However, we might also be seeing a slow implementation
of fast
code, I don't know.
Original comment by baron.schwartz
on 1 Jul 2009 at 1:37
Getting rid of "SponsorIt" priority for a bunch of issues.
Original comment by baron.schwartz
on 30 Jan 2010 at 7:46
Original comment by baron.schwartz
on 1 Feb 2010 at 1:37
Original comment by dan...@percona.com
on 23 Mar 2010 at 5:55
Maybe the documentation should be changed to give preference back to FNV1A_64?
In my testing, I've found it to be over twice the speed of MURMUR_HASH (and the
gap seems to widen as chunk sizes increase).
Original comment by simon.k...@gmail.com
on 20 Jul 2011 at 6:45
Original issue reported on code.google.com by
baron.schwartz
on 6 Aug 2008 at 1:33