Open guotie opened 11 years ago
I have tested this library, reusee/mmh3, a modified reusee/mmh3 that uses unsafe tricks, and the Python mmh3. I only tested the 128-bit variant, as that's all I'm interested in. On a corpus of 1M strings with an average length of ~300 bytes:
lib | timing |
---|---|
python-mmh3 | 240ms (hash_bytes : 290ms) |
spaolacci/murmur3 | 376ms |
reusee/mmh3 | 3842ms (3.8s) |
mmh3/custom | 365ms |
I'd imagine that for C/C++, without the Python overhead, you might be able to cut the Python timing by at least 25%. This library is the only Go one that implements the standard Go hash interface and supports streaming, so I'd say use this; there's not a ton of overhead left to remove.
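To illustrate what "implements the hash interface and supports streaming" buys you, here is a minimal sketch. It uses the standard library's `hash/fnv` 128-bit hash purely as a stand-in (an assumption made so the example runs without third-party imports); any type satisfying `hash.Hash` should be usable the same way. The helper names `sumOneShot`, `sumStreamed`, and `streamingMatchesOneShot` are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"hash"
	"hash/fnv"
)

// sumOneShot hashes data with a single Write call.
func sumOneShot(h hash.Hash, data []byte) []byte {
	h.Reset()
	h.Write(data) // hash.Hash documents that Write never returns an error
	return h.Sum(nil)
}

// sumStreamed feeds data to the hash in chunks of at most chunkSize bytes,
// as you would when reading from a file or a network connection.
func sumStreamed(h hash.Hash, data []byte, chunkSize int) []byte {
	h.Reset()
	for len(data) > 0 {
		n := chunkSize
		if n > len(data) {
			n = len(data)
		}
		h.Write(data[:n])
		data = data[n:]
	}
	return h.Sum(nil)
}

// streamingMatchesOneShot reports whether both ways of feeding the
// same 300-byte input produce the same digest.
func streamingMatchesOneShot(chunkSize int) bool {
	data := bytes.Repeat([]byte("murmur"), 50) // 300 bytes, like the corpus average
	h := fnv.New128a()                         // stand-in, NOT murmur3
	return bytes.Equal(sumOneShot(h, data), sumStreamed(h, data, chunkSize))
}

func main() {
	fmt.Println(streamingMatchesOneShot(7))
}
```

Because the helpers only depend on `hash.Hash`, swapping the stand-in for another implementation of that interface should require changing only the constructor call.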
I just wanted to say that I looked at all the murmur3 implementations in Go and IMHO this one is the best. As mentioned in the README and above, it supports the standard Go hash interface, performance is excellent, and it is a great example of how to implement the 3 versions in a very Go-centric way. It also has a BSD license, which was another plus for me.
Good news: rsc has started working on it: https://github.com/golang/go/issues/8037
Well, it's difficult to say without a proper equivalent C++ benchmark, but I would say not so bad. The `h1 = h1*5 + 0xe6546b64` statement suffers from the `IMUL 5, ADD 0xe6546b64` translation instead of the more optimized `LEA` that gcc/llvm would output (but that certainly doesn't make it 2 times slower).

All in all, except possibly for very small inputs, where any preparation costs show up easily, I suspect the performance is "fine enough" for most uses, with the bottleneck being elsewhere. I would happily investigate, though, if you have any numbers (or situations) you find discouraging.
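For readers unfamiliar with the `LEA` remark: multiplying by 5 is the same as `h1 + h1*4`, and x86 `LEA` can compute a base plus an index scaled by 4 in one instruction. The sketch below only demonstrates that the two arithmetic forms agree under `uint32` wraparound; the function names are hypothetical, and it does not check which instructions any particular Go compiler version actually emits.

```go
package main

import "fmt"

// mixMul is the statement quoted above, written with a multiply.
func mixMul(h1 uint32) uint32 {
	return h1*5 + 0xe6546b64
}

// mixShiftAdd computes the same value as h1 + h1*4, the
// base + index*4 shape that an LEA instruction can encode.
func mixShiftAdd(h1 uint32) uint32 {
	return (h1 + (h1 << 2)) + 0xe6546b64
}

// formsAgree checks both forms on a few inputs, including values
// large enough to overflow 32 bits during the multiplication.
func formsAgree() bool {
	inputs := []uint32{0, 1, 2, 0x7fffffff, 0x80000000, 0xdeadbeef, 0xffffffff}
	for _, h1 := range inputs {
		if mixMul(h1) != mixShiftAdd(h1) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(formsAgree())
}
```

Both forms wrap modulo 2^32, so they agree for every input; any performance difference comes only from how the compiler chooses to encode the operation.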