scala / bug

Scala 2 bug reports only. Please, no questions — proper bug reports only.
https://scala-lang.org
232 stars 21 forks source link

Use xxh3 instead of murmurhash3 for better performance of hashing #11738

Closed plokhotnyuk closed 4 years ago

plokhotnyuk commented 5 years ago

Link to the xxHash release which suports xxh3: https://github.com/Cyan4973/xxHash/releases/tag/v0.7.0

Link to the current implementation of Mumur hashing in Scala: https://github.com/scala/scala/blob/2.12.x/src/library/scala/util/hashing/MurmurHash3.scala#L181-L201

Ichoran commented 5 years ago

The problem is that xxHash on the JVM isn't, in the implementations I've tested, faster than MurmurHash3. I haven't explored MetroHash, but it's 64 bits (at a minimum) anyway, and it's CPU-specific; I'm not sure how well it performs on alternate architectures (e.g. AMD).

If someone wanted to do the work of benchmarking the latest hash variants and got promising results we could switch again, but last time we did this MurmurHash2 (actually--3 wasn't out yet) was the best available, all things considered. (Including better than xxHash at the time.)

leventov commented 4 years ago

My testing in 2016 showed that xxHash was much faster than MurmurHash3 on the JVM.

Now Zero-allocation Hashing includes implementations for murmur3, xx, metroHash, and new algorithm wyHash: somebody could compare them (I didn't benchmark the latest two).

Note that according to the latest version of SMHasher, MurmurHash3, xxHash, and MetroHash have some quality problems. Only wyHash and FarmHash don't, among the algorithms implemented in Zero-allocation Hashing.

wangyi-fudan commented 4 years ago

Zero-allocation Hashing 's wyhash implementation use 64X64 mul which is slow on 32 bit machine. Please note that WYHASH32 define enables fast 32 bit code. Test and choose a suitable one by yourself.

lrytz commented 4 years ago

While this is a relevant and important topic, scala/bug is not the right place to discuss it. Feel free to open a thread on contributors.