rurban / smhasher

Hash function quality and speed tests
https://rurban.github.io/smhasher/

insanely fast madhash #98

Closed wangyi-fudan closed 4 years ago

wangyi-fudan commented 4 years ago

I would like to introduce madhash (included in wyhash.h). It samples up to 32 bytes of the data (at four quartile positions) to produce a 64-bit hash. It breaks every rule; however, it is valid for hashing short strings of up to 32 bytes. For longer strings it is bad, but it probably works for hash tables. The bulk speed is insane...
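For readers who want to see what "sampling at four quartile positions" could look like, here is a rough sketch in C. It is not the madhash code from wyhash.h: the function name `quartile_sample_hash`, the window offsets, the constant `k` and the mixing step are all assumptions made for illustration, since the comment above only says that up to 32 bytes are sampled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read 8 bytes in native byte order without alignment assumptions. */
static uint64_t read64(const uint8_t *p) {
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical sketch, NOT the madhash implementation from wyhash.h. */
uint64_t quartile_sample_hash(const void *key, size_t len) {
    const uint8_t *p = (const uint8_t *)key;
    const uint64_t k = 0x9E3779B97F4A7C15ULL; /* arbitrary odd constant (assumption) */
    uint64_t h = (uint64_t)len;

    if (len < 8) {
        /* Short input: fold the available bytes into a single word. */
        uint64_t w = 0;
        if (len) memcpy(&w, p, len);
        return (h ^ w) * k;
    }

    /* Four 8-byte windows spread evenly between the first and last 8 bytes.
       For len <= 32 the windows touch or overlap, so every byte is read.
       For longer inputs the bytes between the windows are never looked at. */
    size_t span = len - 8;
    size_t off[4] = { 0, span / 3, 2 * span / 3, span };
    for (int i = 0; i < 4; i++)
        h = (h ^ read64(p + off[i])) * k; /* placeholder mixing step (assumption) */
    return h ^ (h >> 32);
}
```

The work done is the same for a 64-byte key and a 16 MB buffer, which would explain why a bulk throughput figure keeps growing with the buffer size. The flip side is that two long inputs differing only in an unsampled byte (for example byte 10 of a 64-byte key in this sketch) get the same hash.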

Benchmarking /usr/share/dict/words

| HashFunction | Words | Hashmap | Bulk64K | Bulk16M |
| --- | --- | --- | --- | --- |
| madhash | 303.92 | 50.70 | 11295.11 | 2863311.53 |
| std::hash | 96.09 | 36.57 | 7.34 | 7.36 |
| wyhash_v5 | 265.71 | 45.59 | 26.35 | 21.78 |
| xxHash64 | 110.84 | 35.85 | 14.71 | 14.58 |
| XXH3_scalar | 183.82 | 42.64 | 13.09 | 13.05 |
| t1ha2_atonce | 127.93 | 36.26 | 16.60 | 16.32 |

@Sanmayce

rurban commented 4 years ago

FWIW: renamed to FastestHash now. Very fast, very poor and unseeded.

wangyi-fudan commented 4 years ago

Today's improvement: we take the head 4 bytes, the tail 4 bytes, and 4 bytes from the middle: hash=(head+tail)*middle
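A minimal C sketch of the `(head+tail)*middle` formula, for illustration only. The function name `head_tail_middle_hash`, the exact middle offset (`len/2 - 2`) and the handling of keys shorter than 4 bytes are assumptions; the comment above does not specify them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read 4 bytes in native byte order without alignment assumptions. */
static uint32_t read32(const uint8_t *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical sketch of hash = (head + tail) * middle. */
uint64_t head_tail_middle_hash(const void *key, size_t len) {
    const uint8_t *p = (const uint8_t *)key;

    if (len < 4) {
        /* Assumption: short keys are simply folded into one word. */
        uint64_t w = 0;
        if (len) memcpy(&w, p, len);
        return w;
    }

    uint64_t head   = read32(p);                  /* first 4 bytes */
    uint64_t tail   = read32(p + len - 4);        /* last 4 bytes */
    uint64_t middle = read32(p + (len >> 1) - 2); /* 4 bytes around the midpoint */
    return (head + tail) * middle;
}
```

Only 12 bytes are ever read, so any byte outside those three windows has no effect on the result. In this sketch a middle word of zero also forces the hash to 0 whatever the head and tail contain.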

| HashFunction | Words | Hashmap | Bulk64K | Bulk16M |
| --- | --- | --- | --- | --- |
| FastestHash | 734.83 | 52.66 | 3616.05 | 1908874.35 |
| std::hash | 96.74 | 35.40 | 7.37 | 7.36 |
| wyhash | 251.45 | 44.88 | 21.61 | 20.04 |
| xxHash64 | 109.07 | 35.53 | 14.71 | 14.62 |
| XXH3_scalar | 180.63 | 42.29 | 13.11 | 13.12 |
| t1ha2_atonce | 126.61 | 36.36 | 17.10 | 16.76 |

rurban commented 4 years ago

Don't use *, use ^ instead. And add a tiny seed somewhere. But it's still too bad to survive the Diff test.
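One possible reading of this suggestion, sketched in C. The function name `head_tail_middle_xor_hash`, the place where the seed is mixed in, the middle offset and the short-key branch are all assumptions; the comment only says to swap `*` for `^` and to add a seed "somewhere".

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read 4 bytes in native byte order without alignment assumptions. */
static uint32_t read32(const uint8_t *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical seeded variant: (head + tail) ^ middle ^ seed. */
uint64_t head_tail_middle_xor_hash(const void *key, size_t len, uint64_t seed) {
    const uint8_t *p = (const uint8_t *)key;

    if (len < 4) {
        /* Assumption: short keys are folded into one word, then seeded. */
        uint64_t w = 0;
        if (len) memcpy(&w, p, len);
        return w ^ seed;
    }

    uint64_t head   = read32(p);                  /* first 4 bytes */
    uint64_t tail   = read32(p + len - 4);        /* last 4 bytes */
    uint64_t middle = read32(p + (len >> 1) - 2); /* 4 bytes around the midpoint */
    return (head + tail) ^ middle ^ seed;
}
```

With XOR, a zero middle word no longer collapses the hash to 0, and different seeds give different values for the same key. The sketch still reads only 12 bytes, so keys that differ only in unsampled bytes keep colliding, which fits the remark that it would not survive the Diff test.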