ikrestov closed 6 years ago
Thanks @relic-!
I will take a closer look at your suggestion tomorrow. Hopefully dropping -march won't have a big performance impact. If possible, it would be great if you could test with -mtune=generic, but no worries if you can't.
From my observations, the difference is insignificant. You get a couple more instructions in the binary, but they are not critical for performance. As far as I understand, metrohash's performance comes from SSE2; the rest of the latest compiler/CPU optimisations don't have a massive impact.
Here are the results of a benchmark I found online, run with no -march, with -march=native, and with -mtune=generic. I did 5 runs of each and picked roughly the best one.
16 bytes
2 tests completed.
metroHash-64 x 8,380,672 ops/sec ±7.34% (80 runs sampled)
metroHash-128 x 8,215,893 ops/sec ±1.96% (94 runs sampled)
Fastest is: metroHash-128
Slowest is: metroHash-64
64 bytes
2 tests completed.
metroHash-64 x 9,027,638 ops/sec ±2.13% (89 runs sampled)
metroHash-128 x 8,076,815 ops/sec ±0.49% (97 runs sampled)
Fastest is: metroHash-64
Slowest is: metroHash-128
256 bytes
2 tests completed.
metroHash-64 x 8,360,351 ops/sec ±0.19% (96 runs sampled)
metroHash-128 x 7,261,436 ops/sec ±0.64% (95 runs sampled)
Fastest is: metroHash-64
Slowest is: metroHash-128
Another run
16 bytes
2 tests completed.
metroHash-64 x 9,246,145 ops/sec ±2.34% (92 runs sampled)
metroHash-128 x 8,359,930 ops/sec ±0.28% (96 runs sampled)
Fastest is: metroHash-64
Slowest is: metroHash-128
64 bytes
2 tests completed.
metroHash-64 x 9,060,536 ops/sec ±0.36% (96 runs sampled)
metroHash-128 x 7,857,510 ops/sec ±1.75% (94 runs sampled)
Fastest is: metroHash-64
Slowest is: metroHash-128
256 bytes
2 tests completed.
metroHash-64 x 7,964,461 ops/sec ±2.07% (93 runs sampled)
metroHash-128 x 7,018,770 ops/sec ±2.79% (92 runs sampled)
Fastest is: metroHash-64
Slowest is: metroHash-128
16 bytes
2 tests completed.
metroHash-64 x 9,405,594 ops/sec ±0.45% (93 runs sampled)
metroHash-128 x 8,259,543 ops/sec ±0.67% (97 runs sampled)
Fastest is: metroHash-64
Slowest is: metroHash-128
64 bytes
2 tests completed.
metroHash-64 x 8,920,569 ops/sec ±0.25% (98 runs sampled)
metroHash-128 x 7,812,969 ops/sec ±0.26% (98 runs sampled)
Fastest is: metroHash-64
Slowest is: metroHash-128
256 bytes
2 tests completed.
metroHash-64 x 8,119,583 ops/sec ±0.27% (98 runs sampled)
metroHash-128 x 7,102,565 ops/sec ±1.56% (94 runs sampled)
Fastest is: metroHash-64
Slowest is: metroHash-128
16 bytes
2 tests completed.
metroHash-64 x 9,599,855 ops/sec ±0.30% (94 runs sampled)
metroHash-128 x 8,516,265 ops/sec ±0.59% (98 runs sampled)
Fastest is: metroHash-64
Slowest is: metroHash-128
64 bytes
2 tests completed.
metroHash-64 x 9,107,393 ops/sec ±0.33% (98 runs sampled)
metroHash-128 x 7,898,617 ops/sec ±0.25% (96 runs sampled)
Fastest is: metroHash-64
Slowest is: metroHash-128
256 bytes
2 tests completed.
metroHash-64 x 8,446,693 ops/sec ±0.24% (96 runs sampled)
metroHash-128 x 7,420,537 ops/sec ±0.77% (97 runs sampled)
Fastest is: metroHash-64
Slowest is: metroHash-128
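To compare numbers like the ones above without eyeballing them, here is a minimal sketch that parses a result line and reports the relative change between two runs. The function names are made up for illustration, and the pasted output does not say which block belongs to which flag setting, so the two lines fed in at the bottom are simply two of the 16-byte metroHash-64 results.

```python
import re

# Matches benchmark result lines such as:
#   metroHash-64 x 8,380,672 ops/sec ±7.34% (80 runs sampled)
LINE_RE = re.compile(
    r"^(?P<name>\S+) x (?P<ops>[\d,]+) ops/sec "
    r"±(?P<rme>[\d.]+)% \((?P<runs>\d+) runs sampled\)$"
)


def parse_result(line):
    """Return (name, ops_per_sec, margin_percent), or None for non-result lines."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ops = int(m.group("ops").replace(",", ""))
    return m.group("name"), ops, float(m.group("rme"))


def relative_change(baseline_ops, other_ops):
    """Signed change of other_ops relative to baseline_ops, in percent."""
    return (other_ops - baseline_ops) / baseline_ops * 100.0


if __name__ == "__main__":
    a = parse_result("metroHash-64 x 9,405,594 ops/sec ±0.45% (93 runs sampled)")
    b = parse_result("metroHash-64 x 9,599,855 ops/sec ±0.30% (94 runs sampled)")
    print(f"{a[0]}: {relative_change(a[1], b[1]):+.2f}%")
```

A change that is smaller than the reported ± margins of the two runs is probably noise rather than an effect of the compiler flags.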
I just published metrohash@2.5.0
We have a Kubernetes cluster running in the cloud, and we were caught out by an "illegal instruction" crash when a new node with an older CPU architecture was added to the cluster.
The library was compiled on a machine with the Haswell architecture and crashed on Sandy Bridge. Both support SSE2, but after a bit of gdb and comparing the generated assembly, it turned out that gcc on Haswell emits BMI2 instructions, which Sandy Bridge does not support.
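For anyone who wants to check a node before deploying a prebuilt binary to it, here is a rough sketch that looks for the relevant feature flags. It assumes Linux on x86, where `/proc/cpuinfo` has a `flags` line with lowercase names such as `sse2` and `bmi2`. The helper names are hypothetical.

```python
def cpu_flags(cpuinfo_text):
    """Collect the feature flags from Linux /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only "flags : ..." lines carry the x86 feature list.
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def missing_features(cpuinfo_text, required):
    """Return the required features that the CPU does not advertise, sorted."""
    return sorted(set(required) - cpu_flags(cpuinfo_text))


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        # Not Linux, or /proc is unavailable: nothing can be detected.
        text = ""
    print("missing:", missing_features(text, ["sse2", "bmi2"]))
```

On a machine like the Sandy Bridge node described above, `bmi2` would be expected to show up as missing while `sse2` would not.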
I would suggest dropping -march from the default cflags or replacing it with -mtune=generic.
I have not tested -mtune=generic, but dropping -march resolved our crash.
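To make the suggestion concrete, here is a small sketch of the flag change as a list transformation. The input list is hypothetical, since the project's actual cflags are not shown here, and `portable_cflags` is just an illustrative name.

```python
def portable_cflags(cflags, tune="generic"):
    """Drop any -march=... flag and end the list with a single -mtune=<tune>."""
    kept = [
        flag
        for flag in cflags
        # -march picks the instruction set; any old -mtune is replaced below.
        if not flag.startswith(("-march=", "-mtune="))
    ]
    return kept + [f"-mtune={tune}"]


if __name__ == "__main__":
    # Hypothetical default flags, for illustration only.
    print(portable_cflags(["-O3", "-march=native"]))
```

Unlike -march, -mtune only affects instruction scheduling and selection heuristics and does not let the compiler emit instructions outside the baseline instruction set, so the resulting binary should still run on the older nodes.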