tfussell / xlnt

:bar_chart: Cross-platform user-friendly xlsx library for C++11+
Other
1.45k stars 406 forks source link

Improving the hash function #673

Open ThibaultDECO opened 1 year ago

ThibaultDECO commented 1 year ago

Using Szudzik's pairing function (more efficient than the previous approach or Cantor's)

ThibaultDECO commented 1 year ago

As discussed in issue #648.

tfussell commented 1 year ago

I didn't know about that algorithm--it's pretty cool! Could you explain how it improves performance though? It seems like the operations are about the same (bit shift ~= multiplication, addition ~= boolean OR).

tfussell commented 1 year ago

If the pairing function is a perfect hash function, maybe we could remove the call to std::hash<size_t> entirely and simply return the 1-d value as the result of the custom hash. I could see that being much faster.

ThibaultDECO commented 1 year ago

The Szudzik function has 100% value packing efficiency, i.e. no overflow. With Szudzik, pairing two 32-bits numbers results in a 64-bits number, which is not the case for Cantor or the current implementation. For example: cantor(9, 9) = 200 szudzik(9, 9) = 99 You will find more details and a benchmark on performance here. We could indeed remove the call to std::hash, but only if size_t is 64-bits, which is unfortunately not always the case.