orlp / polymur-hash

The PolymurHash universal hash function.
zlib License
318 stars 7 forks source link

Degree of confidence in proofs #7

Open michaeleisel opened 1 year ago

michaeleisel commented 1 year ago

Hi, I'm looking to use this hash function in practical applications that rely heavily on the hash collision upper bound probabilities, both the one for individual inputs as well as the one for pairwise-independence. However, I'm curious to understand the level of confidence in the proofs. For example, bounds for umash were later corrected, e.g. in f13f07ab938c77a8bcf25961e2595111d77a2fed (as described in https://pvk.ca/Blog/2020/10/31/nearly-double-the-ph-bits-with-one-more-clmul/ ). So, what is the degree of confidence in the proofs? Have they been checked by others?

orlp commented 1 year ago

I've spent quite some time going over the proof myself multiple times, so I am fairly confident it's correct. But I'm not aware of anyone else going over them, or at least they haven't told me (of either issues or lack thereof).

The correction in the bounds of umash you referred to was (I believe) due to not using a fully invertible combination matrix in Nandi's encode-hash-combine paradigm. Jim Apple showed that this can still be fine, but you have to take into account the size of its null space. However, PolymurHash does not use the encode-hash-combine paradigm at all so at least that potential error does not apply to it.

Hi, I'm looking to use this hash function in practical applications that rely heavily on the hash collision upper bound probabilities, both the one for individual inputs as well as the one for pairwise-independence.

I would be curious as to what that application would be.