vnmakarov / mum-hash

Hashing functions and PRNGs based on them
145 stars 12 forks source link

Why this numbers are called 'primes' ? #1

Open funny-falcon opened 8 years ago

funny-falcon commented 8 years ago

https://github.com/vnmakarov/mum-hash/blob/master/mum.h#L312-L320

There is no any guarantee, than _mum_next_factor will return prime number. Neither it is guaranteed to be odd (but it looks like algorithm doesn't depends on it). If generator behind rand is a good random generator, then _mem_next_factor even allowed to return zero.

funny-falcon commented 8 years ago

Currently, it does not provide resistance from SIC ("seed independent collisions", or "hash flood"), if default _mum_primes are used.

Seed has no impact on result of neighbor input scrambling https://github.com/vnmakarov/mum-hash/blob/master/mum.h#L209 , so it is possible to choose next block that will cancel diff from previous block. Yes, it is cpu consuming operation, but it should be done only once for known _mum_primes. Also, since lo(x*C) + hi(x*C) is not bijective (1-to-1) function, even one-block collision could be found.

(and, it also leads to conclusion: while MUM is (probably) resistant to first preimage attack, it is certainly not resistant to second preimage attack (for known _mum_primes). Worse than that, this attack doesn't depends on seed, only on _mum_primes).

funny-falcon commented 8 years ago

As an example, there is fallback hash functions for Golang: https://github.com/golang/go/blob/master/src/runtime/hash32.go https://github.com/golang/go/blob/master/src/runtime/hash64.go

They looks to be safer against SIC/"hash flood", although multipliers are constants. And they have impressive performance.

vnmakarov commented 8 years ago

Thank you for your feedback, Yura.

The initial constants are prime numbers. I started with them thinking about prime number factorization as a hard problem (although it is not crypto-level hard for only 64-bit numbers). The code changing them was added later but the names were not changed. The names of the multiplication constants are misleading. I'll modify the names.

Actually I tested mum hash on smhasher with different multiplications constants (including randomly generated). So you are right the number primeness is not important. You are right that randomly generated numbers can be even or zero. I should add the code preventing this.

As for hash flood. I wrote the function for my purposes (an interpreter). I am going to use it the following way: get random constant at the interpreter work beginning and then use the function. I don't think it is even necessary to add a code in hash tables to recognize a denial attack.

Also I don't pretend that MUM hash function is a crypto-level hash function. lo(x*C) + hi (x*C) = a probably can be found on modern equipment but it still will require a lot of time. It would be interesting for me, how much time you will need to find X for C=0X258c4af23da09e33 and a=0X11d934b3c798d46d

Thanks for adding references for hash functions in GO. I'll investigate them. I am getting new references frequently after releasing the code. There are too many hash functions to try. But I tried yours too.

funny-falcon commented 8 years ago

But I tried yours too.

:-) thanks. Mine are not fast, though I believe they are safe. Go's are really fast and look to be safe.

funny-falcon commented 8 years ago

And hash table does not need code for recognition of "dos", if hash function is resistant to "hash-flood" (seed independent collisions). No one will try to attack such function, cause it is waste of time.