Open rurban opened 7 years ago
Thanks for being vocal about the "myths"! OTOH, I think that it'd be more constructive to balance out the sensationalism by mentioning the fact that MANY (most?) practical hash tables have restrictions on key size (e.g. 32 bytes) AND encoding (e.g. ascii), such that crypto/stronger hash functions like SipHash are indeed PRACTICALLY flood resistant (not really proof) to known attacks? The same cannot be said for most "weaker" hash functions.
SipHash has nothing to do with flood resistance. It's only about seed hiding, nothing else. Flood resistance can ONLY be done by a different collision resolution scheme.
Am 03.01.2017 9:54 vorm. schrieb "vicaya" notifications@github.com:
Thanks for being vocal about the "myths"! OTOH, I think that it'd be more constructive to balance out the sensationalism by mentioning the fact that MANY (most?) practical hash tables have restrictions on key size (e.g. 32 bytes) AND encoding (e.g. ascii), such that crypto/stronger hash functions like SipHash are indeed PRACTICALLY flood resistant (not really proof) to known attacks? The same cannot be said for most "weaker" hash functions.
— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/rurban/smhasher/issues/20#issuecomment-270071310, or mute the thread https://github.com/notifications/unsubscribe-auth/AACjUd0WFXo29wwF1c6yUb-pf-36pKCHks5rOgyzgaJpZM4K-Hxe .
"Flood resistance" is indeed not a precise term. Stronger/crypto hash is all about seed/secret/salt hiding, which, along with further restrictions (valid key size and encoding) dramatically increase the friction of seed discovery and collision generation, hence "resistance" (rather than proof) in a general sense.
I still think that there is a great misunderstanding on your side. Can you describe plan of such attack? flood resistance idea suppose that you send keys to another network node. So:
1) even if you run two network processes on the same cpu, network I/O time (millions of msgs per second) is higher than hash computation time. So, no need to measure time of hash calculation, instead measure number of msgs you can send via particular network.
2) hash tables don't have fixed size, thaty are extended to accomodate more entries. so, if you send million keys to the attacked node, you will definitely have a lot of 10-bit collisions, but the hash table will probably contain a million entries too.
3) that is you plan to searching collisions? let's imagine that application developers are so kind that they give you a full collision list for the current table contents. so you send, at average, 32 keys and found 1 collision, then you deleted all non-collisioned keys and send, at average, another 1024 keys to find one more key resolving to the same hash entry, remove all extra keys and repeat that again. As you see, if hash table has 1024 entries, you need to send N*1024 keys to build a single entry with N colliding keys. It will work with any hash algorithm, so hash algos themself can't be judged on this critery. It's still true that good network application should be developed seriously and hash algo by itslef (even SHA) doesn't ensurу security.
4) that algos like siphash is try to ensure is that, without knowing seed, you can request any reasonable amount of key->value mappings and still be unable to construct new keys mapping to values you choose (including subsets of value bits) with probablility larger than 2^-N, where N is width of value/subvalue.
5) Secure hash algos like SHA-1/2/3 provides the same guarantees, even without seeding. And you can see that despite huge practical value, bitcoins are still mined either with brute-force approaches or at least with speed close to brute-forcing. So, no doubt that we can't yet generate 40+ bit SHA2 collisions much faster than brute-forcing
6) For usual hashes, we know that collisions are easy to generate for xxhash32 and murmurhash3a. i haven't seen any work generating collisions w/o knowing seed with higher probability than 2^-N - for any other algorithm, including siphash, spookyhash and so on. So you need to decide - if you talk about attacks mathematically possible against any hash including SHA family, there is no way but carefully design network apps to deal with such attacks. If you say that you can find collisions with better than mathematically possible probability for any hash but xxh32/murmur3a, than we are all ears. Mixing those two types of attacks is just a way to muddy the waters
Maybe read also http://perl11.org/blog/seed.html and see my offline test https://github.com/perl11/cperl/blob/master/t/op/hashflood.t and https://github.com/perl11/cperl/blob/master/t/op/seed-siphash-8-0.dat (and other collision files for more hash funcs) created in 0.003s per seed.
I won't disclose much further how to get at a seed practically, as most dynamic languages are affected who rely on the false siphash security claims.
ad 2. you don't need to send million keys, just as many to be able time the differences to identify collisions. 8bit should be enough locally (255 keys) with brute-force of 0.003s for siphash. externally you might need 16bit (65535 keys) with brute-force time of 1m30s (fast) vs 2m (siphash).
ad 5. even SHA-1 is insecure and statistically bad when used in a 32bit hash table, as seen in the smhasher tests. I won't trust a crippled SHA-2 nor SHA-3 neither, but haven't tested it yet.
ad 6. collisions are trivial to generate for every single hash function used in hash tables. if siphash, sha-1 or more. just brute force it for the usual hash table schemes (will not work with better ones using primes). siphash just needs twice as long, as the primary factor in siphash is mixing the seed into the box and slowness.
just getting the siphash seed is not as trivial as for most other hash funcs. watch the relevant CCC talk https://events.ccc.de/congress/2011/Fahrplan/events/4680.en.html (against trivial DJBX33A/DJBX33X style hashes) and https://events.ccc.de/congress/2012/Fahrplan/events/5152.en.html (siphash), where most hash funcs were broken and the secure hash function myth started.
This number depends on the measureable timing of collisions:
BITS=10: N=1023 colliding buckets, to expose timings in linear search
- O(n/2) vs O(1)
Search in a linked list of 16383 should be enough to DoS a server (created in 7s)
but creating big lists need only 2-4min, independent on the keys.
Typical POST size 1-4MB enough, PHP limits to max 1000 keys (i.e. 11 bits)
time to create is trivial brute-force, not with a solver yet.
Here with the slowest hash: siphash. others are almost twice as fast.
BITS N time (siphash) fnv1a
8 - 255 0.003s
9 - 511 0.006s
10 - 1023 0.033s
11 - 2047 0.12s
12 - 4095 0.45s
13 - 8191 1.82s
14 - 16383 7.6s
15 - 32767 31s
16 - 65535 2m2s 1m30s
18 - 262143 4m3s 23770 2m38s found 49446
20 - 1048575 3m44s 2m37s
9m42 16135
3m57 6367
24 - 16777215 3m55s (heap-array, stack too small)
28 - 268435455 3m57s 2m42s
31 - 2147483647 2m40s
Did you notice one of the "take home messages" of the CCC talk linked:
Randomize your hash functions!
(Also, the hash function attacked in that article, DJBX33X, is far from a secure hash function.)
I don't know about any of the others, but this is exactly what Rust does, with a new key for every hash table. All the talk of secure hash functions being "security theatre" assumes the key is known or deducible. This article is even worse and could almost be summarized as: given the ability to run arbitrary code on a server [plus some other requirements] we can run a CPU DoS attack — well, this is pretty easy then, isn't it?
If anything the advice should be use a secure hash function with a secure random key for every table, and never dump keys to log files or when serializing data. Again, I can't talk for any other language, but this is exactly what Rust does.
No, exactly the opposite. Rust falls into the same trap as most other languages. Security against hash DoS comes from fixing the hash DoS, not by obscuring the key. The other measures, like randomizing the key, randomizing the iteration order are just addons on this.
Granted, using tree-based-maps with O(n×log(n)) complexity, or some hybrid with O(n×log(n)) worst case, is an adequate fix. But so is making sure no attacker has adequate information to predict which inputs will cause hash collisions. Even if side-channel timing attacks can be used to identify collisions (difficult over the open internet, because there are so many other delays), with a secure hash function and secret key it will still be impossible to predict new values which cause collisions.
As I outlined above, no. There are practical remote timing attacks and there is too much independent public information on a hash table structure to bisect a key. With a dynamic language even worse, there's it's trivial. The hiding game for hashes is theatre.
use a secure hash function with a secure random key for every table, and never dump keys to log files or when serializing data. Again, I can't talk for any other language, but this is exactly what Rust does.
it's what every language does today. Reni's point-of-view is unpopular among language developers.
There's still the myth of secure hash table functions around, esp. from the SipHash folks, as you can see at https://github.com/google/highwayhash/issues/28 So add tests to prove them otherwise.
Allow all keys of char 0-255, i.e. including null and control chars. hash functions take binary input, even if some applications reject those keys. It is also a good metric for the hash function algorithmic complexity.