Digest algorithm - Githubissues

mnot commented 8 years ago

A few people have noted that SHA-1 is probably overkill.

mnot commented 8 years ago

EKR points out: http://aras-p.info/blog/2016/08/09/More-Hash-Function-Tests/

kazuho commented 8 years ago

Considering the fact that N and P are both defined to be 5 bits, it would be preferable to use a hash function that emits a 62-bit or longer value.

mnot commented 8 years ago

xxHash seems good, but see https://github.com/Cyan4973/xxHash/issues/49.

Likewise, the other ones mentioned there are also defined by their source.

I think our options are:

Specify one of these as a normative reference and hope we can get that through the process (not likely)
Write the spec for one of these (ouch)
Wait for someone else to write the spec for one of these (may be some time)
Continue using a specified algorithm.

I'm kind of leaning towards (4), possibly leaving extensibility for other hash algorithms (although, as discussed before, it's not clear how we'd negotiate for that; maybe #22 could help).

Thoughts?

martinthomson commented 8 years ago

Don't know what the relevance of #22 is.

I would say that this seems like a good basis for 2: https://github.com/aappleby/smhasher/wiki/MurmurHash3 These are necessarily very simple programs to specify. A draft shouldn't be hard to write if that's necessary. Of course, I'm lazy just like you.

mnot commented 8 years ago

Sorry, #229.

I take it you're volunteering, then...

martinthomson commented 8 years ago

"Internet-Draft seeks editor, must like long walks on the beach, sunsets and have poor impulse control."

martinthomson commented 8 years ago

Fat fingers, bah.

kazuho commented 8 years ago

My tendency goes to 4.

If we are to actually choose a simpler algorithm than SHA, I think the choice should better be made based on the popularity and the quality of the hash function.

And for that respect, SipHash 2-4 might be a good candidate. It is known to be fairly fast, and is used as a basis by many hash table implementations (thanks to its resistance to collision).

I am not sure how collisions can be used to attack Cache Digests, but it might still be a good idea to have resistance considering the fact that an attacker can insert a digest (by controlling the victim's browser to access a specific resource on the target's website (by using IMG links, etc.).

martinthomson commented 8 years ago

Any attack on cache digests is likely to be an advance on cache-timing attacks where the attacker tries to learn whether a particular resource is in the cache.

Say that we have a secret in a URI path and the attacker wants to learn that secret. If the resource is ordinarily pushed alongside a parent document AND neither the parent document nor the secret resource are cacheable you have a way to exploit things AND the 404 page on the site IS cacheable then you have a situation that can be exploited.

The attacker includes the parent document in an iframe and measures the time to load the document when the secret is pushed. The attacker then iteratively tests different URIs by forcing a load of a non-existent, but potentially colliding resource then reloading the iframe. A collision will be apparent when the iframe load takes significantly longer (one round trip). At that point, the attacker need only generate hash pre-images (since none of the fast hashes claims to have pre-image resistance, this isn't that difficult given sufficient constraints on the shape of the secret) until it finds the secret.

This has a false positive rate that would have to be corrected by a more rigorous test on potential candidates, but it seems like it could be feasible given enough time. Everyone give thanks to @w3c/hr-time for making this possible!

Similarly, if the pushed resource is cached, then the attacker might find a way for the server to server and conditionally push it resources with a URI component that is attacker controlled. That would enable a faster attack too.

This attack might be used to recover whose twitter feed someone has viewed. Even though user images are bucketed in size to resist traffic analysis, this sort of attack relies on a different side channel that I doubt has been properly safeguarded.

This is rendered less effective by choosing a hash with pre-image resistance. It's still relatively easy to engineer a collision (particularly since cache digests are necessarily very narrow), but the hash can't be reversed to find the URL. It could still be used as a confirmation step. While the space of potential twitter image URLs is obscenely large, the set accounts that are interesting to an attacker might be easily enumerable (if not all accounts if it comes to certain resourceful attackers).

sebdeckers commented 7 years ago

When using small values, 50-100 bytes of a typical URL (w/o origin), I do not see significant performance difference between sha256 and xxhash64 using pure JavaScript implementations in the browser. YMMV.

Code & results: https://gitlab.com/sebdeckers/fingerprinting-benchmarks

mnot commented 7 years ago

That's very helpful, thanks. Given that, I think we can close this issue with no action (reopening if we learn something new, of course).

httpwg / http-extensions

Digest algorithm #228