Enumeration and interception vulnerability?

gmaxwell commented 3 years ago

Unless I misunderstand the protocol, this system is open to enumeration and interception attacks:

An attacker monitors the DHT and cracks topics as they learn of them: they take the applicable timestamp, guess candidate strings, and look for matching hashes. A single older-model GPU can attempt 1.3 billion BLAKE2 hashes per second, so even pretty improbable topics will succumb.
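For a sense of scale, here is a back-of-the-envelope sketch in Python. The hash rate is the figure quoted above; the topic shapes and word-list sizes are illustrative assumptions, not measurements of real usage.

```python
# Rough estimate of how long exhaustive guessing takes at a fixed hash rate.
GPU_HASHES_PER_SECOND = 1.3e9  # figure quoted above for one older GPU


def seconds_to_exhaust(candidates: int, rate: float = GPU_HASHES_PER_SECOND) -> float:
    """Worst-case time to try every candidate topic for one timestamp."""
    return candidates / rate


# Hypothetical topic shapes (assumptions for illustration only).
shapes = {
    "3 words from a 2,048-word list": 2048 ** 3,
    "3 words from a 100,000-word list": 100_000 ** 3,
    "8 lowercase letters": 26 ** 8,
    "256-bit random topic": 2 ** 256,
}

for name, count in shapes.items():
    secs = seconds_to_exhaust(count)
    print(f"{name}: {count:.3e} candidates, {secs:.3e} s ({secs / 86400:.3e} days)")
```

Under these assumptions, three words from a 2,048-word list fall in a few seconds and three words from a 100,000-word list in roughly nine days on a single GPU, while a 256-bit random topic stays far out of reach.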

The attacker can then take the topics recovered by this cracking, add existing dictionaries, and precompute their hashes for future timestamps.
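A minimal sketch of such a predicted-topic table follows. The derivation used here (`derive_topic_id`, BLAKE2b over an 8-byte time window plus the phrase) is a hypothetical stand-in and not the actual hyperbeam derivation; the phrases and window numbers are made up.

```python
import hashlib


def derive_topic_id(phrase: str, window: int) -> bytes:
    """Hypothetical stand-in: BLAKE2b over the time window and the phrase."""
    data = window.to_bytes(8, "big") + phrase.encode("utf-8")
    return hashlib.blake2b(data, digest_size=32).digest()


def build_lookup(phrases, first_window: int, count: int) -> dict:
    """Precompute topic IDs for known phrases over future time windows."""
    table = {}
    for window in range(first_window, first_window + count):
        for phrase in phrases:
            table[derive_topic_id(phrase, window)] = (phrase, window)
    return table


# Phrases recovered by cracking plus dictionary entries (made-up examples).
known = ["correct horse battery", "my secret files", "hello world"]
table = build_lookup(known, first_window=470_000, count=24)

# Later, an ID observed on the DHT resolves with a single dictionary lookup.
observed = derive_topic_id("hello world", 470_005)
print(table.get(observed))
```

The expensive work happens ahead of time; matching an observed ID against the table costs one lookup, which is what makes real-time interception plausible.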

As soon as someone uses a topic that is in the attacker's predicted-topic database, the attacker can intercept the connection. Additionally, because the topic is also the passphrase for the key, the attacker will be able to decode the data. I believe that they could successfully MITM connections this way.

Even if their active interceptions aren't successful, they can go back and track the usage of topics over time.

This has some similarities to the old Tor v2 hidden service enumeration vulnerabilities, but in the case of Tor, enumerating a hidden service didn't let the attacker intercept the traffic.

To harden the protocol against this, I would recommend:

(1) Provide a separate topic and password. Compare the password using a zero-knowledge password proof once you are talking to the other side. This will prevent attackers from guessing weak passwords. (Users might not set a password or set it to something really dumb like the topic, but they'll be no worse off than the current state of affairs.)

(2) Use a computationally hard hash function for the topic (and password). This matters especially because the passwords are unsalted (an attacker-predictable value shared by all users is not really a salt). Argon2 or scrypt would be a normal recommendation, but really anything that won't let an attacker try a billion attempts per second with a single GPU would be a good move.
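As a rough illustration of (2), here is a sketch using `hashlib.scrypt` from the Python standard library. The function name `slow_topic_hash`, the context string, and the cost parameters are assumptions chosen for the example, not values taken from hyperbeam.

```python
import hashlib
import time


def slow_topic_hash(topic: str, context: bytes) -> bytes:
    """Stretch a topic with scrypt, a memory-hard function."""
    return hashlib.scrypt(
        topic.encode("utf-8"),
        salt=context,          # shared context, not a true per-user salt
        n=2 ** 14, r=8, p=1,   # about 16 MiB of memory per guess
        dklen=32,
    )


start = time.perf_counter()
digest = slow_topic_hash("correct horse battery", b"example-app|470000")
elapsed = time.perf_counter() - start

print(digest.hex())
print(f"one guess took about {elapsed * 1000:.0f} ms on this machine")
```

The point of the timing line is the contrast with the fast-hash case: each guess now costs a measurable amount of time and memory instead of a fraction of a nanosecond.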

(3) Have the introduction process use a shortened hash. E.g. generate the topic-id by Hash(application||timestamp||first_16_bits(Argon2(timestamp||topic))). Then find all the participants using that value, challenge each of them with a zero-knowledge password proof for the complete timestamp/topic/password, and connect to the first that succeeds. How easy this is to implement depends on how difficult hyperswarm makes it to try out multiple matches. The idea is that shortening the password hash means an attacker would have many false positives and be unable to identify the topic, but the real users will have few collisions that they need to talk to before successfully connecting. The 16 bits in my example could be adjusted to trade off information leak vs. time wasted in collisions.
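The construction in (3) can be sketched as follows. This is only an illustration: scrypt stands in for Argon2 (which is not in the Python standard library), BLAKE2b is assumed for the outer Hash, and the helper names and byte encodings are hypothetical.

```python
import hashlib

PREFIX_BITS = 16  # tunable: fewer bits leak less but cause more collisions


def stretched(topic: str, timestamp: int) -> bytes:
    """Slow hash of timestamp and topic (scrypt standing in for Argon2)."""
    return hashlib.scrypt(
        topic.encode("utf-8"),
        salt=timestamp.to_bytes(8, "big"),
        n=2 ** 14, r=8, p=1, dklen=32,
    )


def first_bits(data: bytes, bits: int) -> bytes:
    """Keep only the leading `bits` bits of `data`."""
    value = int.from_bytes(data, "big") >> (len(data) * 8 - bits)
    return value.to_bytes((bits + 7) // 8, "big")


def topic_id(application: str, timestamp: int, topic: str) -> bytes:
    """Hash(application || timestamp || first_16_bits(slow_hash))."""
    short = first_bits(stretched(topic, timestamp), PREFIX_BITS)
    data = application.encode("utf-8") + timestamp.to_bytes(8, "big") + short
    return hashlib.blake2b(data, digest_size=32).digest()


def expected_false_matches(guesses: int, bits: int = PREFIX_BITS) -> float:
    """Average number of wrong guesses that still map to a given topic-id."""
    return guesses / 2 ** bits


print(topic_id("example-app", 470_000, "correct horse battery").hex())
print(expected_false_matches(10 ** 9))
```

With a 16-bit prefix, an attacker who tries a billion candidate topics would expect about 15,000 of them to match any one observed topic-id, so a match alone says little about which topic is really in use. Legitimate peers pay for this with occasional collisions that the password proof has to weed out.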

(4) Use a random beacon instead of a timestamp. The NIST random beacon or recent Bitcoin block hashes would be possible candidates. The improvement here would be somewhat minor: it would prevent an attacker from precomputing future topic hashes well in advance. If (3) is done, this wouldn't be particularly important and probably wouldn't be worth the trouble.

Failing the above (and maybe with the above), users should be strongly encouraged to use topics with cryptographic security. E.g. perhaps automatically generate a 256-bit random topic and tell the user, rather than letting them provide it. Otherwise this system may provide negligible security in practice, which isn't especially good for something that claims to be encrypted.
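Generating such a topic is cheap. A minimal sketch using Python's `secrets` module (the function name `generate_topic` and the hex encoding are assumptions for illustration):

```python
import secrets


def generate_topic() -> str:
    """Return a fresh 256-bit random topic as 64 hex characters."""
    return secrets.token_bytes(32).hex()


topic = generate_topic()
print(topic)
```

The tool would print this value once and the user would pass it to the other side over some channel they already trust.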

mafintosh commented 3 years ago

Thanks for the write-up! All of those sound like good additions. 1) is added by #5, and 2) and 3) should be pretty easy to do; will fix later this week.

pfrazee commented 3 years ago

@gmaxwell Appreciate the really detailed write-up and suggestions.

pfrazee commented 2 years ago

After some discussion, we decided to simplify this tool (since it's not our primary focus) and use a 32-byte passphrase without time-gating: https://github.com/mafintosh/hyperbeam/pull/12

ninpnin commented 1 year ago

Would it be possible to encode the 32-byte passphrase as three human-readable words, similar to the old version? That shouldn't have any security implications, but would improve the UX.
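For reference, a quick calculation of how many words a 32-byte value would take. The 2,048-word list size is an assumption for illustration; the thread does not say which list the old version used.

```python
import math


def words_needed(bits: int, wordlist_size: int) -> int:
    """Words required to encode `bits` bits without losing information."""
    return math.ceil(bits / math.log2(wordlist_size))


def bits_carried(words: int, wordlist_size: int) -> float:
    """Bits of information that `words` words from the list can carry."""
    return words * math.log2(wordlist_size)


print(words_needed(256, 2048))
print(bits_carried(3, 2048))
```

Under that assumption, a lossless encoding of all 32 bytes takes 24 words, while three words carry 33 bits.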
