Hardened CIDs/Peer-IDs with Argon2 parity section

RubenKelevra commented 4 years ago

This is a feature request for an optional trailing parity section on CIDs as well as Peer-IDs.

The issue

Since humans are particularly bad at comparing long strings of characters (such as hash sums), they tend to compare only a few sections and call it a day.

Concept

To harden CIDs/Peer-IDs against address spoofing (partial hash collisions), the hash is used as input for the Argon2 algorithm, with the hash definition as salt. A few characters of the Argon2 output form the parity section.

A visual separator, such as a dash, between the CID and the parity section will help to set the two apart visually.
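A minimal sketch of how such a parity section could be derived. Several things here are assumptions, not part of the proposal: Python's standard library has no Argon2, so `hashlib.scrypt` (another memory-hard function) stands in for it; the function names `parity_tag` and `with_parity`, the fixed `SALT` placeholder (a real implementation would use the hash definition decoded from the CID), the cost parameters, and the base32 tag alphabet are all hypothetical.

```python
import base64
import hashlib

# Assumption: placeholder salt. The proposal uses the hash definition as salt,
# which would require decoding the CID first.
SALT = b"cid-parity-sketch"


def parity_tag(cid: str, length: int = 2) -> str:
    """Derive a short check tag from the textual CID.

    scrypt stands in for Argon2 here; both are memory-hard,
    which is the property the proposal relies on.
    """
    raw = hashlib.scrypt(
        cid.encode("ascii"),
        salt=SALT,
        n=2**14,  # CPU/memory cost (about 16 MiB with r=8)
        r=8,
        p=1,
        dklen=5,  # 5 bytes encode to exactly 8 base32 characters
    )
    return base64.b32encode(raw).decode("ascii").lower()[:length]


def with_parity(cid: str) -> str:
    """Append the tag after a dash, for display to humans only."""
    return f"{cid}-{parity_tag(cid)}"


if __name__ == "__main__":
    print(with_parity("bafkqagttovtgm2ldnfsw45dmpeqgy33om4qhaylznrxwczak"))
```

The tag is computed over the CID string only, never over the content behind it, so the cost is paid once per displayed CID.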

Prospective benefits

- Errors in user input can be caught immediately. IPFS won't accept wrong input and can print a helpful error message instead of "timeout, content not found", or instead of adding a wrong Peer-ID and searching for a non-existent peer.
- Attacks on domain names with forged IPNS sums can be identified much more easily by the naked eye.
- The computing power necessary for partial hash collision attacks increases, since the parity section needs to be forged as well.

Originally posted at https://github.com/ipfs/go-ipfs/issues/7357#issuecomment-633687330

RubenKelevra commented 4 years ago

@lidel wrote in https://github.com/ipfs/go-ipfs/issues/7357#issuecomment-633731738

@RubenKelevra the limitation here is that IPFS uses the CIDv0/CIDv1 spec, and adding a checksum would mean creating CIDv2. I think it's worth discussing, but as you noticed it's out of scope here, so please file an issue in https://github.com/multiformats/cid

It's only beneficial on the user input/output side, not when computers communicate over secured channels. So I don't know if we really need a new specification: it won't technically alter the CID in any way. It just adds an optional trailing section, so only the representation shown to the user changes.

So instead of bafkqagttovtgm2ldnfsw45dmpeqgy33om4qhaylznrxwczak the user would get an output like bafkqagttovtgm2ldnfsw45dmpeqgy33om4qhaylznrxwczak-ab
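A rough sketch of the input side, under the same kind of assumptions: the check characters are verified and stripped before the bare CID is handed to the rest of the system, so nothing downstream ever sees them. The names `parity_tag` and `parse_user_cid`, the placeholder `SALT`, and the use of `hashlib.scrypt` in place of Argon2 are hypothetical choices made so the example runs with the standard library alone. It also assumes a base32/base36 CID, which cannot contain a dash itself.

```python
import base64
import hashlib

SALT = b"cid-parity-sketch"  # assumption: placeholder for the hash definition
TAG_LEN = 2                  # assumption: two check characters, as in "-ab"


def parity_tag(cid: str) -> str:
    """Memory-hard check tag over the CID text (scrypt standing in for Argon2)."""
    raw = hashlib.scrypt(cid.encode("ascii"), salt=SALT, n=2**14, r=8, p=1, dklen=5)
    return base64.b32encode(raw).decode("ascii").lower()[:TAG_LEN]


def parse_user_cid(text: str) -> str:
    """Return the bare CID; verify the check characters if they are present."""
    cid, sep, tag = text.strip().partition("-")
    if sep and tag != parity_tag(cid):
        # Fail early with a clear message instead of a lookup timeout.
        raise ValueError(f"check characters {tag!r} do not match CID {cid!r}")
    return cid  # the tag is optional, so an untagged CID passes through


if __name__ == "__main__":
    cid = "bafkqagttovtgm2ldnfsw45dmpeqgy33om4qhaylznrxwczak"
    print(parse_user_cid(f"{cid}-{parity_tag(cid)}"))
```

Because the tag is stripped at the boundary, the CID stored and exchanged between nodes stays byte-for-byte what it is today.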

Stebalien commented 4 years ago

Parity should probably be achieved through an error-correcting multibase encoding.
- Error correction could be added to any CID without having to change the hash function.
- Error correction would only take up space where necessary (i.e., in text).
Unfortunately, any effective increase to CID security against visual collisions would directly impact IPFS's performance. IPFS relies on being able to quickly hash content. To meaningfully improve security, hashing performance would need to be reduced by 100-1000x.

Stebalien commented 4 years ago

Would you like to try implementing an error correcting (or at least detecting) base encoding for multibase? Bitcoin implemented https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2014-February/004402.html.

bertrandfalguiere commented 4 years ago

About avoiding visual spoofing of humans, what about encoding in base... 256? That is, visually representing CIDs and peerIDs as emojis, like so? It's a bit provocative, but it could give each CID a striking visual signature. Terminals on Windows, macOS and Linux support them by default, I think.

It would still be able to support error detection.

I know it can seem a bit childish, and I guess it could decrease performance. But some GUIs such as IPFS Desktop could really benefit from it.

Be honest: do I need to consult a therapist? 🤔👨‍⚕️

ribasushi commented 4 years ago

what about encoding in base... 256? Meaning visually representing CIDs and peerIDs as emojis,

/cc @boreq

RubenKelevra commented 4 years ago

@Stebalien wrote

Parity should probably be achieved through an error-correcting multibase encoding.

Error correction could be added to any CID without having to change the hash function.

True. But I don't think it's necessary to change the multibase encoding at all. Since it's only for the representation to humans, it would just be calculated when necessary and appended to the base32/36 string.

@Stebalien wrote

Unfortunately, any effective increase to CID security against visual collisions would directly impact IPFS's performance. IPFS relies on being able to quickly hash content. To meaningfully improve security, hashing performance would need to be reduced by 100-1000x.

It won't really impact performance. The current hashing should be used as-is. Only CIDs, like for the "Copy CID"/"Copy URL" operation would be affected, and the hashing with argon2 would only process the CID itself - not any data behind it.

On the CLI, it could be an optional flag that prints CIDs with the checksum, for example on ipfs files stat --hash /path.

It's very similar to the two check digits in the IBAN. The difference is the complexity I'd like to add to the method by using Argon2.

Only if the computational complexity and memory usage are high can the check characters also serve as a defense against partial hash collision attacks.
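For comparison, the IBAN check digits mentioned above are computed with a cheap mod-97 scheme. The sketch below shows that scheme only; the function names are illustrative, and the sample IBAN is the commonly cited documentation example. It catches typos but is trivial to recompute, which is why the proposal swaps the cheap checksum for a memory-hard function.

```python
def _to_digits(text: str) -> str:
    """Map 0-9 to themselves and A-Z to 10-35, as the IBAN scheme requires."""
    return "".join(str(int(ch, 36)) for ch in text.upper())


def iban_check_digits(country: str, bban: str) -> str:
    """Compute the two check digits for a country code and account part."""
    # Append the country code and "00", reduce mod 97, subtract from 98.
    remainder = int(_to_digits(bban + country + "00")) % 97
    return f"{98 - remainder:02d}"


def iban_is_valid(iban: str) -> bool:
    """An IBAN is valid when the rearranged number is congruent to 1 mod 97."""
    compact = iban.replace(" ", "").upper()
    return int(_to_digits(compact[4:] + compact[:4])) % 97 == 1


if __name__ == "__main__":
    print(iban_check_digits("GB", "WEST12345698765432"))
    print(iban_is_valid("GB82 WEST 1234 5698 7654 32"))
```

A single wrong digit always changes the remainder, so such an input is rejected before any lookup is attempted.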

@bertrandfalguiere wrote:

About avoiding visual spoofing of humans, what about encoding in base... 256? That is, visually representing CIDs and peerIDs as emojis, like so? It's a bit provocative, but it could give each CID a striking visual signature. Terminals on Windows, macOS and Linux support them by default, I think.

It would still be able to support error detection.

I know it can seem a bit childish, and I guess it could decrease performance. But some GUIs such as IPFS Desktop could really benefit from it.

Be honest: do I need to consult a therapist?

Master Password uses the same method to give feedback that your master password was typed correctly, since it has no other way to give the user feedback on the password (all passwords will be accepted).

Unfortunately it won't work with URLs and Domain names. So ipfs://bafkqagttovtgm2ldnfsw45dmpeqgy33om4qhaylznrxwczak-ab won't work.

Also, not all systems render emojis in a visually distinctive way...

[Image: Screenshot_20200526_163246]

Additionally, we face the same issue: there are many similar emojis, many more than there are similar characters. So it might actually degrade verifiability by humans if we just encode the CID/Peer-ID as emojis.

Stebalien commented 4 years ago

It won't really impact performance. The current hashing should be used as-is. Only CIDs, like for the "Copy CID"/"Copy URL" operation would be affected, and the hashing with argon2 would only process the CID itself - not any data behind it.

I see. Yeah, that could work. But we'd need more than 2 numerals of parity (2 base32 numerals = 1024 choices). Even then, an attacker is often willing to spend 10k times the resources a user is willing to spend. Users will need to apply this trick to all CIDs they produce while an attacker just needs to compromise one target CID.
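The arithmetic behind this can be sketched as follows. This is a simplified model and an assumption on my part: each forged candidate matches a k-character tag with probability 1/32^k, and the attacker's budget is expressed as a multiple of one tag computation. The function names are illustrative.

```python
def tag_space(chars: int, base: int = 32) -> int:
    """Number of distinct tags for a given tag length."""
    return base ** chars


def min_tag_chars(budget_ratio: int, base: int = 32) -> int:
    """Smallest tag length whose tag space exceeds the attacker's budget.

    budget_ratio: how many tag computations the attacker can afford
    relative to the single computation a user performs.
    """
    chars = 1
    while tag_space(chars, base) <= budget_ratio:
        chars += 1
    return chars


if __name__ == "__main__":
    print(tag_space(2))           # 1024, the figure quoted above
    print(min_tag_chars(10_000))  # 3, since 32**3 = 32768 > 10000
```

Under this model, two base32 characters are within reach of an attacker with a 10,000x budget, and every extra character multiplies the attacker's expected work by 32.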

Unfortunately, a heuristic here is worse than no heuristic as users will start relying on it.

True. But I don't think it's necessary to change the multibase encoding at all. Since it's only for the representation to humans, it would just be calculated when necessary and appended to the base32/36 string.

We could just tack it on, assuming the base encoding doesn't allow for -, but that would change the CID spec and would be significantly more complicated to implement.
