sourcenetwork / defradb

DefraDB is a Peer-to-Peer Edge Database. It's the core data storage system for the Source Network Ecosystem, built with IPLD, LibP2P, CRDTs, and Semantic open web properties.
420 stars 41 forks source link

DocKey UUIDv5/SHA1 refactor #862

Closed jsimnz closed 1 year ago

jsimnz commented 2 years ago

Currently our DocKeys are UUIDv5, which effectively takes some arbitrary input, and applies a SHA1 hash and inserting that into the UUID data field.

This was implemented originally because UUIDv5 was the only spec compliant way to fit our CIDs into a UUID format. But there are many issues in using SHA1, as it is effectively considered a broken hash (susceptible to chosen text attacks).

Ideally we would use a stronger hash function, or the CID/Multihash data directly but the UUID spec didn't have a compliant way to achieve this.

But, theres a new draft spec for the next iteration of UUIDs (v6, v7, and v8). The one of interest is the UUIDv8, as its a blank/custom UUID version that soley maintains format and backwards compatability.

This allows us to fill the 122 bits with anything we want.

Theres a few options we can use. Our requirements are that they are A) Cryptographically secure B) Can fit within 122 (either direct hash output, or with truncation).

Suggestions:

jsimnz commented 2 years ago

Note: should target ~100 bits of security

orpheuslummis commented 2 years ago

https://keccak.team/kangarootwelve.html mentionned as interesting hash function choice

orpheuslummis commented 2 years ago

https://www.poseidon-hash.info

jsimnz commented 2 years ago

https://www.poseidon-hash.info

Poseidon is of a different classification of hashfunctions. Its what I would call "computational" instead of "algorithmic". Designed for use in ZK circuits, that are only effecient at certain kinds of "computations". Compared to "algorithmic" like SHA/keccak/BLAKE/etc that use a series of convoluted "steps" to create a output hash.

jsimnz commented 2 years ago

Talking it over with Bruno, we realized that our use of SHA1 may be safe, as its "protected" SHA256 as the input.

Traditional attacks on SHA1 the attacked has freedom on the input, but we use the CID of the genesis content (SHA256) as the input to the SHA1 function.

✅ SHA1(SHA256(input)) ❌ SHA1(input)

We still suffer from the collision resistance (security) of only 80bits, but at least at least that is a probabalistic attack, rather than a direct attack.

Should continue to look into options, but we may be safe in the short term.

fredcarle commented 1 year ago

From discussion between John and Bruno, decision has been made that this is not high priority at this point and the issue can be closed.