stacks-network / gaia

A decentralized high-performance storage system
MIT License
762 stars 149 forks source link

Use of outdated hashing algorithm #462

Open ghost opened 1 year ago

ghost commented 1 year ago

MD5 is in use in hub/src/server/drivers/diskDriver.ts . MD5 has "collisions" - two separate strings can resolve to the same hash. A good breakdown is https://www.mscs.dal.ca/~selinger/md5collision/ . Node.js also recommends avoiding MD5 - https://nodejs.org/api/crypto.html#support-for-weak-or-compromised-algorithms

Solution: in line 241, replace:

const hash = crypto.createHash('md5')

with a more collision-resistant algorithm such as:

const hash = crypto.createHash('sha512')
jcnelson commented 1 year ago

I'm well aware that MD5 is broken.

This code path uses the MD5 hash function to create an ETag HTTP header. Strictly speaking, it does not need to be an MD5, but the consequences of a collision are already priced into Gaia's data availability guarantees. If the Gaia client is not fetching data directly from the Gaia hub over SSL (in which case, the user already trusts the Gaia hub to faithfully generate ETags), then the only way I can see an MD5 collision bringing harm to the user is if a malicious MITM service (e.g. an evil CDN) deliberately generated the collision in order to serve stale data to clients, while also persuading them that the data had not been modified. Of course, if this is the goal of the malicious CDN, they don't need to break the ETag hash; they could simply lie.

Either way, changing this to sha512 should be inconsequential. But, I just wanted to make sure you're aware that we did consider the fact that MD5 is not a high-quality cryptographic hash. We are not using it as such.