multiformats / multicodec

Compact self-describing codecs. Save space by using predefined multicodec tables.
MIT License

zero fill multihash and multicodec #276

Closed mikeal closed 2 years ago

mikeal commented 2 years ago

This one is gonna require some thought, so I wanted to get it out there and spend some time discussing before “IPFS thing.”

@ribasushi has been pushing hard to get piece CIDs used more broadly across the protocols. The benefit of using these over CAR CIDs for multi-block identifiers is that they will work as an inclusion proof down the line.

The problem I’m having fitting it into service layers is the padding: proof trees have to be calculated over minimum sizes that grow by a factor of 2, so one unlucky byte over the limit and the payload has to double. We can ingest CAR files like w0ah now, but it’s all built on CARs and CAR CIDs (the hash of the full CAR), and if the payload potentially doubles in size to cover the padding, I can’t switch to piece-CID-based protocols.
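To make the doubling concrete, here is a minimal sketch. It assumes the padded size is simply the next power of two at or above the payload length; the real piece rules may add minimum sizes or bit-level padding that this ignores. `padded_size` and `padding_overhead` are hypothetical names used only for illustration.

```python
def padded_size(payload_len: int) -> int:
    """Smallest power of two >= payload_len (assumed padding rule)."""
    if payload_len < 1:
        raise ValueError("payload_len must be at least 1")
    # (n - 1).bit_length() is the exponent of the next power of two >= n.
    return 1 << (payload_len - 1).bit_length()


def padding_overhead(payload_len: int) -> int:
    """Number of zero bytes that would have to be appended."""
    return padded_size(payload_len) - payload_len


if __name__ == "__main__":
    limit = 1 << 30  # a payload exactly at a power-of-two boundary
    for n in (limit, limit + 1):
        print(n, padded_size(n), padding_overhead(n))
```

Under that assumption, a payload sitting exactly on the boundary needs no padding, while one extra byte pushes the padded size to twice the boundary.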

One idea I had is, why not a multihash whose digest is just a 64-bit integer, and which validates as true if the payload is entirely zeros of that length? Combined with a zero-fill codec, the CID would be a fixed size and you’d just write the length into the multihash.
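A rough sketch of what that could look like, using the usual multihash layout (varint code, varint digest length, digest). Everything specific here is an assumption for illustration: `ZERO_FILL_CODE` is an arbitrary placeholder and not a registered multicodec entry, the length is written as a big-endian unsigned 64-bit integer, and the function names are hypothetical.

```python
import struct

# Hypothetical placeholder code, NOT a registered multicodec entry.
ZERO_FILL_CODE = 0x300001


def encode_varint(n: int) -> bytes:
    """Unsigned LEB128 varint, as used by multiformats."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def zero_fill_multihash(length: int) -> bytes:
    """Multihash whose 'digest' is just the zero-run length (assumed big-endian u64)."""
    digest = struct.pack(">Q", length)
    return encode_varint(ZERO_FILL_CODE) + encode_varint(len(digest)) + digest


def validate_zero_fill(digest: bytes, payload: bytes) -> bool:
    """True only if payload is exactly `length` bytes and every byte is zero."""
    (length,) = struct.unpack(">Q", digest)
    return len(payload) == length and not any(payload)


if __name__ == "__main__":
    mh = zero_fill_multihash(1 << 20)
    print(len(mh), validate_zero_fill(mh[-8:], bytes(1 << 20)))
```

The encoded multihash has the same size whatever length it describes, which is what keeps the CID fixed-size. A real validator would presumably stream the payload rather than hold it all in memory.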

We could then write that CID into the end of the CAR as a zero-length block, which any protocol that needs to produce a piece CID could easily expand.

This would be useful, but I’m a little worried that it will create a vulnerability up the stack, given that it compresses a lot of predictable data into payloads and proofs without a user ever actually having to pass the data, which could mean they get to skip all the “work” in a proof. I haven’t considered it enough to know for sure, and I surely don’t see the full extent to which it could wreak havoc, which is why I’ve posted the idea for discussion.

mikeal commented 2 years ago

Duplicate; GitHub double post.