storacha-network / specs

🏅 Technical specifications for the w3up protocol stack
17 stars 0 forks source link

Replacing store/* with blob/* #111

Open Gozala opened 5 months ago

Gozala commented 5 months ago

As we have discovered recently some of our assumptions are misplaced https://github.com/web3-storage/w3up/issues/1304. Specifically store/add assumes to take CAR cid and gives you back pre-signed upload URL, however nothing in the system verifies that uploaded content was a valid CAR file, we just check that hash digest of the bytes matches hash digest of the file uploaded.

I would like to propose adjusting our assumptions and specifically propose blob/* API protocol that is equivalent of store/* except that it takes multihash as opposed to CAR cid. This would also better align with motivation of not packing files into CARs but rather storing them as is and creating an index separately.

We could deprecate store/* API in favor of blob/* and having it active for some time. Furthermore this would allow us to layer content claims on top, that is if blob is the CAR content claim could be made about that fact.

I would also be happy to the store/* namespace go away since it is not a great name for what it is.

Gozala commented 5 months ago

I suppose instead of raw hash we could require raw CID instead, although I'm not sure if it would have any benefit over just multihash, perhaps if in some distant future actual stores support other verification methods.

alanshaw commented 5 months ago

Specifically store/add assumes to take CAR cid

Is this right? The spec says it should be a CAR, but implementation and types do not restrict to CAR.

vasco-santos commented 4 months ago

we have things being stored without CAR cid, so I think that we are not restrictive on it both in spec and implementation

Gozala commented 4 months ago

Is this right? The spec says it should be a CAR, but implementation and types do not restrict to CAR.

Implementation may not restrict it, but it definitely assumes that CARs will be uploaded. In practice however we find that there had been uploads that match the hash but aren't CARs.

It is worth calling out that just ignoring CID prefix is not a right solution, as things will get mislabeled. We really should do one of the following:

  1. Restrict CID prefix to raw blocks only (because we do not really verify that block corresponds to codec prefix)
  2. Switch to multihashes instead of CIDs (as that is something we do verify)
  3. Add verification (so we could enforce that uploaded bytes match the claimed codec)

The third option is a lot more complicated than first two, furthermore we could achieve same effect by uploading arbitrary bytes and then submitting content claim triggering lazy verification. As of first two I have slight preference for 2nd, but can buy into 1st if good argument is presented in it's favor.