Proposal: Deprecate CIDv0

Stebalien commented 4 years ago

Observations:

The CIDv1/v0 split ends up infecting our code and specs with quite a bit of CIDv1/v0 specific logic. I'd prefer to isolate this logic as much as possible.
Round-tripping to the correct CID version can be tricky, especially when round-tripping through text that may need to be encoded in base32.
Fully switching to CIDv1 will require changing hashes, even when using DagPB. To work around this, the current proposed "CIDv1 by default" solution in IPFS land is to use the --upgrade-cidv0-in-output flag to only upgrade to CIDv1 when encoding the CIDs in text.
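To illustrate the text round trip mentioned above, here is a minimal sketch using only the Python standard library. It assumes the common case only: a CIDv0 is a base58btc-encoded sha2-256 multihash, and the matching CIDv1 is the same multihash prefixed with version 1 and the dag-pb codec, rendered as lowercase base32 with the multibase prefix "b". The helper names (b58encode, b58decode, v0_text_to_v1_text, v1_text_to_v0_text) are made up for this example and are not part of any IPFS library.

```python
import base64
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
SHA2_256_PREFIX = b"\x12\x20"   # multihash code 0x12 (sha2-256), digest length 32
V1_DAGPB_PREFIX = b"\x01\x70"   # CID version 1, codec 0x70 (dag-pb)


def b58encode(data: bytes) -> str:
    """Encode bytes as base58btc (no multibase prefix)."""
    n = int.from_bytes(data, "big")
    out = ""
    while n:
        n, rem = divmod(n, 58)
        out = B58_ALPHABET[rem] + out
    leading_zeros = len(data) - len(data.lstrip(b"\0"))
    return "1" * leading_zeros + out


def b58decode(text: str) -> bytes:
    """Decode base58btc text; raises ValueError on an invalid character."""
    n = 0
    for ch in text:
        n = n * 58 + B58_ALPHABET.index(ch)
    leading_ones = len(text) - len(text.lstrip("1"))
    return b"\0" * leading_ones + n.to_bytes((n.bit_length() + 7) // 8, "big")


def v0_text_to_v1_text(v0_text: str) -> str:
    """Upgrade a CIDv0 string to a base32 CIDv1 string (same hash)."""
    multihash = b58decode(v0_text)
    if len(multihash) != 34 or multihash[:2] != SHA2_256_PREFIX:
        raise ValueError("not a CIDv0 (expected a sha2-256 multihash)")
    cid = V1_DAGPB_PREFIX + multihash
    return "b" + base64.b32encode(cid).decode("ascii").lower().rstrip("=")


def v1_text_to_v0_text(v1_text: str) -> str:
    """Downgrade a base32 CIDv1 string to CIDv0; only dag-pb + sha2-256 qualifies."""
    if not v1_text.startswith("b"):
        raise ValueError("only base32 (multibase 'b') is handled in this sketch")
    body = v1_text[1:].upper()
    body += "=" * (-len(body) % 8)          # restore the stripped padding
    cid = base64.b32decode(body)
    if len(cid) != 36 or cid[:2] != V1_DAGPB_PREFIX or cid[2:4] != SHA2_256_PREFIX:
        raise ValueError("this CIDv1 has no CIDv0 equivalent")
    return b58encode(cid[2:])


# Round trip for an arbitrary digest (the input bytes are just an example).
multihash = SHA2_256_PREFIX + hashlib.sha256(b"example block").digest()
v0_text = b58encode(multihash)
v1_text = v0_text_to_v1_text(v0_text)
assert v1_text_to_v0_text(v1_text) == v0_text
```

The tricky part is visible in v1_text_to_v0_text: going back to CIDv0 only works for one codec and one hash function, so any code that round-trips through text has to remember which version it started with, or pick a normalization rule.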

Idea: We can actually deprecate CIDv0 without changing hashes too much by making it an artifact of the DagPB encoding. That is:

When decoding a CID, always normalize to CIDv1.
When encoding a DagPB object, re-encode all links to DagPB objects using CIDv0.
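The two rules above could be sketched roughly as follows, working on binary CIDs. This is an illustration under assumptions, not how go-ipfs or any IPLD library implements it: the function names (is_cidv0, normalize_to_v1, encode_dagpb_link) are hypothetical, and only the single-byte varint case is handled, which covers version 1, the dag-pb (0x70) and raw (0x55) codecs, and sha2-256 multihashes.

```python
import hashlib

DAG_PB = 0x70                    # multicodec code for dag-pb
RAW = 0x55                       # multicodec code for raw leaves
SHA2_256_PREFIX = b"\x12\x20"    # sha2-256 multihash header (code, length)


def is_cidv0(cid: bytes) -> bool:
    """A binary CIDv0 is a bare 34-byte sha2-256 multihash."""
    return len(cid) == 34 and cid[:2] == SHA2_256_PREFIX


def normalize_to_v1(cid: bytes) -> bytes:
    """Rule 1: when decoding, always hand CIDv1 to the rest of the system."""
    if is_cidv0(cid):
        return bytes([0x01, DAG_PB]) + cid
    return cid


def encode_dagpb_link(cid: bytes) -> bytes:
    """Rule 2: when writing a link inside a DagPB object, re-encode links
    that point at DagPB objects as CIDv0; leave everything else as is."""
    v1 = normalize_to_v1(cid)
    if v1[:2] == bytes([0x01, DAG_PB]) and is_cidv0(v1[2:]):
        return v1[2:]
    return v1


# Example: a DagPB child link and a raw-leaf link (digests are arbitrary).
digest = hashlib.sha256(b"child node").digest()
child_v0 = SHA2_256_PREFIX + digest
child_v1 = normalize_to_v1(child_v0)
raw_leaf = bytes([0x01, RAW]) + SHA2_256_PREFIX + digest

assert encode_dagpb_link(child_v1) == child_v0   # DagPB link goes back to v0
assert encode_dagpb_link(raw_leaf) == raw_leaf   # raw leaf stays CIDv1
```

With this split, CIDv0 only exists on disk inside DagPB blocks, so existing hashes stay stable while in-memory code sees CIDv1 everywhere.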

What breaks? ipfs add --cid-version=1 will now be a bit funny. The returned CID will be CIDv1 but all internal CIDs (except ones referring to raw leaves) will be CIDv0. We may need to provide a way to override this behavior and really store CIDv1 DagPB CIDs when encoding DagPB.

ribasushi commented 4 years ago

ipfs add --cid-version=1 isn't "experimental" afaik - and even if it were, it's been around for quite some time. Changing its outputs by default is a bit... contrary to the promise of the API, no?

Stebalien commented 4 years ago

It's clearly marked as experimental and I doubt many users are using it. Mostly just users who're also using the "raw leaves" feature.

Stebalien commented 4 years ago

I've edited the issue to:

Make it clear that this is just a proposal.
Explain the motivation.

rvagg commented 4 years ago

CIDs also get encoded in DagCBOR, DagJSON, CAR files, and other datastore implementations that interact with CIDs, and I'm sure there are other places. We could probably do an "up-convert when writing new data", but then you end up with mutation where you may not expect it. How do we handle these cases in a deprecate-CIDv0 world? "Support for read but don't write" seems to land us in new kinds of trouble. "Support for read, write v0 only if it was read as v0" seems to be basically where we're at right now. What other strategies could we adopt to make a proper deprecation across the board?

mikeal commented 4 years ago

“Deprecate” can mean a lot of things, so we should spend time writing out the details of what that means. That said, I’m +1 on deprecating CIDv0 by any definition of “deprecate” 🥳

It sounds like the working definition is:

IPLD/IPFS MUST NOT produce new data with CIDv0.
IPLD/IPFS MUST parse/read data with CIDv0.
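As a rough illustration of that working definition, a read/write boundary might look like the sketch below. Everything here is hypothetical (the names cid_version, read_cid and write_cid do not come from any IPLD library), and it deliberately ignores the DagPB exception from the original proposal.

```python
SHA2_256_PREFIX = b"\x12\x20"    # sha2-256 multihash header (code, length)


def cid_version(cid: bytes) -> int:
    """Classify a binary CID as version 0 or 1 (single-byte varints only)."""
    if len(cid) == 34 and cid[:2] == SHA2_256_PREFIX:
        return 0
    if cid[:1] == b"\x01":
        return 1
    raise ValueError("unrecognized CID")


def read_cid(cid: bytes) -> bytes:
    """MUST parse/read data with CIDv0: both versions are accepted."""
    cid_version(cid)  # raises if the bytes are not a CID we understand
    return cid


def write_cid(cid: bytes) -> bytes:
    """MUST NOT produce new data with CIDv0: refuse to emit version 0."""
    if cid_version(cid) == 0:
        raise ValueError("refusing to write CIDv0 into new data")
    return cid
```

A caller that reads a CIDv0 and wants to write it back would have to upgrade it first, which is exactly the "mutation where you may not expect it" concern raised earlier in the thread.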

Is there some timeline in which we think we can back off of the second point?

For instance, is there a future date at which I could author a new block format and drop support for CIDv0 entirely?

Stebalien commented 4 years ago

> Is there some timeline in which we think we can back off of the second point?

Never, unfortunately.

> For instance, is there a future date at which I could author a new block format and drop support for CIDv0 entirely?

Well, if we implement the above proposal, new block formats going forward don't need to implement CIDv0. However, this is something that's usually implemented system-wide anyway, not something specific to the block format.

ipld / specs

Proposal: Deprecate CIDv0 #261