DagJSON: Be more specific about Multibase Base64 encoding

vmx commented 4 years ago

How CIDs are encoded was underspecified. The binary type is now also using multibase instead of plain Base64.

Reasons for using Multibase instead of raw Base64:

It is clear which variant of Base64 is used
CID libraries support it natively
Having symmetry between different Base64 encded items makes it easier to remember and probably also the implementation easier to follow

mikeal commented 4 years ago

I agree, will need to update the dag-json codec to match

rvagg commented 4 years ago

so this would be a breaking change to the spec, we're comfortable with that? because this is not expected to be actually used much in the wild?

vmx commented 4 years ago

so this would be a breaking change to the spec, we're comfortable with that?

The breaking change is in the binary type and that one is AFAIK only implemented in JS, it is not in Go or Rust. So I'm comfortable with that change.

mikeal commented 4 years ago

I don’t know of anyone currently using Binary in dag-json either. It is, for obvious reasons, the worst part of dag-json so whenever you encounter binary it’s better to use links to raw blocks instead.

warpfork commented 4 years ago

Can someone review whether this will make it significantly more difficult to process these entities -- specifically, if it will still be possible to do it with a fixed amount of lookahead in terms of token count during parsing?

There's a fairly hairy piece of code in the go codebase for handling this: https://github.com/ipld/go-ipld-prime/blob/7e0619f3a9842f4a84737fc622c8ce56ef0d5d46/codec/dagjson/unmarshal.go#L39-L73 -- and it's based upon a very fortunate coincidence that there's fixed number of tokens we might need to deal with before it becomes clear if we need to engage any special logical branches. If this becomes untrue, it bodes poorly for any efficient parser implementations.

If that code (or something like it) can still safely work through that decision tree within that fixed amount of memory, I'm happy. If it can't, we might want to search for alternative ideas that don't present such a problem.

Aside from the above concern, can confirm that bytes in dag-json simply haven't been previously supported by the go libraries, so no objections from me on that front.

mikeal commented 4 years ago

FYI, this means we can’t link to CIDv0 from dag-json because you have to use base58btc. I’m fine with that, but just want to make sure everyone is aware since this is what we’re doing.

vmx commented 4 years ago

FYI, this means we can’t link to CIDv0 from dag-json because you have to use base58btc. I’m fine with that, but just want to make sure everyone is aware since this is what we’re doing.

I see two thumbs down. Please let's do this and only support CIDv1. Let's keep DagJSON easy to produce and consume without the need to have full library support.

Stebalien commented 4 years ago

WRT CIDv0, I've floated this idea around a few times: https://github.com/ipld/specs/issues/261

austinabell commented 4 years ago

FYI, this means we can’t link to CIDv0 from dag-json because you have to use base58btc. I’m fine with that, but just want to make sure everyone is aware since this is what we’re doing.

I see two thumbs down. Please let's do this and only support CIDv1. Let's keep DagJSON easy to produce and consume without the need to have full library support.

Oh I guess I was under the assumption that this would just be optional support (ie. if a provider or consumer supported the ambiguous multibase, it would work fine, but wouldn't have to?) I don't see why you couldn't serialize/ deserialize a v0 if you wanted to support. Didn't really mean to downvote, just meant to indicate that you can still technically do that

vmx commented 4 years ago

I force pushed a completely new version, based on our conversations during the weekly call.

I restructured it a bit and also changed the language to match other specs, like the Data Model one (talking about "kinds" and not about "types". Please do another full review.

vmx commented 4 years ago

We talked about this in the last IPLD Dev Meeting and found and agreement within the team, hence I merge it now.

ipld / specs

DagJSON: Be more specific about Multibase Base64 encoding #259