ipld / specs

Content-addressed, authenticated, immutable data structures
Other
592 stars 108 forks source link

dag-json: cleanup and fully specify reserved space #356

Closed rvagg closed 3 years ago

rvagg commented 3 years ago

I've been working ok figuring out how to codify the rules that we discussed sometime last meeting .. during some meeting. I can't find the particular meeting by looking at the minutes in https://github.com/ipld/team-mgmt so I'm doing this from memory. But the main points were:

After working through some of this, I've flipped on my opinion of the second point and would rather it stay as proper multibase, even if it's still strictly base64. It adds clarity and there's scope for using this for further validation if we want to. It's a single byte and I think it increases the clarity and reduces the number of edge cases. So I haven't included that change here, but discussion is welcome if you feel strongly!

The bit to pay most attention to below is "The Reserved Namespace" where I've tried to enumerate the rules. It turns out to be quite hard to explain it with crystal clarity, but that could be me so suggestions welcome!

Along with this, as exploration, I've been working on a version of JS DAG-JSON that does tokenized decoding and uses the same base encoder/decoder as the new JS DAG-CBOR codec, just by swapping out the backend. There's some consistency benefits here in having the same code paths and same affordances for applying data model strictness. Still uncertain whether I'll actually propose that we merge this, although I did uncover some bugs in the current implementation in the process, but the code is here: https://github.com/ipld/js-dag-json/compare/rvagg/cborg (see the innards of DagJsonTokenizer for the token parsing rules that arise from the reserved space rules).

warpfork commented 3 years ago

I went digging for our discussion notes (and merged some stale PRs) and found them here:

https://github.com/ipld/team-mgmt/blob/master/meeting-notes/2020/2020-11-05--ipld-specs.md#dag-json-bytes-and-links-discuss-parsing-rules-edge-cases

rvagg commented 3 years ago

I've addressed the reviews so far and I think this is ready for landing. The reserved namespace rules are hard to be clear about but I've done the shuffle recommended by @vmx and separated things out a little more to hopefully make it easier to digest.

I went with the .0 rule for floats with no fractional component - although this is no help to JavaScript but Go can implement it.

I also have a working JavaScript version that I'm pretty much ready to release. It uses the same backend as the DAG-CBOR codec but with JSON semantics. We've lost some speed in the process but we get all the reserved namespace rules and all the strictness.

@warpfork care to review the changes?

rvagg commented 3 years ago

As per a brief discussion today:

So either we should fix go-ipld-prime to match spec or we just bite the bullet and remove it and get it done everywhere—I can do this in JavaScript pretty quickly after merging this.

@warpfork @vmx @willscott (and anyone else) 👍 for just remove multibase prefix and 👎 for leave the spec alone darn it!.

vmx commented 3 years ago

My reasons for having the multibase prefix:

All that said, I don't have a strong opinion and can live with either way.

rvagg commented 3 years ago

at first I couldn't get them making the same bytes and thought there must be a multibase incompatibility (the horror!) but it's []byte('deadbeef') being deceptive cause that's actually a string, not a hex value in spite of the nice hex characters..

also, sorting, ipld-prime don't sort and that engages hostilities in @warpfork and is not fun for compatibility fixtures :facepalm: and I kind of don't want to have that argument over and over so will skip it for now.

But basically, it's not a big deal to do it in go-ipld-prime. It's also not a big deal to strip it in @ipld/dag-json, but I'm also a soft -1 to changing the spec and leaving the prefix in there for reasons stated in this thread and restated by @vmx. But, it's not a big enough deal if someone else has particularly strong feelings and justification.

warpfork commented 3 years ago

Arguments against inserting multibase in this bytes field, just to have them written:

rvagg commented 3 years ago

fine, you appear to have a stronger opinion than either of us (although I'm not sure I'm convinced it's all that strong)

have pushed a new version that removes it and will do likewise in the JS impl

rvagg commented 3 years ago

(PTAL & merge if :thumbsup:)