ipld / specs

Content-addressed, authenticated, immutable data structures
Other
592 stars 108 forks source link

Graphsync v1.1.0, encoded as CBOR #354

Open hannahhoward opened 3 years ago

hannahhoward commented 3 years ago

Goals

Move graphsync to a CBOR based protocol instead of one based on protobuf

Implementation

Graphsync messages should be valid IPLD data structures encoded in DAG-CBOR. I have constructed an IPLD schema to represent what they should look like.

I've also promoted Metadata out of an extension to a core part of the response message as opposed to an extension -- this is a frequent design issue with v1.0.0.

For Discussion

This is a first stab and I have many questions:

  1. I've assumed all structs encoded as maps, rather than tuples, with no field renames -- is this too inefficient?
  2. I've defined the extensions as a map of {string:Any} -- last I checked Any isn't a thing except when referring to link references. At the same time, I want to specify that an extension MUST be IPLD data which is why I did not use {string:bytes}.
  3. I prefixed everything with GraphSync... which makes things a bit verbose. But also, Priority as a type on its own for example feels pretty general. Where GraphSyncPriority makes it clear it's a priority from graphsync's point of view.
rvagg commented 3 years ago

Very nice. Is this going to be a breaking change, or an optional upgrade? I can't figure out if/how graphsync might do versioning but would it be a "can you talk v1.1.0? great, here's my DAG-CBOR messages"?

The map representation question might come down to how important shaving bytes off these messages is. It's nice having self-describing data, but it does introduce quite a bit of overhead to have the keys in there too.

If I take GraphSyncRequest and encode a simple form using this spec then it takes up 99 bytes. If I rename the keys down to single characters then it goes down to 62 bytes. As a tuple representation it's 48 bytes.

The problems with tuple representations:

So, not strong reasons to avoid tuple, but I think you're in the best position to suggest what might be best for graphsync.

warpfork commented 3 years ago

We also typically give fields lowercase or mixedCase names.

(When you feed this into codegen in golang, it's going to generate accessor methods with appropriate export case.)

warpfork commented 3 years ago

Syntax nittery aside, this logically LGTM :D

hannahhoward commented 3 years ago

Re: lowercase names -- FYI, this is the opposite of cbor-gen, though I guess I could use the rename facility to handle this. Actually, usually cbor-gen uses tuple representation so naming is irrelevant. Unfortunately, when using map representation it does not allow renames. I have a PR to do this but it remains unmerged. (https://github.com/whyrusleeping/cbor-gen/pull/46)

Sorry to keep referencing cbor-gen -- but if go-ipld-prime is to get merged through the filecoin ecosystem (which is my hope eventually) , these two will need to move toward compatibility.

hannahhoward commented 3 years ago

@rvagg yes the intent is to support both v1.0.0 and v1.1.0 for a time -- protocol negotiation is done at the libp2p level - the protocol name is already versioned, so we simply add a fallback if 1.1.0 doesn't work. We do this in Bitswap and several other protocols.

hannahhoward commented 3 years ago

@rvagg & @warpfork I believe I've addressed the comments:

  1. Fixed capitalization
  2. Settled on map in most cases except for tuple for metadata cause it's 2 fields and there are reasons to compress it maximally. put fields in lowercase while making abbreviated renames in uppercase as a size compromise. Generally, any time blocks are present the block size will likely dwarf the rest of the message
  3. While i think this is mostly good to go, I'd like to actually do an implementation in go-graphsync before we merge it, to get some real world testing (though that may end up being a bit)
  4. There's an incoming follow-on PR for some other features and implementation details I want to iron out coming shortly.
warpfork commented 3 years ago

What's still needed to advance this? At last read, both this and https://github.com/ipld/specs/pull/355 (I assume that's the other follow-up PR you mentioned) look reasonable to me.

If this is pending an implementation, and we don't expect that to happen presently, should we... perhaps merge both these PRs, but then also relocate the file and give it a header to mark it explicitly as a future-oriented draft? (I'm happy to do the git-fu and commenting on that, if that's the path forward.)

warpfork commented 3 years ago

Ping for if we can merge this :)