tendermint / go-amino

Protobuf3 with Interface support - Designed for blockchains (deterministic, upgradeable, fast, and compact)

Wire vs CBOR #126

Closed gibson042 closed 5 years ago

gibson042 commented 6 years ago

Sorry to be That Guy™, but why would someone use this instead of CBOR? And if there is a reason, "Wire vs CBOR" deserves a heading in the README.

jaekwon commented 6 years ago

I don't know CBOR... I guess the question to ask is, how is it different than primitive structure/object types of major languages?

   *  It must represent a reasonable set of basic data types and
     structures using binary encoding.  "Reasonable" here is
     largely influenced by the capabilities of JSON, with the major
     addition of binary byte strings.  The structures supported are
     limited to arrays and trees; loops and lattice-style graphs
     are not supported.

It seems that Amino is philosophically opposed to CBOR. Amino aims to represent arbitrary data types, e.g. with complex nested structures. We don't target JSON, we target languages like Go/Rust/C++/Java/Python. Structure support is provided by a schema. In this implementation it's currently only provided in the form of a Golang struct declaration, but we plan to support a modified version of the Protobuf schema file.

For example, CBOR supports indefinite-length arrays. When modern languages like Go/Rust support lists, they keep track of the length of the list, so there's no reason to support indefinite-length arrays in Amino. For streaming, Amino provides compatible MarshalBinary and UnmarshalBinaryReader methods, so you can stream messages one at a time from the same reader.
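As a rough illustration of reading messages one at a time from a single reader, here is a minimal standard-library sketch. It assumes each message is framed with an unsigned-varint length prefix; `writeFrame`, `readFrame`, and `roundTrip` are hypothetical helpers written for this example and are not go-amino's API.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// writeFrame writes one message preceded by its length as an unsigned varint.
// Assumption: this framing is illustrative, not taken from go-amino's source.
func writeFrame(w io.Writer, msg []byte) error {
	var hdr [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(hdr[:], uint64(len(msg)))
	if _, err := w.Write(hdr[:n]); err != nil {
		return err
	}
	_, err := w.Write(msg)
	return err
}

// readFrame reads exactly one length-prefixed message from r.
// A production decoder would also cap size before allocating.
func readFrame(r *bufio.Reader) ([]byte, error) {
	size, err := binary.ReadUvarint(r)
	if err != nil {
		return nil, err
	}
	msg := make([]byte, size)
	if _, err := io.ReadFull(r, msg); err != nil {
		return nil, err
	}
	return msg, nil
}

// roundTrip writes every message to one buffer, then reads them back in order
// until the reader reports a clean end of stream.
func roundTrip(msgs [][]byte) ([][]byte, error) {
	var buf bytes.Buffer
	for _, m := range msgs {
		if err := writeFrame(&buf, m); err != nil {
			return nil, err
		}
	}
	r := bufio.NewReader(&buf)
	var out [][]byte
	for {
		m, err := readFrame(r)
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		out = append(out, m)
	}
}

func main() {
	out, err := roundTrip([][]byte{[]byte("first"), []byte("second")})
	if err != nil {
		panic(err)
	}
	for _, m := range out {
		fmt.Printf("%s\n", m)
	}
}
```

Because every frame states its own length up front, the reader never needs an end-of-array marker to know where one message stops and the next begins.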

  *  There is no requirement that all data formats be uniquely
     encoded; that is, it is acceptable that the number "7" might
     be encoded in multiple different ways.

This is huge. Amino is a deterministic protocol. It's designed for blockchain environments. 7 can only be encoded in one way, given the type of the field.
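To make the contrast concrete, the sketch below lists the well-formed CBOR unsigned-integer encodings of 7 next to the single encoding produced for a signed varint field. It assumes the field uses a zigzag varint (as Proto3 does for `sint64`); `cborUintForms` and `zigzagVarint` are hypothetical names for this example only.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// cborUintForms returns every well-formed CBOR major-type-0 encoding of v:
// the immediate form (v < 24) plus each of the 1-, 2-, 4-, and 8-byte
// argument forms that is wide enough to hold v.
func cborUintForms(v uint64) [][]byte {
	var forms [][]byte
	if v < 24 {
		forms = append(forms, []byte{byte(v)})
	}
	if v <= 0xFF {
		forms = append(forms, []byte{0x18, byte(v)})
	}
	if v <= 0xFFFF {
		b := []byte{0x19, 0, 0}
		binary.BigEndian.PutUint16(b[1:], uint16(v))
		forms = append(forms, b)
	}
	if v <= 0xFFFFFFFF {
		b := make([]byte, 5)
		b[0] = 0x1A
		binary.BigEndian.PutUint32(b[1:], uint32(v))
		forms = append(forms, b)
	}
	b := make([]byte, 9)
	b[0] = 0x1B
	binary.BigEndian.PutUint64(b[1:], v)
	return append(forms, b)
}

// zigzagVarint encodes a signed integer as a zigzag-mapped base-128 varint.
// Assumption: the field type is a signed varint.
func zigzagVarint(v int64) []byte {
	buf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutVarint(buf, v)
	return buf[:n]
}

func main() {
	for _, f := range cborUintForms(7) {
		fmt.Printf("CBOR:   % X\n", f)
	}
	fmt.Printf("varint: % X\n", zigzagVarint(7))
}
```

A generic CBOR decoder accepts all five integer forms, so uniqueness has to come from a rule layered on top of the format; an encoder that derives the bytes from the field type has only one output to choose from.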

  1. Data must be able to be decoded without a schema description.

As in Protobuf, Amino requires the schema description. However, like Protobuf, the binary encoding has sufficient information for partial destructuring. It's just better in Amino.

The serialization must be reasonably compact, but data compactness is secondary to code compactness for the encoder and decoder.

Our reflection implementation is extremely tidy, but eventually we will implement (or we will pay a bounty for) code generators for faster encoding/decoding. Code architecture is something that we (Amino) can solve for later, regardless of the complexity of the problem at hand.

It seems like CBOR and Amino, like CBOR and Protobuf, are solving for completely different problems. Amino is a codec for persisting logic objects in blockchain smart contracts, as you can with the Cosmos SDK.

gibson042 commented 6 years ago

TL;DR: If you don't decide that CBOR makes more sense than a custom serialization format, then "Amino vs CBOR" deserves a README heading.

It seems that Amino is philosophically opposed to CBOR. Amino aims to represent arbitrary data types, e.g. with complex nested structures. We don't target JSON, we target languages like Go/Rust/C++/Java/Python.

I think you have misunderstood, which is very unfortunate because it seems like this format is completely unnecessary. CBOR can represent arbitrary data types with nested structures (and builds in nonnegative integer, negative integer, byte string, text string, array, key–value map, float16, float32, float64, boolean, null, undefined, datetime, decimal fraction, bignum, bigfloat, and base64), it just can't natively represent multiply-referenced values (e.g., cyclic data structures). Those limitations are intentional, and also true of Protobuf AFAICT. Even your README seems to support CBOR, since it handles both points from the Amino vs JSON paragraph ("we need a more compact and efficient binary encoding standard" and "Amino is fully compatible with JSON encoding").

there's no reason to support indefinite-length arrays in Amino … 7 can only be encoded in one way, given the type of the field

Both of those can be handled by making Amino a restricted subset of CBOR and/or accepting the increased representational power.

As in Protobuf, Amino requires the schema description.

Requiring a schema description is relevant, though you should really take a look at https://github.com/tailhook/probor ("an extensible mechanism for serializing structured data on top of CBOR").

Code architecture is something that we (Amino) can solve for later, regardless of the complexity of the problem at hand.

CBOR is explicit about compact serialization being important, but strictly less important than encode/decode simplicity, which seems really valuable for blockchain processing. Does Amino prioritize those goals differently—or at all? At the very least, it would be worthwhile to perform benchmarking comparisons on those metrics you consider important.

It seems like CBOR and Amino, like CBOR and Protobuf, are solving for completely different problems. Amino is a codec for persisting logic objects in blockchain smart contracts

"persisting logic objects in blockchain smart contracts" is exactly the kind of thing for which CBOR is intended (emphasis mine):

The objectives of CBOR, roughly in decreasing order of importance, are:

  1. The representation must be able to unambiguously encode most common data formats used in Internet standards.
  2. The code for an encoder or decoder must be able to be compact in order to support systems with very limited memory, processor power, and instruction sets.
  3. Data must be able to be decoded without a schema description.
  4. The serialization must be reasonably compact, but data compactness is secondary to code compactness for the encoder and decoder.
  5. The format must be applicable to both constrained nodes and high-volume applications.
  6. The format must support all JSON data types for conversion to and from JSON.
  7. The format must be extensible, and the extended data must be decodable by earlier decoders.

Still, it's your project and you can do what you want. But I won't be the only person to point out the benefits of CBOR, so you should preemptively address them in your README.

jaekwon commented 6 years ago

Probor seems like it's trying to solve the gap between CBOR and Protobuf while improving upon Protobuf. But it's also a proof-of-concept, and it's basing the encoding on CBOR which isn't quite optimized for what it's trying to do... Proto3/Amino uses field keys, which are field numbers + type encoded as a single varint, for every struct field. It looks like Probor is trying to encode the field numbers implicitly by index?
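The field-key idea can be sketched in a few lines. This assumes the Proto3 layout, where the key is `(field number << 3) | wire type` written as one unsigned varint; `fieldKey` and `parseFieldKey` are hypothetical helpers, not functions from go-amino or any Protobuf library.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// fieldKey packs a field number and a 3-bit wire type into one unsigned varint.
func fieldKey(fieldNum uint32, wireType uint8) []byte {
	key := uint64(fieldNum)<<3 | uint64(wireType&0x7)
	buf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(buf, key)
	return buf[:n]
}

// parseFieldKey reverses fieldKey and reports how many bytes it consumed.
func parseFieldKey(b []byte) (uint32, uint8, int, error) {
	key, n := binary.Uvarint(b)
	if n <= 0 {
		return 0, 0, 0, errors.New("truncated or overflowing field key")
	}
	return uint32(key >> 3), uint8(key & 0x7), n, nil
}

func main() {
	// Field 1 with wire type 0 fits in a single byte.
	fmt.Printf("% X\n", fieldKey(1, 0))

	// Field 16 needs a second byte because the key no longer fits in 7 bits.
	fmt.Printf("% X\n", fieldKey(16, 2))

	num, typ, n, err := parseFieldKey([]byte{0x0E})
	if err != nil {
		panic(err)
	}
	fmt.Println(num, typ, n)
}
```

Since the field number travels with every value, fields can be omitted or appear in any order, which an index-based layout cannot express without placeholders.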

I agree that it's worth a comparison, but let's add more discussion here before adding to the README, so we get it right.

Amino's priorities are:

gibson042 commented 6 years ago

Unique/deterministic encoding of values.

This is the kind of thing that can be achieved at the encoder/decoder level, or in the data format. The former is possible in most data formats, particularly including CBOR (e.g., "every value must be represented by its shortest encoding, and map keys must be ordered lexicographically by wire-format octets"). The latter is not a property of CBOR or Amino (from your own README: "Amino ensures that concrete types (almost) always have the same canonical representation", "Amino tries to transparently deal with pointers (and pointer-pointers) when it can", "Supported (but discouraged) types… Maps… Floating points").
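A decoder-level check of the two rules quoted above might look like the following sketch. It covers only unsigned-integer heads and pre-encoded map keys, and follows the bytewise key ordering as stated in this comment; `shortestUintHead` and `keysStrictlySorted` are hypothetical names, not part of any CBOR library.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

var errTruncated = errors.New("truncated item")

// shortestUintHead reports whether b begins with a CBOR unsigned integer
// (major type 0) whose argument uses the fewest bytes possible.
func shortestUintHead(b []byte) (bool, error) {
	if len(b) == 0 {
		return false, errTruncated
	}
	if b[0]>>5 != 0 {
		return false, errors.New("not a major type 0 item")
	}
	info := b[0] & 0x1F
	need := map[byte]int{24: 2, 25: 3, 26: 5, 27: 9}[info]
	if len(b) < need {
		return false, errTruncated
	}
	switch {
	case info < 24:
		return true, nil // value is carried in the initial byte
	case info == 24:
		return b[1] >= 24, nil
	case info == 25:
		return binary.BigEndian.Uint16(b[1:]) > 0xFF, nil
	case info == 26:
		return binary.BigEndian.Uint32(b[1:]) > 0xFFFF, nil
	case info == 27:
		return binary.BigEndian.Uint64(b[1:]) > 0xFFFFFFFF, nil
	default:
		return false, errors.New("additional information not valid for an integer")
	}
}

// keysStrictlySorted reports whether encoded map keys appear in strictly
// increasing bytewise order, which also rules out duplicate keys.
func keysStrictlySorted(keys [][]byte) bool {
	for i := 1; i < len(keys); i++ {
		if bytes.Compare(keys[i-1], keys[i]) >= 0 {
			return false
		}
	}
	return true
}

func main() {
	for _, item := range [][]byte{{0x07}, {0x18, 0x07}, {0x19, 0x01, 0x00}} {
		ok, err := shortestUintHead(item)
		fmt.Printf("% X shortest=%v err=%v\n", item, ok, err)
	}
	fmt.Println(keysStrictlySorted([][]byte{{0x01}, {0x02}}))
}
```

A decoder that rejects any input failing these checks accepts exactly one byte string per value for the cases it covers, which is the encoder/decoder-level approach described here.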

Data must be decoded with a schema description, but the structure must be scannable without a schema description (a property which Proto3 doesn't have).

Obviously true of CBOR.

The serialization must be reasonably compact, but data compactness is secondary to code compactness for the encoder and decoder.

More true of CBOR than of Amino, because wire-level field indices and especially registered types of the latter add complexity. Stop codes add equal complexity to both (CBOR needs them for indefinite-length values, Amino needs them for terminating struct values). It's also interesting to compare representations of your example data:

| Example | Amino | CBOR |
| --- | --- | --- |
| List{ MyList: []Item{ Item{1}, Item{3} } } | 10 bytes: 0E 03 02 08 02 04 08 06 04 04 | 9 bytes: A1 01 82 A1 01 01 A1 01 03 |
| ListOfLists{ MyLists: []List{ []Item{ Item{1}, Item{3} } } } | 12 bytes: 0E 06 01 03 02 08 02 04 08 06 04 04 | 10 bytes: A1 01 81 82 A1 01 01 A1 01 03 |
| List{ MyList: []*Item{ Item{1}, nil } } | 9 bytes: 0E 03 02 00 08 02 04 01 04 | 7 bytes: A1 01 82 A1 01 01 F6 |
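The CBOR column can be reproduced with a very small encoder. The sketch below assumes the mapping the bytes imply: each struct becomes a map keyed by integer field number, slices become arrays, and a nil pointer becomes CBOR null. All function names are hypothetical and exist only for this example.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// head builds a CBOR initial byte plus the shortest argument for n.
func head(major byte, n uint64) []byte {
	switch {
	case n < 24:
		return []byte{major<<5 | byte(n)}
	case n <= 0xFF:
		return []byte{major<<5 | 24, byte(n)}
	case n <= 0xFFFF:
		b := []byte{major<<5 | 25, 0, 0}
		binary.BigEndian.PutUint16(b[1:], uint16(n))
		return b
	case n <= 0xFFFFFFFF:
		b := make([]byte, 5)
		b[0] = major<<5 | 26
		binary.BigEndian.PutUint32(b[1:], uint32(n))
		return b
	default:
		b := make([]byte, 9)
		b[0] = major<<5 | 27
		binary.BigEndian.PutUint64(b[1:], n)
		return b
	}
}

// cborUint encodes an unsigned integer (major type 0).
func cborUint(v uint64) []byte { return head(0, v) }

// cborNull is the one-byte encoding of null.
func cborNull() []byte { return []byte{0xF6} }

// cborArray encodes a definite-length array (major type 4).
func cborArray(items ...[]byte) []byte {
	out := head(4, uint64(len(items)))
	for _, it := range items {
		out = append(out, it...)
	}
	return out
}

// cborMap encodes a definite-length map (major type 5) from alternating
// key and value items.
func cborMap(kv ...[]byte) []byte {
	out := head(5, uint64(len(kv)/2))
	for _, it := range kv {
		out = append(out, it...)
	}
	return out
}

// item models Item{v} as a map with the single entry {1: v}.
func item(v uint64) []byte { return cborMap(cborUint(1), cborUint(v)) }

// hexOf renders bytes as space-separated upper-case hex.
func hexOf(b []byte) string { return fmt.Sprintf("% X", b) }

func main() {
	fmt.Println(hexOf(cborMap(cborUint(1), cborArray(item(1), item(3)))))
	fmt.Println(hexOf(cborMap(cborUint(1), cborArray(cborArray(item(1), item(3))))))
	fmt.Println(hexOf(cborMap(cborUint(1), cborArray(item(1), cborNull()))))
}
```

Every container here is definite-length, so none of the three encodings needs a stop code.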

A sufficiently compatible JSON format will also be maintained, but general conversion to/from JSON is not a priority.

This is similar to CBOR, although that format has the edge here too because it actually specifies suggested conversion rules for both directions and comments on their limitations.

jaekwon commented 6 years ago

We dropped the last struct term from the top-level structs, so the above table should show 9, 11, and 8 bytes respectively. https://github.com/tendermint/go-amino/pull/160

Additionally, varint support in CBOR is lacking compared to Proto/Amino, since it only supports 1,2,4,8 byte numeric types, and is more complex to implement.

A struct with 3 numeric fields of value 10, 20, and 30 takes 8 bytes in CBOR (for {1: 10, 2: 20, 3: 30}), whereas in Amino it only takes 6: 08141028203C. So, one is longer than the other depending on usage, but I'm happy with our design choices for Amino. I could write Amino bytes by hand with mental arithmetic. I can't do the same with CBOR due to the complexity of decoding logic.
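The byte counts can be checked with a short sketch. It assumes Proto3-style keys (`field number << 3`, wire type 0) followed by zigzag varints for the keyed encoding, and a CBOR map with small unsigned-integer keys and values; the type and function names are hypothetical. Under that key rule, field number 3 produces 0x18, so the 0x20 in the bytes quoted above would correspond to field number 4.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// field is one numbered struct field holding a small non-negative value.
type field struct {
	num   uint8
	value uint8
}

// cborSmallUint encodes an unsigned integer below 256 in its shortest form.
func cborSmallUint(v uint8) []byte {
	if v < 24 {
		return []byte{v}
	}
	return []byte{0x18, v}
}

// cborIntMap encodes the fields as a CBOR map of field number to value.
// The single-byte map head is only valid for fewer than 24 fields.
func cborIntMap(fields []field) []byte {
	out := []byte{0xA0 | byte(len(fields))}
	for _, f := range fields {
		out = append(out, cborSmallUint(f.num)...)
		out = append(out, cborSmallUint(f.value)...)
	}
	return out
}

// keyedVarints encodes each field as a varint key followed by a zigzag
// varint value. Assumption: Proto3-style keys with wire type 0.
func keyedVarints(fields []field) []byte {
	var out []byte
	buf := make([]byte, binary.MaxVarintLen64)
	for _, f := range fields {
		n := binary.PutUvarint(buf, uint64(f.num)<<3)
		out = append(out, buf[:n]...)
		n = binary.PutVarint(buf, int64(f.value))
		out = append(out, buf[:n]...)
	}
	return out
}

// hexOf renders bytes as space-separated upper-case hex.
func hexOf(b []byte) string { return fmt.Sprintf("% X", b) }

func main() {
	fields := []field{{1, 10}, {2, 20}, {3, 30}}
	c, k := cborIntMap(fields), keyedVarints(fields)
	fmt.Printf("CBOR  (%d bytes): %s\n", len(c), hexOf(c))
	fmt.Printf("keyed (%d bytes): %s\n", len(k), hexOf(k))
}
```

The two-byte difference comes from the map header and from 30 exceeding CBOR's 23-value immediate range; with values of 64 or more the zigzag varint also grows to two bytes, which is the "depending on usage" trade-off.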

The serialization must be reasonably compact, but data compactness is secondary to code compactness for the encoder and decoder.

I'd say Amino wins here :)

gibson042 commented 6 years ago

We dropped the last struct term from the top-level structs, so the above table should show 9, 11, and 8 bytes respectively. #160

The data was taken straight from your README, dude.

Additionally, varint support in CBOR is lacking compared to Proto/Amino, since it only supports 1,2,4,8 byte numeric types, and is more complex to implement.

https://tools.ietf.org/html/rfc7049#section-2.4.2

A struct with 3 numeric fields of value 10, 20, and 30 takes 8 bytes in CBOR (for {1: 10, 2: 20, 3: 30}), whereas in Amino it only takes 6: 08141028203C. So, one is longer than the other depending on usage, but I'm happy with our design choices for Amino. I could write Amino bytes by hand with mental arithmetic. I can't do the same with CBOR due to the complexity of decoding logic.

:roll_eyes:

I'd say Amino wins here :)

Yes, you obviously have an emotional attachment to your baby. But this conversation has drifted... I don't find your reasons persuasive, but that's completely beside the point; they still need to be documented. To repeat myself yet again: "Amino vs CBOR" deserves a README heading.