MattEWeber opened this issue 4 years ago
This is really helpful. FlatBuffers and MessagePack look the most interesting to me.
Edward
Edward A. Lee EECS, UC Berkeley eal@eecs.berkeley.edu http://eecs.berkeley.edu/~eal
On Jan 25, 2020, at 2:30 AM, mew2ub notifications@github.com wrote:
We've had some discussion recently on drawbacks of protocol buffers, so I thought it would be good to actually learn about and catalogue the differences between some of the popular alternatives. Here's what I found:
A larger but less detailed comparison is at https://en.wikipedia.org/wiki/Comparison_of_data-serialization_formats
Human Readable but Inefficient
----JSON---- pros:
- Human readable
- Very compatible with TS target
- Parsers and encoders are available for essentially every conceivable target
- Many programmers are used to it
- No setup required
- Allows arbitrary nesting of arrays and key-value maps (i.e., objects)
cons:
- Relatively slow to parse
- Large message size
- Code to validate messages will have to be hand-written inside a reactor. -- Note: JSON schemas do exist, but few people use them
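For the TS target, the hand-written validation mentioned above could look something like the type guard below. This is only a sketch; the `SensorMsg` shape is a made-up example, not a message type from the thread.

```typescript
// Hypothetical message shape for illustration only.
interface SensorMsg {
  id: number;
  value: number;
}

// Type guard: JSON.parse returns `any`, so the reactor has to narrow
// the result itself before using it as a SensorMsg.
function isSensorMsg(x: unknown): x is SensorMsg {
  return (
    typeof x === "object" && x !== null &&
    typeof (x as Record<string, unknown>).id === "number" &&
    typeof (x as Record<string, unknown>).value === "number"
  );
}

const wire = JSON.stringify({ id: 7, value: 3.14 });
const parsed: unknown = JSON.parse(wire);
if (!isSensorMsg(parsed)) {
  throw new Error("malformed message");
}
// From here on, `parsed` is statically typed as SensorMsg.
console.log(parsed.id, parsed.value);
```

This is exactly the boilerplate a schema-based format would generate for you, which is why it counts as a con here.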
----XML---- pros:
- Human readable
- Parsers and encoders are available for essentially every conceivable target
- Many programmers have used it before
- Setup is optional. Schemas are necessary for validation but not required.
- A binary encoding called Efficient XML Interchange exists, but with limited support
cons:
- Very slow to parse
- Huge message size
- Schemas are strict and hard to maintain as code evolves over time
Fast Binary Encodings
----MessagePack---- "It's like JSON. but fast and small."
pros:
- Supported for 101 languages!
- An efficient binary serialization format.
- Nestable maps and arrays.
- No schema means very flexible and no compiler needed
cons:
- No schema means data isn't validated
- No attached RPC mechanism like Protocol Buffers or Thrift
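To make the "like JSON, but fast and small" claim concrete, here is a toy encoder for the smallest slice of the MessagePack format spec (positive fixint, fixstr, fixmap). It's just a sketch of the wire layout; real use would go through one of the official libraries.

```typescript
// Toy MessagePack encoder covering only positive fixints (0x00-0x7f),
// short strings (fixstr, 0xa0 | len), and small maps (fixmap, 0x80 | len),
// per the public format spec at msgpack.org.
function mpEncode(v: number | string | Record<string, number | string>): number[] {
  if (typeof v === "number") {
    if (!Number.isInteger(v) || v < 0 || v > 0x7f) throw new Error("only positive fixint in this sketch");
    return [v]; // positive fixint: the byte IS the value
  }
  if (typeof v === "string") {
    const utf8 = Array.from(new TextEncoder().encode(v));
    if (utf8.length > 31) throw new Error("only fixstr in this sketch");
    return [0xa0 | utf8.length, ...utf8]; // fixstr: 101XXXXX, then the bytes
  }
  const entries = Object.entries(v);
  if (entries.length > 15) throw new Error("only fixmap in this sketch");
  const out = [0x80 | entries.length]; // fixmap: 1000XXXX, then N key/value pairs
  for (const [k, val] of entries) out.push(...mpEncode(k), ...mpEncode(val));
  return out;
}

// {"a": 1} is 7 bytes of JSON but only 4 bytes of MessagePack:
// 0x81 (map of 1), 0xa1 0x61 (the key "a"), 0x01 (the value 1).
console.log(mpEncode({ a: 1 }).map(b => b.toString(16)));
```

Note how the type tags are self-describing (no schema), which is where both the flexibility pro and the no-validation con come from.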
----BSON---- See: http://bsonspec.org/ A binary-encoded serialization of JSON-like documents. Basically, native language primitive types get encoded into a JSON-like structure
pros:
- "more 'schema-less' than Protocol Buffers" means it's more flexible
- Supported for 27 languages
- Faster to decode than JSON
cons:
- "more 'schema-less' than Protocol Buffers" means it's validated less
- slightly less space efficient than protocol buffers and JSON
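The space-efficiency con is easy to see from the document layout in the spec at bsonspec.org. Here's a sketch for documents whose values are all int32 (length prefix, typed elements with NUL-terminated names, trailing NUL):

```typescript
// Sketch of the BSON document layout from bsonspec.org, restricted to
// int32 values: int32 total length (little-endian), then elements, then 0x00.
// Each int32 element is: type byte 0x10, cstring name, int32 value (LE).
function bsonEncodeInt32Doc(doc: Record<string, number>): number[] {
  const body: number[] = [];
  for (const [name, value] of Object.entries(doc)) {
    body.push(0x10); // element type 0x10 = int32
    body.push(...Array.from(new TextEncoder().encode(name)), 0x00); // cstring key
    body.push(value & 0xff, (value >> 8) & 0xff, (value >> 16) & 0xff, (value >> 24) & 0xff);
  }
  const total = 4 + body.length + 1; // length prefix + elements + trailing NUL
  return [total & 0xff, (total >> 8) & 0xff, (total >> 16) & 0xff, (total >> 24) & 0xff, ...body, 0x00];
}

// {"a": 1} is 7 bytes of JSON but 12 bytes of BSON -- the length prefix,
// type tags, and fixed-width int32 cost more than they save on small docs.
console.log(bsonEncodeInt32Doc({ a: 1 }).length);
```

The trade-off is that those same length prefixes and fixed-width values are what make BSON faster to skip through and decode than JSON.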
Binary Encoding + RPC
----Protocol Buffers---- pros:
- Very small message sizes
- Very quick to parse
- Officially supported for 8 potential targets (including C and C++) -- Unofficially supported for 30 additional targets (including JS and TS)
- Intended to be backward and future compatible with evolving message formats
- Supports importing other .proto message definitions
- Well documented
- Designed for compatibility with gRPC for remote procedure calls
cons:
- Only usable with .proto definitions of message formats
- Requires installation of a compiler for each language
- Generated library files for parsing and encoding have to be managed and linked/imported to reactor code
- Weird type system and rules for writing .proto files
-- You have to assign unique numbers to fields
-- Messages can be nested, but key-value maps can't
-- No explicit lists, but fields containing primitives or other messages can be repeated. Maps can't be repeated
- Via https://reasonablypolymorphic.com/blog/protos-are-wrong/
-- Missing fields can't be distinguished from fields assigned the default value.
-- Message types have counterintuitive behavior with missing values
-- msg.foo = msg.foo isn't a no-op. It will silently change msg to have a zero-initialized copy of foo if it previously didn't have one
--- One optional field exists in messages for each case of 'oneof'. Meaning if the msg has oneof 'foo' / 'bar', the msg.foo = msg.foo statement will overwrite whatever data was in bar when it zero-initializes foo.
--- That said, I can't think of many reasons why a programmer would write msg.foo = msg.foo in the first place.
-- Backwards compatibility is at odds with effective validation.
-- Protobuffer's weird type system "infects" the type system of any code that has to deal with it.
-- Notably this essay doesn't give any recommendations for better technologies
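For a feel of why the message sizes are so small, here's a sketch of the two core pieces of the protobuf wire format described in the official encoding docs (varints and field tags). A real target would of course use protoc-generated code, not this.

```typescript
// Base-128 varint: 7 bits per byte, low groups first, high bit set on
// all bytes except the last (per the protobuf encoding docs).
function varint(n: number): number[] {
  const out: number[] = [];
  do {
    let byte = n & 0x7f;
    n >>>= 7;
    if (n !== 0) byte |= 0x80; // continuation bit
    out.push(byte);
  } while (n !== 0);
  return out;
}

// A field key is (field_number << 3) | wire_type; wire type 0 = varint.
function encodeVarintField(fieldNumber: number, value: number): number[] {
  return [...varint((fieldNumber << 3) | 0), ...varint(value)];
}

// Field 1 = 150 encodes to 0x08 0x96 0x01 -- the classic three-byte
// example from the protobuf encoding documentation.
console.log(encodeVarintField(1, 150).map(b => b.toString(16)));
```

Only the field number goes on the wire, not the field name, which is both why messages are tiny and why those unique field numbers in .proto files matter so much for compatibility.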
----Apache Thrift---- Via https://en.wikipedia.org/wiki/Apache_Thrift
- "Thrift is written in C++, but can create code for a number of languages. To create a Thrift service, one has to write Thrift files that describe it, generate the code in the destination language, write some code to start the server, and call it from the client."
pros:
- Supports 28 languages: C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml, Delphi, and others
- Thrift is a full-on gRPC alternative, meaning it doesn't just serialize and deserialize data; it also implements an entire stack for transmitting it.
- Thrift's type system is based on C++ and uses structs and unions for composition.
-- Container types are generic maps, sets, and lists; however, these cannot be nested within each other
cons:
- Like Protocol Buffers, Thrift uses a custom interface description language that has to be compiled for each language
- I don't know if Thrift's binary serialization is separable from the rest of the RPC stack.
- Documentation isn't as good as protocol buffers
Potentially Interesting But Not Enough Language Support
----FlatBuffers---- pros:
- Via https://google.github.io/flatbuffers/
-- "What sets FlatBuffers apart is that it represents hierarchical data in a flat binary buffer in such a way that it can still be accessed directly without parsing/unpacking, while also still supporting data structure evolution (forwards/backwards compatibility)"
cons:
- Only currently supports C++, C#, C, Go, Java, JavaScript, Lobster, Lua, TypeScript, PHP, Python, and Rust
----Avro---- Schemas are defined in JSON
cons:
- Only currently supports C, C++, C#, Java, Python, and Ruby
----Microsoft Bond---- pros:
- Supports a very rich type system including inheritance, type aliases, and generics
cons:
- Only supports C++, C#, Java, and Python
reactor1 -- number --> <networkAction> -- JSON.stringify() / JSON.parse() + type annotation for the value returned from parse() --> <networkAction> -- number --> reactor2