okTurtles / group-income

A decentralized and private (end-to-end encrypted) financial safety net for you and your friends.
https://groupincome.org
GNU Affero General Public License v3.0

Switch to Protocol buffers for consistent hashing across platforms #84

Closed (taoeffect closed this issue 7 years ago)

taoeffect commented 8 years ago

Problem

JSON can be slow and unwieldy. For messages sent between users, and possibly between the server and client, a more efficient format could be useful.

Solution

Protobufs might be a better fit? It's unclear, and we should make sure that whatever we choose works as well with a centralized backend as it does with a decentralized one.

vijayee commented 7 years ago

If we do use protobufs, we will have to create *.proto object definitions for all our data objects, and those would have to be packaged into the frontend code. Other serialization formats to consider: http://msgpack.org/, http://cbor.io/, and https://amznlabs.github.io/ion-docs/. None of them except Ion needs a schema defined between the devices operating on the data. I'm going to test the message length for each. I imagine protobufs will be the smallest, since it does not carry field names in the encoded data.
Performance comparisons: https://jsperf.com/msgpack-js-vs-json/37 and http://tutorials.jenkov.com/iap/ion-performance-benchmarks.html. Format comparison: http://tutorials.jenkov.com/iap/ion-vs-other-formats.html

If JSON is ultimately preferred, there are a couple of deterministic JSON stringify functions that could be bundled from npm, such as https://github.com/substack/json-stable-stringify and https://www.npmjs.com/package/json-stringify-deterministic
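To make "deterministic JSON" concrete, here is a minimal sketch of the idea: sort object keys recursively before serializing. The function name `stableStringify` is hypothetical, and this is not the implementation used by either npm package above; it ignores edge cases such as `toJSON` methods, functions, and cyclic references.

```javascript
// Minimal sketch of a key-sorted JSON serializer (hypothetical helper,
// not the code of json-stable-stringify or json-stringify-deterministic).
function stableStringify (value) {
  // Primitives and null serialize the same way as in plain JSON
  if (value === null || typeof value !== 'object') {
    return JSON.stringify(value)
  }
  // Arrays keep their order; undefined entries become null, as in JSON.stringify
  if (Array.isArray(value)) {
    const items = value.map(v => stableStringify(v === undefined ? null : v))
    return '[' + items.join(',') + ']'
  }
  // Objects: visit keys in sorted order so insertion order no longer matters
  const parts = []
  for (const key of Object.keys(value).sort()) {
    if (value[key] === undefined) continue // JSON.stringify also drops these
    parts.push(JSON.stringify(key) + ':' + stableStringify(value[key]))
  }
  return '{' + parts.join(',') + '}'
}

const first = stableStringify({ b: 1, a: { d: 2, c: 3 } })
const second = stableStringify({ a: { c: 3, d: 2 }, b: 1 })
console.log(first) // {"a":{"c":3,"d":2},"b":1}
console.log(first === second) // true
```

Because both calls produce the same string, hashing that string would give the same digest regardless of how the object was built.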

taoeffect commented 7 years ago

@vijayee Just renamed the issue to focus on what's really needed. Agree that protobufs would be annoying. Cool finds! Any pros/cons?

vijayee commented 7 years ago

For the following test

const msgpack = require('msgpack')
const cbor = require('cbor')
const protobuf = require('protocol-buffers')

// Protobuf needs a schema up front; the other formats are schemaless
const proto = Buffer.from(`
message Test {
  required string name = 1;
  required float amount = 2;
  required bytes data = 3;
}
`)

const msg = protobuf(proto)
const test = {
  name: 'Sir Lucius',
  amount: 98.2312,
  data: Buffer.from('platypus fur')
}

// Encode the same object with each format and compare the byte lengths
const buf = msgpack.pack(test)
// Note: JSON.stringify turns a Buffer into {"type":"Buffer","data":[...]},
// which inflates the JSON size for binary fields
const buf2 = Buffer.from(JSON.stringify(test))
const buf3 = cbor.encode(test)
const buf4 = msg.Test.encode(test)
console.log(`msgpack size: ${buf.length}`)
console.log(`JSON size: ${buf2.length}`)
console.log(`CBOR size: ${buf3.length}`)
console.log(`Protobuf size: ${buf4.length}`)

we get these sizing results:

msgpack size: 52
JSON size: 118
CBOR size: 51
Protobuf size: 31

I would say we should use CBOR if we don't want predefined schemas for objects; in this test it's less than half the size of JSON. I found that the msgpack libraries have strange APIs or dependencies on C++, which makes them less desirable for the browser. There also turned out to be no JavaScript implementations of Ion.
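The protobuf and CBOR figures can be checked by hand from the wire formats. The sketch below assumes one-byte field tags and one-byte length prefixes (valid for fields this short), a 4-byte protobuf `float`, and that the CBOR encoder wrote `amount` as a 64-bit float; the variable names are illustrative only.

```javascript
// Back-of-the-envelope size check for the test object (Node.js).
const nameLen = Buffer.byteLength('Sir Lucius') // 10 bytes
const dataLen = Buffer.byteLength('platypus fur') // 12 bytes

// Protobuf: field names are not sent, only a 1-byte tag per field.
//   string/bytes = tag + 1-byte length + payload; float = tag + 4 bytes
const protobufSize =
  (1 + 1 + nameLen) + // name
  (1 + 4) + // amount
  (1 + 1 + dataLen) // data

// CBOR: 1-byte map header, then each key is sent as a text string
// (1-byte header + characters) followed by its value.
const cborKey = key => 1 + Buffer.byteLength(key)
const cborSize =
  1 + // map header
  cborKey('name') + (1 + nameLen) + // text string value
  cborKey('amount') + (1 + 8) + // float64 value (assumed)
  cborKey('data') + (1 + dataLen) // byte string value

console.log(protobufSize) // 31
console.log(cborSize) // 51
```

The 20-byte gap is mostly the three key names (`name`, `amount`, `data`) that CBOR carries in every message and protobuf leaves in the schema.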

taoeffect commented 7 years ago

@vijayee That's awesome, thanks! But what about key order? That's the most critical part. If one JS implementation sorts object keys alphanumerically, while another does it some other way, which of these methods will always serialize the data in the same way, the same order, consistently across platforms? Otherwise we'll get different hashes.
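A small sketch of the concern, using only Node's built-in `crypto` module: two objects with the same contents but different key insertion order serialize differently and therefore hash differently. The helper names (`sha256`, `sortedEntries`) are hypothetical, and the key-sorting workaround shown only handles flat objects.

```javascript
const crypto = require('crypto')

// Hex SHA-256 of a string (illustrative helper)
const sha256 = s => crypto.createHash('sha256').update(s).digest('hex')

// Same data, different insertion order
const a = { name: 'Sir Lucius', amount: 98.2312 }
const b = { amount: 98.2312, name: 'Sir Lucius' }

console.log(JSON.stringify(a)) // {"name":"Sir Lucius","amount":98.2312}
console.log(JSON.stringify(b)) // {"amount":98.2312,"name":"Sir Lucius"}

// The serialized strings differ, so the hashes differ
const hashesDiffer = sha256(JSON.stringify(a)) !== sha256(JSON.stringify(b))
console.log(hashesDiffer) // true

// One possible workaround for flat objects: serialize [key, value] pairs
// in sorted key order, so insertion order no longer matters
const sortedEntries = obj =>
  JSON.stringify(Object.keys(obj).sort().map(k => [k, obj[k]]))

const sortedHashA = sha256(sortedEntries(a))
const sortedHashB = sha256(sortedEntries(b))
console.log(sortedHashA === sortedHashB) // true
```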

vijayee commented 7 years ago

The CBOR RFC (https://tools.ietf.org/html/rfc7049), section 3.9, talks about sorting keys, but the sorting is optional. See also node-cbor's encoder: https://github.com/hildjj/node-cbor/blob/master/lib/encoder.js#L373
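For reference, the canonical ordering described in RFC 7049 section 3.9 sorts map keys by their encoded bytes: shorter encodings first, then bytewise comparison for equal lengths. Below is a rough sketch of that rule for string keys only; `encodeTextKey` and `canonicalKeyOrder` are hypothetical helpers written for this illustration, not part of node-cbor.

```javascript
// Encode a string as a CBOR text string (major type 3).
// Only lengths below 65536 are handled in this sketch.
function encodeTextKey (str) {
  const utf8 = Buffer.from(str, 'utf8')
  const n = utf8.length
  let head
  if (n < 24) head = Buffer.from([0x60 | n])
  else if (n < 256) head = Buffer.from([0x78, n])
  else if (n < 65536) head = Buffer.from([0x79, n >> 8, n & 0xff])
  else throw new RangeError('key too long for this sketch')
  return Buffer.concat([head, utf8])
}

// Sort keys by encoded length first, then bytewise, per RFC 7049 section 3.9
function canonicalKeyOrder (keys) {
  return keys
    .map(key => ({ key, bytes: encodeTextKey(key) }))
    .sort((x, y) =>
      x.bytes.length - y.bytes.length || Buffer.compare(x.bytes, y.bytes))
    .map(entry => entry.key)
}

const ordered = canonicalKeyOrder(['name', 'data', 'amount', 'id'])
console.log(ordered) // [ 'id', 'data', 'name', 'amount' ]
```

Note that this order is not alphabetical: `amount` sorts last because its encoding is the longest. If every implementation applied the same rule before encoding, the bytes (and so the hashes) should match across platforms.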

taoeffect commented 7 years ago

Compare performance + API + browser support + code size for: