mapbox/geobuf

A compact binary encoding for geographic data.

mapbox/pbf as an encoding/decoding alternative? #28

Closed: mourner closed this issue 9 years ago

mourner commented 9 years ago

As an alternative to the Protobuf.js and protocol-buffers libraries, could we use Konstantin's pbf? It's much lower level and doesn't parse .proto files, but it's also simpler and gives us more control over encoding and decoding, potentially making it faster.
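
To give a sense of the level pbf operates at, here's a minimal read/write sketch adapted from pbf's README (the field numbers and the message shape are made up for illustration):

```js
var Pbf = require('pbf');

// Writing: no .proto parsing and no intermediate object --
// fields are emitted directly. Field numbers 1 and 2 are hypothetical.
var pbf = new Pbf();
pbf.writeStringField(1, 'Hello');
pbf.writeVarintField(2, 1);
var buffer = pbf.finish();

// Reading: a hand-written callback dispatches on each field tag.
var data = new Pbf(buffer).readFields(readData, {});

function readData(tag, data, pbf) {
    if (tag === 1) data.name = pbf.readString();
    else if (tag === 2) data.version = pbf.readVarint();
}
```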

mourner commented 9 years ago

The pbf package has received tons of improvements since this comment, and after I merge some recent performance optimizations, performance will look like this:

decode vector tile with pbf x 473 ops/sec ±0.64% (94 runs sampled)
decode vector tile with protocol-buffers x 291 ops/sec ±1.08% (90 runs sampled)
native JSON.parse of the same data x 242 ops/sec ±0.70% (91 runs sampled)
encode vector tile with pbf x 175 ops/sec ±0.90% (84 runs sampled)
encode vector tile with protocol-buffers x 62.99 ops/sec ±0.84% (67 runs sampled)
native JSON.stringify x 211 ops/sec ±0.73% (92 runs sampled)

Given that pbf doesn't need an intermediate JSON representation of the protobuf data for reading and writing, geobuf performance using pbf will be stellar. cc @brendan-ward, ref #28
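
Concretely, "no intermediate representation" means the read callbacks can build the output GeoJSON directly. A rough sketch of the idea (the tags and message layout here are simplified stand-ins, not the actual geobuf proto):

```js
var Pbf = require('pbf');

// Hypothetical simplified layout: tag 1 = feature id, tag 2 = packed
// coordinates. The real geobuf schema is more involved.
function decodeFeature(buffer) {
    return new Pbf(buffer).readFields(readFeature, {
        type: 'Feature',
        geometry: {type: 'Point', coordinates: []},
        properties: {}
    });
}

function readFeature(tag, feature, pbf) {
    // values land straight in the GeoJSON object; no generic
    // intermediate message object is materialized in between
    if (tag === 1) feature.id = pbf.readString();
    else if (tag === 2) feature.geometry.coordinates = pbf.readPackedVarint();
}
```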

brendan-ward commented 9 years ago

Looks like a real performance win to me. :+1: Especially combined with a tool like the one you've outlined here for generating the code from the proto, it should make keeping up to date with the latest proto easy.
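
For reference, pbf ships a small compiler that turns a proto into read/write code; a sketch of that workflow (the Data message name is taken from geobuf.proto, and the file names are assumptions):

```js
// Generate the code from the proto with pbf's CLI (per the pbf README):
//   $ npm install -g pbf
//   $ pbf geobuf.proto > geobuf.proto.js
var Pbf = require('pbf');
var Data = require('./geobuf.proto.js').Data; // 'Data' message name assumed

function roundtrip(buffer) {
    var obj = Data.read(new Pbf(buffer)); // decode
    var pbf = new Pbf();
    Data.write(obj, pbf);                 // re-encode
    return pbf.finish();
}
```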

I'm not at all wedded to the implementation in #17; I was just trying to help move this along.

mourner commented 9 years ago

@brendan-ward great! The perf improvements are in and the library should be stable now, so I'll take a stab at making a new geobuf implementation using pbf. Should be a very easy port from pygeobuf.

mourner commented 9 years ago

Decoding a sample 12MB zips geobuf (created from a 100MB GeoJSON) takes 1 minute with my pygeobuf Python decoder (based on the official Protobuf Python bindings).

Wrote an initial pbf-based decoder implementation in JS today. The same task takes 2 seconds, which is 24% faster than JSON.parse of the same GeoJSON data. Wow. cc @mick @tmcw

brendan-ward commented 9 years ago

@mourner Awesome! We need to help you add Cython to pygeobuf to speed that up too.

mourner commented 9 years ago

@brendan-ward low priority, though: the python repo was more of a playground for prototyping than an encoder/decoder I'd want anyone to use in production. It served its purpose; now we need something nice and fast in JS :)

mourner commented 9 years ago

For comparison: on an equivalent zips pbf encoded with the current geobuf (which, to be fair, is 6 times bigger), the decoder freezes for a couple of minutes and then errors out with this (seeing this error for the first time, btw):

FATAL ERROR: CALL_AND_RETRY_2 Allocation failed - process out of memory

mourner commented 9 years ago

On smaller files, which don't error out, the difference is not as dramatic but still nice: decoding the Idaho test case takes 620ms with the current geobuf and 180ms with the new one.

mick commented 9 years ago

@mourner that's awesome! The smaller-file case is great too. It will add up, because in cardboard we save features individually, so loading a set of indexed features involves a lot of small decodes.
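
That access pattern looks roughly like this (a sketch assuming the pbf-based geobuf API; feature-per-buffer storage is how cardboard works per the comment above):

```js
var Pbf = require('pbf');
var geobuf = require('geobuf');

// cardboard-style access: each indexed feature is stored as its own small
// geobuf, so one query means many tiny decodes rather than one big one
function loadFeatures(buffers) {
    return buffers.map(function (buf) {
        return geobuf.decode(new Pbf(buf));
    });
}
```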

mourner commented 9 years ago

Encoding is 3-4 times faster as well: it takes 730ms to encode the Idaho sample with the current geobuf and 200ms with the new one, including automatic precision and dimension detection. For comparison, JSON.stringify of the same GeoJSON takes 625ms.
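
Roughly, the detection scans every coordinate to find the smallest number of decimal digits (up to a cap) and the highest dimension count that represent the data losslessly; coordinates are then stored as integers scaled by 10^precision. An illustrative sketch of the idea, not geobuf's actual code:

```js
// Illustration only -- not geobuf's implementation.
var MAX_PRECISION = 6; // the cap is an assumption

function analyze(positions) {
    var precision = 0, dim = 2;
    positions.forEach(function (p) {
        dim = Math.max(dim, p.length);
        p.forEach(function (x) {
            // bump precision until x survives a round trip through
            // the scaled-integer representation
            while (precision < MAX_PRECISION &&
                   Math.round(x * Math.pow(10, precision)) / Math.pow(10, precision) !== x) {
                precision++;
            }
        });
    });
    return {precision: precision, dim: dim};
}

// analyze([[12.3456, -78.9, 100]]) -> {precision: 4, dim: 3}
```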

mourner commented 9 years ago

On the zips sample, it's 11.2s vs 2.4s to encode (excluding the JSON.parse step).

mourner commented 9 years ago

Fixed by #29

mourner commented 9 years ago

@mick 1.0.0 released, so it's ready for cardboard migration. Not sure if we should start looking into this now or wait until we announce the new Geobuf publicly and gather some feedback.
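
For anyone starting on the migration, the pbf-based API is a short round trip (as shown in the geobuf README; the sample geometry is made up):

```js
var Pbf = require('pbf');
var geobuf = require('geobuf');

var geojson = {type: 'Point', coordinates: [12.3456, -78.9]}; // made-up sample

var buffer = geobuf.encode(geojson, new Pbf()); // GeoJSON -> geobuf
var decoded = geobuf.decode(new Pbf(buffer));   // geobuf -> GeoJSON
```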

mick commented 9 years ago

@mourner awesome! I think we can wait a bit for feedback before migrating, but I'll start playing with this.