mapbox / tippecanoe

Build vector tilesets from large collections of GeoJSON features.
BSD 2-Clause "Simplified" License

flatgeobuf as a source format #777

Open andrewharvey opened 5 years ago

andrewharvey commented 5 years ago

One of the performance bottlenecks of using Tippecanoe is generating the GeoJSON input, which is expensive (performance-wise) to produce and, I assume, for tippecanoe to read (though I must say, it's still very fast).

I commonly use ogr2ogr to read a range of input formats (frequently ESRI FileGDB) and pipe out GeoJSONSeq for tippecanoe to read, however this is slow.
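For readers unfamiliar with this workflow, here is a minimal sketch of the pipeline described above, driven from Python. It assumes `ogr2ogr` (GDAL 2.4 or later, for the GeoJSONSeq driver) and `tippecanoe` are on the `PATH` and that the source has a single layer; `build_pipeline`, `run_pipeline`, the file names, and the layer name are hypothetical and not taken from this thread.

```python
import shlex
import subprocess


def build_pipeline(src, out_mbtiles, layer="data"):
    """Return the argv lists for `ogr2ogr ... | tippecanoe ...`."""
    # /vsistdout/ is GDAL's virtual file for standard output.
    ogr_cmd = ["ogr2ogr", "-f", "GeoJSONSeq", "/vsistdout/", src]
    # With no input file given, tippecanoe reads features from stdin.
    tippecanoe_cmd = ["tippecanoe", "-o", out_mbtiles, "-l", layer]
    return ogr_cmd, tippecanoe_cmd


def run_pipeline(src, out_mbtiles, layer="data"):
    """Run both tools connected by a pipe; return their exit codes."""
    ogr_cmd, tippecanoe_cmd = build_pipeline(src, out_mbtiles, layer)
    ogr = subprocess.Popen(ogr_cmd, stdout=subprocess.PIPE)
    tip = subprocess.Popen(tippecanoe_cmd, stdin=ogr.stdout)
    ogr.stdout.close()  # so ogr2ogr sees a broken pipe if tippecanoe exits
    tip_rc = tip.wait()
    ogr_rc = ogr.wait()
    return ogr_rc, tip_rc


if __name__ == "__main__":
    ogr_cmd, tippecanoe_cmd = build_pipeline("input.gdb", "out.mbtiles")
    # Print the equivalent shell pipeline without running it.
    print(shlex.join(ogr_cmd), "|", shlex.join(tippecanoe_cmd))
```

Every feature is serialized to JSON text by one process and parsed back by the other, which is the cost this issue is about.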

The performance of flatgeobuf looks really compelling; if Tippecanoe could natively read it, that might greatly improve performance for some use cases.

gertcuykens commented 5 years ago

Totally agree that geojson is not made for heavy lifting, and neither is geobuf. But I would stick with something protobuf-based because protobuf is already used for generating the tiles in the first place. (Secondly, protobuf works better in golang, but that is my problem.) I would also suggest keeping the number of moving parts as low as possible, especially when dealing with a C++ code base :)

e-n-f commented 5 years ago

Thanks. I'll have to take a look at this.

gertcuykens commented 5 years ago

Although I am going to shoot myself in the foot here, because this solution requires a lot of work to be able to generate flatgeobuf files in golang applications, I still believe in the bigger picture that geojson has to be replaced with something better. My preference was something protobuf-like, but if it is going to be flatgeobuf instead then so be it. See https://github.com/bjornharrtell/flatgeobuf/issues/7, where I asked for collaboration to make sure it gets the best chance of success.

bjornharrtell commented 5 years ago

Great to see this interest! :) I'm fairly confident flatgeobuf is well suited as a serialization format to replace geojson when performance matters. Note that flatgeobuf is based on flatbuffers, not protobuf, also for performance reasons, which are well explained at http://google.github.io/flatbuffers/. Flatbuffers has Go support so it's "just" a matter of wiring that up to my format. I have limited experience with Go but I'm open to trying to provide the language support for Go ASAP if that means you are willing to be a (very) early adopter.

gertcuykens commented 5 years ago

Thanks, I will test, but for some reason flatbuffers is only faster in C++; in golang it turns out gogoprotobuf is faster than flatbuffers (source: https://github.com/alecthomas/go_serialization_benchmarks). I guess more time has been invested in protobuf optimisation because of other heavily used systems like grpc that all use protobuf. Also, if I am not mistaken, tippecanoe uses https://github.com/mapbox/protozero for its tile generation, which is also way faster than regular protobuf.

bjornharrtell commented 5 years ago

Interesting. I think you are right, it's likely because the flatbuffers Go implementation hasn't got the attention the reference C++ one has. However, there are also C and Rust implementations that seem to perform close to the C++ one, so the potential is there, and if I read the numbers correctly the flatbuffers Go implementation is not far behind gogoprotobuf even in its current state, so perhaps it's not a dealbreaker?

I have been considering whether to use flatbuffers or protobuf as a base for some time, but I remain convinced that flatbuffers has some desirable properties over protobuf, even if things like this are not making the choice easier. I'm not too keen on protobuf needing "special" implementations and/or considerations/constraints to be fast.

gertcuykens commented 5 years ago

Agree, performance isn't going to be the key factor; it's more about the quality of the boilerplate that protoc and flatc create. I will try to make some examples, but last time I looked into this a good amount more code was needed to initialise / modify a flatbuffer object than a protobuf object in Go, if I recall correctly. Note that all the protobuf variations still use the same proto files, so for me that's fair enough as long as they don't break the wire format and are just implementation details.

bjornharrtell commented 5 years ago

I see your point, but I'm not sure transforming between two protobuf schemas would be that much prettier, and I guess it would potentially be significantly slower than both flatbuffer to protobuf and flatbuffer to flatbuffer in the optimal case. However, I can definitely understand wanting to use a single serialization method/format in the use case discussed, and protobuf is of course more common, so it's another sound argument to respin flatgeobuf on protobuf. But for several reasons I'm not prepared to go down that path right now without further consideration/motivation.

bjornharrtell commented 5 years ago

Another thought - do you not have an abstraction layer for the transformation already between GeoJSON and protobuf? I've implemented such abstraction layers in the flatgeobuf language support for C++, .NET, Java and JS/TS and I imagine it should be possible to find a good target for Go too, perhaps https://github.com/paulmach/orb?

Of course, for maximum performance, transformation should be kept to a minimum, so for example in my GDAL driver implementation I'm accessing the flatbuffer directly and I can zero-copy at least the coordinate arrays because GDAL and flatgeobuf share the same basic memory model for coordinate arrays.
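To make the zero-copy idea concrete, here is a small sketch in Python. It is not the GDAL driver code and does not use the real FlatGeobuf layout: the record format (an 8-byte count followed by native-order doubles) and the `coords_view` helper are hypothetical, chosen only to show a coordinate array being read in place rather than parsed into new objects.

```python
import struct


def coords_view(buf, offset, n_values):
    """Return a float64 view over `buf` without copying the bytes."""
    raw = memoryview(buf)[offset:offset + 8 * n_values]
    # cast() reinterprets the same memory as native doubles.
    return raw.cast("d")


# Hypothetical record: an 8-byte value count, then x0, y0, x1, y1.
record = bytearray(struct.pack("=Q4d", 4, 10.0, 59.0, 11.5, 60.25))
(count,) = struct.unpack_from("=Q", record, 0)
view = coords_view(record, 8, count)

first_x_before = view[0]

# Writing into the underlying buffer is visible through the view,
# which shows that the view shares memory with the record.
struct.pack_into("=d", record, 8, 99.0)
first_x_after = view[0]
```

A text format such as GeoJSON cannot be read this way, since every coordinate has to be parsed from decimal digits first.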

gertcuykens commented 5 years ago

https://github.com/paulmach/orb is definitely the way to go for implementing geojson objects in Go :) But you don't have to worry about a geojson interface, I guess. My problem, and I think a lot of other people's problem, is how to feed, for example, this ridiculously giant osm planet pbf to tippecanoe. At the moment I read and stuff osm data into geojson or geobuf files because we don't have another choice :) So any wire format that tippecanoe agrees on as an input source is going to be a huge win. So for example, if we all agree on a stream of small records based on a FlatBuffer schema (https://google.github.io/flatbuffers/flatbuffers_guide_tutorial.html), and I can go from osm.pbf to saving a file on disk without requiring 128GB of memory, and tippecanoe can handle it too, we are golden :D
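As an illustration of the "stream of small records" idea, here is a minimal sketch of length-prefixed framing in Python. The 4-byte little-endian size prefix and the function names `write_record` and `read_records` are assumptions made for this example, not the FlatGeobuf specification, and the payloads are opaque bytes rather than real FlatBuffers. The point is only that the reader holds one record in memory at a time.

```python
import io
import struct


def write_record(stream, payload):
    """Write one record as a 4-byte little-endian size plus payload."""
    stream.write(struct.pack("<I", len(payload)))
    stream.write(payload)


def read_records(stream):
    """Yield payloads one at a time from a buffered binary stream."""
    while True:
        header = stream.read(4)
        if not header:
            return  # clean end of stream
        if len(header) < 4:
            raise ValueError("truncated length prefix")
        (size,) = struct.unpack("<I", header)
        payload = stream.read(size)
        if len(payload) < size:
            raise ValueError("truncated record")
        yield payload


# Round trip through an in-memory stream standing in for a file or pipe.
buf = io.BytesIO()
for feature in (b"feature-1", b"feature-2", b"feature-3"):
    write_record(buf, feature)
buf.seek(0)
records = list(read_records(buf))
```

Because each record carries its own size, a producer can append features as it walks the source data and a consumer can start work before the file is complete.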

andrewharvey commented 5 years ago

^ all of this is good, but tangential to this ticket for potential flatgeobuf support in tippecanoe, since tippecanoe is written in C++.

For me it's only worth supporting in tippecanoe if it's going to be faster to pipe flatgeobuf to tippecanoe from ogr2ogr, compared to piping GeoJSONSeq to tippecanoe.

It's still a nice-to-have for me (and low priority), since the core GeoJSON support is rock solid and isn't going anywhere: it's a hugely popular and widely supported format.

bjornharrtell commented 5 years ago

@andrewharvey hmm yes I was starting to get confused because I couldn't find any Go code in tippecanoe.

Raw read performance of FlatGeobuf in GDAL is about 30 times faster than GeoJSON. I will make some measurements for write performance soon but I expect it to be in the same ballpark (without spatial index generation).

Even if not high priority to rework this part of tippecanoe it would be a nice experiment and a possible motivation to get FlatGeobuf accepted in GDAL, so I'm interested in contributing if time permits.

gertcuykens commented 5 years ago

No, Go indeed has nothing to do with tippecanoe itself, but more with building tools that generate input for tippecanoe, as ogr2ogr does for example. So it should be possible to replace ogr2ogr with as many tools or programming languages as possible. For example, nodejs is extensively used by mapbox for creating tools, and I want to be sure to point out that ogr2ogr is just a small part of the bigger picture here when considering a universal format to feed to tippecanoe.

1riggs commented 3 years ago

Has this been added to tippecanoe? I see there is support for geobuf files but I assume that's only mapbox/geobuf files and not flatgeobuf, as produced by say ogr2ogr.

bjornharrtell commented 3 years ago

@1riggs unfortunately no progress AFAIK. I still don't have a finished reference implementation in Go. Would be fun to do but my interest in Go is unfortunately being eclipsed by Rust.

bdon commented 2 years ago

I'm tracking this issue in a new repository at https://github.com/protomaps/tippecanoe/issues/2 - input is welcomed.

bdon commented 2 years ago

This has been implemented in https://github.com/protomaps/tippecanoe , although with only the minimum geometry type and column type support needed to convert all GDAL-created Natural Earth FGBs with output identical to GeoJsonSeq.

I'm seeing a general 5-10x speedup for the parsing phase vs GeoJsonSeq, not to mention that FGB files should also be smaller and much faster to create than GeoJsonSeq. No streaming support yet though. Happy to help look at people's FGBs but will move convo into https://github.com/protomaps/tippecanoe/issues/2 .