mapbox / geobuf

A compact binary encoding for geographic data.
ISC License

Compatibility with "classic GIS file formats" #34

Open: joto opened this issue 9 years ago

joto commented 9 years ago

I think we need to consider compatibility with "classic GIS file formats" like Shapefiles. By that I mean all those formats that support only one geometry type per layer, and perhaps only one layer per file. It should be well defined how those files map to Geobuf files. I am not saying we should limit ourselves to what those formats support, but I think it would be useful to define a subset of the Geobuf format that is guaranteed to map well to them. We could even define a flag that, when set, promises that the file contents stay within that subset.
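To make the proposed subset concrete, here is a minimal sketch of the kind of check such a flag would promise, assuming the data is held as a GeoJSON FeatureCollection (a Python dict). The function name `single_geometry_type` is hypothetical and not part of Geobuf. It is deliberately stricter than Shapefile itself, since a Shapefile polygon or polyline layer can also hold multi-part geometries.

```python
from __future__ import annotations

from typing import Any, Dict, Optional


# Hypothetical helper, not part of Geobuf: tests whether a GeoJSON
# FeatureCollection fits the "one geometry type per layer" model.
def single_geometry_type(collection: Dict[str, Any]) -> Optional[str]:
    """Return the shared geometry type, or None if mixed or empty."""
    types = set()
    for feature in collection.get("features", []):
        geometry = feature.get("geometry")
        if geometry is None:
            continue  # null geometries are ignored in this sketch
        types.add(geometry["type"])
    # Mixed layers and GeometryCollections do not fit the classic model.
    if len(types) != 1 or "GeometryCollection" in types:
        return None
    return types.pop()
```

A writer could run this check before setting the flag, and a reader could trust the flag instead of scanning every feature.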

mourner commented 9 years ago

Not sure about this. I'm inclined to keep away from supporting those formats directly, and instead rely on "third-party formats" <-> GeoJSON/TopoJSON <-> Geobuf mapping. That would save us a lot of headaches.

brendan-ward commented 9 years ago

I agree; I think the limitations of those formats are external to Geobuf. Or, put differently: only put into Geobuf what you can later take back out. Keeping Geobuf flexible and the implementations of the data pipeline specific seems like the right way to go.
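The "only put in what you can take back out" rule can be checked mechanically by encoding and decoding and comparing the result. The sketch below is an illustration under stated assumptions: `survives_roundtrip`, `json_encode`, `json_decode` and `lossy_decode` are hypothetical names, and JSON stands in for a real Geobuf encoder/decoder pair.

```python
import json
from typing import Any, Callable


def survives_roundtrip(data: Any,
                       encode: Callable[[Any], bytes],
                       decode: Callable[[bytes], Any]) -> bool:
    """True if decode(encode(data)) returns an object equal to data."""
    return decode(encode(data)) == data


# Stand-in codecs for illustration only; a real pipeline would plug in
# its own GeoJSON <-> Geobuf encoder and decoder here.
def json_encode(obj: Any) -> bytes:
    return json.dumps(obj).encode("utf-8")


def json_decode(raw: bytes) -> Any:
    return json.loads(raw.decode("utf-8"))


def lossy_decode(raw: bytes) -> Any:
    # Simulates a target format that cannot keep feature properties.
    obj = json_decode(raw)
    for feature in obj.get("features", []):
        feature["properties"] = {}
    return obj
```

With a real Geobuf codec, exact equality is probably too strict, since Geobuf stores coordinates at a fixed decimal precision; a practical check would compare coordinates with a tolerance.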

We also use those formats heavily in production, but I think we should either rely on existing tools to convert them to Geobuf via GeoJSON / TopoJSON, as @mourner suggests, or build a dedicated converter from the formats we need directly to Geobuf and skip the intermediate transformation (arguably, going through GeoJSON carries a performance cost if you don't need it).

The same issues arise when roundtripping data from those formats through GeoJSON; it's more a matter of how you use the format, and how you inspect data you didn't create yourself to determine compatibility.

artemp commented 9 years ago

geobuf.proto defines geometry types as

enum Type {
    POINT = 0;
    MULTIPOINT = 1;
    LINESTRING = 2;
    MULTILINESTRING = 3;
    POLYGON = 4;
    MULTIPOLYGON = 5;
    GEOMETRYCOLLECTION = 6;
}

There is at least one established and widely used binary format for encoding geometries, WKB (Well-Known Binary), specified at http://portal.opengeospatial.org/files/?artifact_id=25355, which defines geometry types as

enum WKBGeometryType {
    wkbPoint = 1,
    wkbLineString = 2,
    wkbPolygon = 3,
    wkbMultiPoint = 4,
    wkbMultiLineString = 5,
    wkbMultiPolygon = 6,
    wkbGeometryCollection = 7
}

I think it would make sense to follow the same convention /cc @mourner
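To show what following the WKB convention would involve, here is a sketch mapping the Geobuf type values listed above to the WKB codes. The orderings differ, so this is a reordering and not just a +1 offset (MULTIPOINT is 1 in Geobuf but 4 in WKB). `GeobufType`, `GEOBUF_TO_WKB` and `wkb_point` are hypothetical names. `wkb_point` only illustrates where the type code sits in a WKB record (byte-order byte, uint32 type, then coordinates).

```python
import struct
from enum import IntEnum


class GeobufType(IntEnum):
    # Values as quoted from geobuf.proto in this thread.
    POINT = 0
    MULTIPOINT = 1
    LINESTRING = 2
    MULTILINESTRING = 3
    POLYGON = 4
    MULTIPOLYGON = 5
    GEOMETRYCOLLECTION = 6


# WKB type codes from the OGC enum quoted above.
GEOBUF_TO_WKB = {
    GeobufType.POINT: 1,
    GeobufType.LINESTRING: 2,
    GeobufType.POLYGON: 3,
    GeobufType.MULTIPOINT: 4,
    GeobufType.MULTILINESTRING: 5,
    GeobufType.MULTIPOLYGON: 6,
    GeobufType.GEOMETRYCOLLECTION: 7,
}


def wkb_point(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB (21 bytes)."""
    # 1 byte order flag (1 = little-endian), uint32 type, two float64s.
    return struct.pack("<BIdd", 1, GEOBUF_TO_WKB[GeobufType.POINT], x, y)
```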

mourner commented 9 years ago

@artemp yes, agreed.

mourner commented 9 years ago

Stumbled upon an interesting ticket that mentioned #37: https://github.com/opengeospatial/geopackage/issues/86 ("Shapefile Challenge"). I encourage everyone involved to take a look at it.

I think we can easily compete with the others in the challenge if we resolve the open issues with the format (especially #37 and #46). Competing formats may be technically more advanced, but we can win with simplicity and elegance.

artemp commented 8 years ago

This looks relevant: https://github.com/TWKB/Specification/blob/master/twkb.md
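As I understand both specs, TWKB and Geobuf share a core trick for compact coordinates: scale each value to an integer at a chosen decimal precision, store deltas from the previous vertex, zigzag-encode the signed deltas, and write them as varints. The sketch below covers only that coordinate step, not the full TWKB layout (no type/precision header or metadata byte). The function names are hypothetical.

```python
from typing import Iterable, Tuple


def zigzag(n: int) -> int:
    """Map signed ints to unsigned so small magnitudes stay small."""
    return n << 1 if n >= 0 else ((-n) << 1) - 1


def varint(value: int) -> bytes:
    """Unsigned varint: 7 bits per byte, high bit set if more follow."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_coords(coords: Iterable[Tuple[float, float]], precision: int) -> bytes:
    """Scale to integers, delta-encode, zigzag, then varint each value."""
    scale = 10 ** precision
    out = bytearray()
    prev_x, prev_y = 0, 0
    for x, y in coords:
        # Note: Python's round() uses round-half-to-even.
        ix, iy = round(x * scale), round(y * scale)
        out += varint(zigzag(ix - prev_x))
        out += varint(zigzag(iy - prev_y))
        prev_x, prev_y = ix, iy
    return bytes(out)
```

For example, `encode_coords([(1.0, 2.0), (1.5, 2.0)], 1)` stores the second vertex as the small deltas (5, 0), which fit in one byte each.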

/cc @mourner @joto