Closed felixpalmer closed 2 years ago
@felixpalmer thanks for the writeup. This seems like a very reasonable direction.
Just curious where we will end up in terms of the GeoJSON variants we support?
Maybe we need to write some documentation about that, and also clarify how these variants relate to GeoArrow.
@kylebarron would love to hear your thoughts on this.
My idea is that `Flat GeoJSON` is just an internal format to make the transformation to binary data easier. I wasn't anticipating exposing it as a supported format. As for `Binary GeoJSON`: by this, do you mean the output of `geojsonToBinary`? That isn't really GeoJSON anymore, but rather a format that maps closely to the attributes that deck.gl expects to upload to the GPU.
Well it is a binary format that contains the same type of data that geojson does and it is getting pretty close to e.g. https://github.com/geopandas/geo-arrow-spec
Yes I definitely agree that our binary data handling could be much improved.
> GeoJSON variants we support
One added complexity of our geojson-to-binary code is that we try to support both Geometry and Feature objects. For example, the WKBLoader exports Geometry objects because the WKB format describes only geometries with no feature properties. It would be nice to only support Feature objects, but that would require extra handling in the case of WKB for example.
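One way to picture the "extra handling" mentioned above is a small normalization step that wraps bare geometries in a feature with empty properties. This is only a sketch: the `GeoJsonGeometry` and `GeoJsonFeature` types and the `toFeature` helper are hypothetical names made up for illustration, not part of the existing loaders.

```typescript
// Hypothetical minimal types, for illustration only.
interface GeoJsonGeometry {
  type: string;
  coordinates: unknown;
}

interface GeoJsonFeature {
  type: 'Feature';
  geometry: GeoJsonGeometry;
  properties: Record<string, unknown>;
}

// Wrap a bare geometry (e.g. as produced from WKB) in a Feature so that
// downstream conversion code only has to deal with Feature objects.
function toFeature(input: GeoJsonGeometry | GeoJsonFeature): GeoJsonFeature {
  if ('geometry' in input) {
    return input; // already a Feature, pass through unchanged
  }
  return {type: 'Feature', geometry: input, properties: {}};
}

const wrappedPoint = toFeature({type: 'Point', coordinates: [1, 2]});
// wrappedPoint.type is 'Feature' and wrappedPoint.properties is {}
```

The trade-off is an extra object allocation per geometry, which may or may not matter for large WKB inputs.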
Do we have detailed types yet for our binary geojson format(s)? I think that would be a helpful place to start if it doesn't exist yet.
> getting pretty close to e.g. geopandas/geo-arrow-spec
Note that geo-arrow-spec in its current provisional state stores geometries in WKB format as a byte array. Storing geometries in an Arrow-native format is still under discussion https://github.com/geopandas/geo-arrow-spec/issues/4, https://github.com/geopandas/geo-arrow-spec/pull/12. But I think there's a lot of overlap between the ideas of "flat GeoJSON/binary GeoJSON" and the Arrow-native geometry format proposals.
@felixpalmer you can of course proceed with this as an internal format, and we can keep thinking about which binary friendly variants we want to expose.
Background
The MVT module contains a function, `featuresToBinary`, which is used to efficiently convert a collection of features from a GeoJSON-like format into the final output binary format (e.g. used in deck.gl `MVTLayer` & `GeoJsonLayer`).
Originally this functionality was implemented using `geojsonToBinary`, but this required the `MVTLoader` to first transform the data from the MVT protobuf format into `GeoJSON` and then into binary. As the memory layout of the protobuf is close to that of the final output binary, `featuresToBinary` skips `GeoJSON` and instead uses an intermediate `Flat GeoJSON` format.
The `Flat GeoJSON` format is basically `GeoJSON`, except that the coordinate data is stored in a flat array and a series of indices (stored in the `lines` array) is used to define individual geometries. Property storage is unchanged.
For example a Polygon geometry is transformed like so:
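A minimal sketch of such a transformation, based only on the description above. The field name `data` for the flat coordinate array, the exact meaning of `lines` (taken here as the index in `data` where each ring starts), and the `flattenPolygon` helper are all assumptions for illustration and may differ from the actual implementation.

```typescript
// Illustrative types only: `data` and the ring-start meaning of `lines`
// are assumptions, not the library's actual definitions.
interface PolygonGeometry {
  type: 'Polygon';
  coordinates: number[][][]; // rings -> positions -> [x, y]
}

interface FlatPolygonGeometry {
  type: 'Polygon';
  data: number[]; // all coordinates in one flat array
  lines: number[]; // index into `data` where each ring starts
}

function flattenPolygon(polygon: PolygonGeometry): FlatPolygonGeometry {
  const data: number[] = [];
  const lines: number[] = [];
  for (const ring of polygon.coordinates) {
    lines.push(data.length); // record where this ring begins
    for (const position of ring) {
      data.push(...position);
    }
  }
  return {type: 'Polygon', data, lines};
}

const polygon: PolygonGeometry = {
  type: 'Polygon',
  coordinates: [
    [[0, 0], [4, 0], [4, 4], [0, 0]], // outer ring
    [[1, 1], [2, 1], [2, 2], [1, 1]] // hole
  ]
};

const flatPolygon = flattenPolygon(polygon);
// flatPolygon.data  -> [0, 0, 4, 0, 4, 4, 0, 0, 1, 1, 2, 1, 2, 2, 1, 1]
// flatPolygon.lines -> [0, 8]
```

Properties are not touched by this step, which matches the note above that property storage is unchanged.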
Proposed approach
Currently there is a lot of duplication of code between `featuresToBinary` and `geojsonToBinary`. In order to share code I propose to:
- Add a `geojsonToFlatGeojson` function
- Rename `featuresToBinary` to `flatGeojsonToBinary` (the current naming is confusing)
- Remove the current implementation of `geojsonToBinary` and effectively replace it with `flatGeojsonToBinary(geojsonToFlatGeojson(...))`
- Move the `flatGeojsonToBinary` function to the gis package so all the conversion methods are in the same place
If there are any proposals for a better name for the intermediate format, I'm happy to change it.
The external API will remain the same and as such there will be no breaking changes to the `MVTLoader` or `geojsonToBinary`.
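To illustrate the proposed composition, here is a heavily simplified sketch that handles 2D LineString features only. The function names follow the proposal, but every type, field name (`data`, `lines`, `positions`, `pathIndices`) and implementation detail below is an assumption for illustration rather than the real code.

```typescript
// Simplified stand-ins: real implementations would handle all geometry types.
// All shapes below are assumptions made for illustration.
interface LineFeature {
  type: 'Feature';
  geometry: {type: 'LineString'; coordinates: number[][]};
  properties: Record<string, unknown>;
}

interface FlatLineFeature {
  type: 'Feature';
  geometry: {type: 'LineString'; data: number[]; lines: number[]};
  properties: Record<string, unknown>;
}

interface BinaryLines {
  positions: Float32Array; // flattened xy pairs for all features
  pathIndices: Uint32Array; // vertex index where each path starts, plus the end
  properties: Record<string, unknown>[];
}

function geojsonToFlatGeojson(features: LineFeature[]): FlatLineFeature[] {
  return features.map((feature) => {
    const data: number[] = [];
    for (const position of feature.geometry.coordinates) {
      data.push(...position);
    }
    // A single LineString has one line, starting at index 0 of `data`
    return {
      type: 'Feature',
      geometry: {type: 'LineString', data, lines: [0]},
      properties: feature.properties
    };
  });
}

function flatGeojsonToBinary(features: FlatLineFeature[]): BinaryLines {
  const positions: number[] = [];
  const pathIndices: number[] = [];
  for (const {geometry} of features) {
    const vertexOffset = positions.length / 2; // 2 numbers per vertex
    for (const start of geometry.lines) {
      pathIndices.push(vertexOffset + start / 2);
    }
    positions.push(...geometry.data);
  }
  pathIndices.push(positions.length / 2); // closing index
  return {
    positions: new Float32Array(positions),
    pathIndices: new Uint32Array(pathIndices),
    properties: features.map((f) => f.properties)
  };
}

// The public entry point becomes a thin composition of the two steps.
function geojsonToBinary(features: LineFeature[]): BinaryLines {
  return flatGeojsonToBinary(geojsonToFlatGeojson(features));
}

const binary = geojsonToBinary([
  {type: 'Feature', geometry: {type: 'LineString', coordinates: [[0, 0], [1, 1]]}, properties: {id: 'a'}},
  {type: 'Feature', geometry: {type: 'LineString', coordinates: [[2, 2], [3, 3], [4, 4]]}, properties: {id: 'b'}}
]);
// binary.pathIndices -> [0, 2, 5]
```

With this split, the MVT path could call `flatGeojsonToBinary` directly while GeoJSON input goes through both steps, so the binary-packing logic lives in one place.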
Task list
- `geojsonToFlatGeojson`
- `geojsonToFlatGeojson` tests
- `mvt/featuresToBinary` to `gis/flatGeojsonToBinary`
- `geojsonToBinary` to use `geojsonToFlatGeojson` & `flatGeojsonToBinary`