visgl / loaders.gl

Loaders for big data visualization. Website: https://loaders.gl

Align `gis/geojson-to-binary` and `mvt/features-to-binary` #1984

Closed: felixpalmer closed this issue 2 years ago

felixpalmer commented 2 years ago

Background

The MVT module contains a function, featuresToBinary, that efficiently converts a collection of features from a GeoJSON-like format into the final binary output format (e.g. as used by deck.gl's MVTLayer and GeoJsonLayer).

Originally this functionality was implemented using geojsonToBinary, but this required the MVTLoader to first transform the data from the MVT protobuf format into GeoJSON and only then into binary. Because the memory layout of the protobuf is close to that of the final binary output, featuresToBinary skips GeoJSON and instead uses an intermediate Flat GeoJSON format.

The Flat GeoJSON format is essentially GeoJSON, except that the coordinate data is stored in a single flat array, and a series of indices (stored in the lines array) marks where each individual geometry begins within that array. Property storage is unchanged.
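To make the role of the indices concrete, here is a minimal TypeScript sketch that rebuilds nested rings from the flat layout. It assumes 2D coordinates and reads each entry of lines as the list of offsets into data at which one polygon's rings start, which is how the Polygon example in this issue appears to work. The FlatPolygon type and the ringsFromFlat function are hypothetical names, not the loaders.gl API.

```typescript
// Hypothetical Flat GeoJSON Polygon shape (not the loaders.gl type):
// `data` holds interleaved x,y values; `lines` holds one array per polygon
// with the offset into `data` at which each of its rings starts.
type FlatPolygon = {
  type: 'Polygon';
  data: number[];
  lines: number[][];
};

// Assumption: 2D coordinates, i.e. two values per vertex.
const DIMENSIONS = 2;

// Rebuild GeoJSON-style nesting (polygons -> rings -> points) from the flat
// layout. A ring ends where the next ring starts, or at the end of `data`.
function ringsFromFlat(geometry: FlatPolygon): number[][][][] {
  const {data, lines} = geometry;
  // All ring start offsets in order, across all polygons.
  const allStarts = ([] as number[]).concat(...lines);
  let ringIndex = 0;
  return lines.map((ringStarts) =>
    ringStarts.map((start) => {
      ringIndex++;
      const end = ringIndex < allStarts.length ? allStarts[ringIndex] : data.length;
      const ring: number[][] = [];
      for (let i = start; i < end; i += DIMENSIONS) {
        ring.push(data.slice(i, i + DIMENSIONS));
      }
      return ring;
    })
  );
}

// Usage: a 10x10 square (5 points, 10 values) with a triangular hole.
const square: FlatPolygon = {
  type: 'Polygon',
  data: [0, 0, 10, 0, 10, 10, 0, 10, 0, 0, 2, 2, 4, 2, 4, 4, 2, 2],
  lines: [[0, 10]]
};
const rings = ringsFromFlat(square);
// rings[0][0] has 5 points, rings[0][1] has 4 points
```

Because the ring boundaries are implied by the start offsets, no per-ring length needs to be stored.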

For example, a Polygon geometry is transformed as follows. The outer ring has 8 points, i.e. 16 values in data, so the inner ring starts at offset 16:

{
  "type": "Polygon",
  "coordinates": [
    [
      [ 13, 56 ], [ -12, 47 ], [ -15, 33 ], [ -2, 19 ], [ 21, 16 ], [ 36, 30 ], [ 39, 49 ], [ 13, 56 ]
    ],
    [
      [ 13, 49 ], [ 28, 41 ], [ 16, 27 ], [ 2, 33 ], [ 1, 43 ], [ 13, 49 ]
    ]
  ],
  "data": [ 13, 56, -12, 47, -15, 33, -2, 19, 21, 16, 36, 30, 39, 49, 13, 56, 13, 49, 28, 41, 16, 27, 2, 33, 1, 43, 13, 49 ],
  "lines": [[ 0, 16 ]]
}
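The transformation in the example above can be sketched in a few lines. This is a minimal illustration assuming 2D coordinates and a single Polygon (so lines has exactly one entry); flattenPolygon is a hypothetical name, not the featuresToBinary implementation.

```typescript
// Hypothetical helper: flatten GeoJSON Polygon coordinates into the
// Flat GeoJSON `data` + `lines` layout (2D coordinates only).
function flattenPolygon(coordinates: number[][][]): {data: number[]; lines: number[][]} {
  const data: number[] = [];
  const ringStarts: number[] = [];
  for (const ring of coordinates) {
    // Each ring starts at the current length of the flat array.
    ringStarts.push(data.length);
    for (const [x, y] of ring) {
      data.push(x, y);
    }
  }
  // One entry per polygon; a plain Polygon has exactly one.
  return {data, lines: [ringStarts]};
}

// Usage with the coordinates from the example above.
const flat = flattenPolygon([
  [[13, 56], [-12, 47], [-15, 33], [-2, 19], [21, 16], [36, 30], [39, 49], [13, 56]],
  [[13, 49], [28, 41], [16, 27], [2, 33], [1, 43], [13, 49]]
]);
// flat.lines is [[0, 16]]; flat.data matches the "data" array above
```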

Proposed approach

There is currently a lot of code duplication between featuresToBinary and geojsonToBinary. To share code, I propose to:

If there are any proposals for a better name for the intermediate format, I'm happy to change it.

The external API will remain the same, so there will be no breaking changes to the MVTLoader or geojsonToBinary.

Task list

ibgreen commented 2 years ago

@felixpalmer thanks for the writeup. This seems like a very reasonable direction.

Just curious: where will we end up in terms of the GeoJSON variants we support?

Maybe we need to write some documentation about that, and also clarify how these variants relate to GeoArrow.

@kylebarron would love to hear your thoughts on this.

felixpalmer commented 2 years ago

My idea is that Flat GeoJSON is just an internal format that makes the transformation to binary data easier. I wasn't anticipating exposing it as a supported format. As for Binary GeoJSON: by this, do you mean the output of geojsonToBinary? That isn't really GeoJSON anymore, but rather a format that maps closely to the attributes that deck.gl expects to upload to the GPU.
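To illustrate what "maps closely to GPU attributes" can mean, the sketch below packs Flat GeoJSON polygons into typed arrays: one positions buffer, index arrays marking where each polygon and each ring starts, and a per-vertex feature id. The field names are loosely modeled on the typed-array style of geojsonToBinary output but should be treated as hypothetical, as should the choice of vertex-unit indices with a trailing end index.

```typescript
// Simplified, hypothetical binary polygon layout. Indices are in vertex
// units (flat offset / 2 for 2D data) and end with a trailing end index.
type BinaryPolygons = {
  positions: Float64Array;               // interleaved x,y for every vertex
  polygonIndices: Uint32Array;           // vertex index where each polygon starts
  primitivePolygonIndices: Uint32Array;  // vertex index where each ring starts
  featureIds: Uint32Array;               // feature id for every vertex
};

// Hypothetical input: one Flat GeoJSON polygon geometry per feature.
type FlatPolygonFeature = {data: number[]; lines: number[][]};

function flatPolygonsToBinary(features: FlatPolygonFeature[]): BinaryPolygons {
  const DIMENSIONS = 2; // assumption: 2D coordinates
  // First pass: count vertices, polygons and rings to size the arrays.
  let vertexCount = 0;
  let polygonCount = 0;
  let ringCount = 0;
  for (const f of features) {
    vertexCount += f.data.length / DIMENSIONS;
    polygonCount += f.lines.length;
    for (const starts of f.lines) ringCount += starts.length;
  }
  const positions = new Float64Array(vertexCount * DIMENSIONS);
  const featureIds = new Uint32Array(vertexCount);
  const polygonIndices = new Uint32Array(polygonCount + 1);
  const primitivePolygonIndices = new Uint32Array(ringCount + 1);

  // Second pass: copy coordinates and convert flat offsets to vertex indices.
  let vertexOffset = 0;
  let polygonCursor = 0;
  let ringCursor = 0;
  features.forEach((f, featureId) => {
    const featureVertices = f.data.length / DIMENSIONS;
    positions.set(f.data, vertexOffset * DIMENSIONS);
    featureIds.fill(featureId, vertexOffset, vertexOffset + featureVertices);
    for (const starts of f.lines) {
      polygonIndices[polygonCursor++] = vertexOffset + starts[0] / DIMENSIONS;
      for (const start of starts) {
        primitivePolygonIndices[ringCursor++] = vertexOffset + start / DIMENSIONS;
      }
    }
    vertexOffset += featureVertices;
  });
  // Trailing end indices let consumers compute the length of each range.
  polygonIndices[polygonCount] = vertexOffset;
  primitivePolygonIndices[ringCount] = vertexOffset;
  return {positions, polygonIndices, primitivePolygonIndices, featureIds};
}

// Usage with the Flat GeoJSON polygon from this issue.
const binary = flatPolygonsToBinary([
  {
    data: [13, 56, -12, 47, -15, 33, -2, 19, 21, 16, 36, 30, 39, 49, 13, 56,
           13, 49, 28, 41, 16, 27, 2, 33, 1, 43, 13, 49],
    lines: [[0, 16]]
  }
]);
// binary.polygonIndices -> [0, 14]
// binary.primitivePolygonIndices -> [0, 8, 14]
```

Since every array is a contiguous typed array, each one can in principle be handed to the GPU as-is, which is the property that distinguishes this from GeoJSON proper.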

ibgreen commented 2 years ago

Well, it is a binary format that contains the same type of data as GeoJSON does, and it is getting pretty close to e.g. https://github.com/geopandas/geo-arrow-spec

kylebarron commented 2 years ago

Yes, I definitely agree that our binary data handling could be much improved.

> GeoJSON variants we support

One added complexity of our geojson-to-binary code is that we try to support both Geometry and Feature objects. For example, the WKBLoader outputs Geometry objects, because the WKB format describes only geometries, with no feature properties. It would be nice to support only Feature objects, but that would require extra handling, e.g. in the case of WKB.
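One way to provide the extra handling mentioned above is to normalize input up front: wrap bare Geometry objects in Features with empty properties, so the downstream conversion only ever sees Features. This is a hypothetical sketch; toFeatures and the minimal local types are illustrative, not loaders.gl code, and whether empty properties or null is the better default is left open.

```typescript
// Minimal local GeoJSON types (hypothetical; a real project would likely
// use full GeoJSON type definitions instead).
type GeoJsonGeometry = {type: string; coordinates?: unknown; geometries?: GeoJsonGeometry[]};
type GeoJsonFeature = {
  type: 'Feature';
  geometry: GeoJsonGeometry | null;
  properties: Record<string, unknown> | null;
};

// Wrap bare Geometry objects (e.g. as produced from WKB) in Features with
// empty properties; Features are passed through unchanged.
function toFeatures(input: Array<GeoJsonGeometry | GeoJsonFeature>): GeoJsonFeature[] {
  return input.map((item): GeoJsonFeature =>
    item.type === 'Feature'
      ? (item as GeoJsonFeature)
      : {type: 'Feature', geometry: item as GeoJsonGeometry, properties: {}}
  );
}

// Usage: a mix of a bare Geometry and an existing Feature.
const existing: GeoJsonFeature = {type: 'Feature', geometry: null, properties: {name: 'a'}};
const features = toFeatures([{type: 'Point', coordinates: [1, 2]}, existing]);
// features[0] is a Feature wrapping the Point; features[1] is `existing`
```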

Do we have detailed types yet for our binary GeoJSON format(s)? If not, I think that would be a helpful place to start.

> getting pretty close to e.g. geopandas/geo-arrow-spec

Note that geo-arrow-spec, in its current provisional state, stores geometries in WKB format as a byte array. Storing geometries in an Arrow-native format is still under discussion (https://github.com/geopandas/geo-arrow-spec/issues/4, https://github.com/geopandas/geo-arrow-spec/pull/12). But I think there's a lot of overlap between the ideas of "flat GeoJSON/binary GeoJSON" and the Arrow-native geometry format proposals.
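To make the overlap concrete: the Arrow-native proposals describe polygons through nested offset arrays rather than nested JSON arrays. The sketch below converts the Flat GeoJSON lines of a set of polygons into two such offset arrays (polygon to rings, ring to vertices). The layout and the names NestedPolygonOffsets and flatToNestedOffsets are assumptions for illustration, not the geo-arrow-spec, which was still provisional at the time.

```typescript
// Hypothetical Arrow-style nested offsets for a list of polygons:
// rings of polygon i are ringOffsets[i] .. ringOffsets[i + 1],
// vertices of ring j are vertexOffsets[j] .. vertexOffsets[j + 1].
type NestedPolygonOffsets = {
  coords: Float64Array;       // interleaved x,y values, as in Flat GeoJSON `data`
  ringOffsets: Int32Array;    // per polygon, in ring units
  vertexOffsets: Int32Array;  // per ring, in vertex units
};

function flatToNestedOffsets(data: number[], lines: number[][]): NestedPolygonOffsets {
  const DIMENSIONS = 2; // assumption: 2D coordinates
  const ringOffsets: number[] = [0];
  const vertexOffsets: number[] = [];
  for (const ringStarts of lines) {
    // Flat offsets into `data` become vertex indices.
    for (const start of ringStarts) vertexOffsets.push(start / DIMENSIONS);
    // The next polygon's rings start after all rings seen so far.
    ringOffsets.push(vertexOffsets.length);
  }
  // Closing offset: the last ring ends at the last vertex.
  vertexOffsets.push(data.length / DIMENSIONS);
  return {
    coords: new Float64Array(data),
    ringOffsets: new Int32Array(ringOffsets),
    vertexOffsets: new Int32Array(vertexOffsets)
  };
}

// Usage with the Flat GeoJSON polygon from this issue.
const nested = flatToNestedOffsets(
  [13, 56, -12, 47, -15, 33, -2, 19, 21, 16, 36, 30, 39, 49, 13, 56,
   13, 49, 28, 41, 16, 27, 2, 33, 1, 43, 13, 49],
  [[0, 16]]
);
// nested.ringOffsets -> [0, 2]; nested.vertexOffsets -> [0, 8, 14]
```

The flat coordinate buffer is shared unchanged; only the index bookkeeping differs between the two representations.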

ibgreen commented 2 years ago

@felixpalmer you can of course proceed with this as an internal format, and we can keep thinking about which binary-friendly variants we want to expose.