locationtech / geowave

GeoWave provides geospatial and temporal indexing on top of Accumulo, HBase, BigTable, Cassandra, Kudu, Redis, RocksDB, and DynamoDB.
Apache License 2.0

Create PDAL driver #14

chrisbennight closed 9 years ago

chrisbennight commented 10 years ago

Write a PDAL plugin (read/write) that allows persistence and query of point clouds in GeoWave.

See #13 - use the same technique chosen there (RPC vs. JNI) to bridge the PDAL C++ interface with the Java code.

[1] https://github.com/PDAL/PDAL
[2] http://www.pdal.io/docs.html
[3] http://osgeo-org.1560.x6.nabble.com/pdal-Feedback-on-driver-development-td4680397.html

hobu commented 10 years ago

https://github.com/connormanning/GeoToolsProxy is a Thrift proxy that we used to develop a prototype to allow PDAL to talk to Accumulo when using GeoMesa's indexing. It might also be useful in this context for GeoWave.

rfecher commented 10 years ago

Thanks, I hadn't seen that project before, but it looks like it would make life easy and straightforward. One concern we have had with using a Thrift proxy is introducing yet another out-of-process proxy (e.g. PDAL -> GeoWave/GeoToolsProxy -> Accumulo), in contrast to in-process JNI invocations. We all know the pain that JNI can bring, particularly now that you've introduced us to this Thrift proxy where a good part of the work would already be done for us, but the debate is which solution ends up being better in the long run once we get over that pain. I think we'll want to chew on it a bit, and this is definitely worth consideration.

hobu commented 10 years ago

I'd also note that PDAL has been getting a significant makeover over the past few months. An outcome of this effort is that it should be (mostly) thread safe and threadable, whereas support for that in the past was a bit uneven. I don't know how this adjusts the factors in your JNI decision(s).

With regard to #15, PDAL's chipper (http://www.pdal.io/stages/filters.chipper.html) is likely to be something you will want to leverage for storing data. I would think it might be punishingly intensive to store each point individually (with the associated index and record overhead per entry). The chipper seeks to produce "squarish" 2D blocks that reasonably match typical query windows for the data. It performs well by not recursing to absurdity. We're using the approach with both Oracle and pgpointcloud with much success.
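
To make the blocking idea concrete, here is a rough Python sketch of the general technique: split the points at the median along whichever axis has the wider extent, and stop as soon as a block fits under a point limit. This is not PDAL's chipper implementation, only an illustration of why blocks come out "squarish" and why the recursion stays shallow. The names `chip` and `capacity` are made up for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def chip(points: List[Point], capacity: int) -> List[List[Point]]:
    """Split points into blocks holding at most `capacity` points each.

    Illustrative only: a simple median split, not PDAL's actual algorithm.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    if not points:
        return []
    # Stop recursing as soon as the block is small enough.
    if len(points) <= capacity:
        return [points]

    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Cut across the wider extent so the two halves stay roughly square.
    axis = 0 if (max(xs) - min(xs)) >= (max(ys) - min(ys)) else 1

    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return chip(ordered[:mid], capacity) + chip(ordered[mid:], capacity)


if __name__ == "__main__":
    # A 20 x 10 grid of points, chipped into blocks of at most 25 points.
    grid = [(float(x), float(y)) for x in range(20) for y in range(10)]
    for block in chip(grid, 25):
        bx = [p[0] for p in block]
        by = [p[1] for p in block]
        print(len(block), (min(bx), min(by)), (max(bx), max(by)))
```

Each resulting block could then be stored as a single entry keyed by its bounding box, so the index and record overhead is paid per block rather than per point.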