Closed zhangjianting closed 4 years ago
Some GIS functions are listed in https://github.com/rapidsai/cudf/issues/1663. A point-in-polygon test (a vector of query points against a single static polygon) is already implemented using the GDF DataFrame. The header https://github.com/rapidsai/cudf/blob/branch-0.9/cpp/include/cudf/gis.hpp shows the style these GIS functions will follow. A perimeter function (i.e., the length of paths on the Earth's surface) is being developed.
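For illustration only, path length on the Earth's surface is commonly computed by summing great-circle (haversine) distances between consecutive vertices. The sketch below is a plain-Python assumption of what such a function could compute; it is not the cudf implementation, and the names `haversine_km`, `path_length_km`, and the radius constant are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius (spherical model)


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against tiny floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def path_length_km(lons, lats, closed=False):
    """Sum of segment lengths along a path.

    With closed=True the last vertex is joined back to the first,
    which gives the perimeter of a ring.
    """
    if len(lons) != len(lats):
        raise ValueError("lons and lats must have the same length")
    n = len(lons)
    total = 0.0
    for i in range(n - 1):
        total += haversine_km(lons[i], lats[i], lons[i + 1], lats[i + 1])
    if closed and n > 1:
        total += haversine_km(lons[-1], lats[-1], lons[0], lats[0])
    return total
```

A GPU version would typically evaluate the per-segment distances in parallel and then reduce them per path; the loop above only shows the arithmetic.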
The new PIP module that I have implemented in cudf (currently on a local development box with a Titan V) supports a vector of complex polygons with multiple rings (e.g., polygons with holes). Many real-world polygon datasets, e.g., census blocks/tracts, are complex. Testing against a vector of polygons (e.g., the candidates left after spatial filtering in a spatial join query) in a single API call is likely to be more efficient than invoking the API repeatedly, at least conceptually. In an experiment from a traffic monitoring application, testing 1.3 million points against 27 polygons took only 0.92 ms of kernel time on a Titan V, while the GDAL Geometry.contains(Geometry) API took about 87 s, i.e., a speedup of ~10^5.
On the other hand, I think multiple modules implementing the same function can coexist, as they might suit different applications. For example, the current PIP module requires only two gdf_columns (for x/y), while my module requires two additional gdf_columns for indexing the multi-polygon/polygon/ring hierarchy. I am working on how to represent complex spatial/trajectory data structures in a column store framework.
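To make the column layout concrete, here is a minimal plain-Python sketch of one way polygons with holes could be stored in flat columns: x/y vertex columns plus two offset columns (polygon to ring, ring to vertex). The exact layout used by the module is not described in this thread, so the offset convention and the names `poly_offsets`, `ring_offsets`, and `points_in_polygons` are assumptions. The test uses the even-odd ray-crossing rule summed over all rings, so a point inside a hole counts as outside.

```python
def ring_crossings(px, py, xs, ys, start, end):
    """Count crossings of a rightward ray from (px, py) with the ring
    stored in vertices[start:end]; the ring is closed implicitly."""
    crossings = 0
    n = end - start
    for k in range(n):
        i = start + k
        j = start + (k + 1) % n  # wrap around to close the ring
        yi, yj = ys[i], ys[j]
        if (yi > py) != (yj > py):  # edge straddles the ray's y level
            x_cross = xs[i] + (py - yi) * (xs[j] - xs[i]) / (yj - yi)
            if px < x_cross:
                crossings += 1
    return crossings


def points_in_polygons(pxs, pys, poly_offsets, ring_offsets, xs, ys):
    """result[p][q] is True if point p lies inside polygon q (even-odd rule)."""
    n_polys = len(poly_offsets) - 1
    result = []
    for px, py in zip(pxs, pys):
        row = []
        for q in range(n_polys):
            crossings = 0
            # Sum crossings over the outer ring and every hole of polygon q.
            for r in range(poly_offsets[q], poly_offsets[q + 1]):
                crossings += ring_crossings(
                    px, py, xs, ys, ring_offsets[r], ring_offsets[r + 1])
            row.append(crossings % 2 == 1)
        result.append(row)
    return result


# Polygon 0: 10x10 square with a 2x2 hole; polygon 1: a triangle.
xs = [0, 10, 10, 0, 4, 6, 6, 4, 20, 30, 20]
ys = [0, 0, 10, 10, 4, 4, 6, 6, 0, 0, 10]
ring_offsets = [0, 4, 8, 11]  # ring r owns vertices ring_offsets[r]:ring_offsets[r+1]
poly_offsets = [0, 2, 3]      # polygon q owns rings poly_offsets[q]:poly_offsets[q+1]

inside = points_in_polygons([1, 5, 22, 50], [1, 5, 2, 50],
                            poly_offsets, ring_offsets, xs, ys)
```

In a GPU kernel the outer point loop would map to threads, with each thread walking the same offset columns; the nested Python loops here only illustrate the indexing.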
@Christian8491 Are you aware of the application/performance of the PIP module (under GIS dir) currently in cudf?
Hi @zhangjianting, what you are doing is great. I was thinking of adding more complex functionality to the GIS functions in later releases. The idea of having the modules coexist is awesome. I have been working on some GIS functions, focusing on how spatial (and geospatial) functions work and what they return in the context of spatial SQL. But as you say, each of them might be suitable for different applications.
Closing as we now have https://github.com/rapidsai/cuspatial :)
Is your feature request related to a problem? Please describe. Integrating GPU-accelerated spatial and trajectory data management techniques, including indexing, query processing and certain analytical functionality, into cudf. This will also enable cudf to process both relational and spatial/spatiotemporal data in an integrated way.
Describe the solution you'd like A set of GPU-accelerated spatial/temporal data management techniques have been developed outside of cudf, including (with more coming):
Experiments on real datasets have shown impressive speedups when these modules are tested as standalone programs. They can be integrated into cudf by extending cudf's type system to include one or more spatiotemporal data types, using rmm for more efficient GPU memory management, sharing the Python wrapping and dask distribution infrastructure, and interoperating with other RAPIDS modules to bring the functionality to a broader user community.
Describe alternatives you've considered If defining a comprehensive spatiotemporal data type system is difficult, a placeholder data type can be defined, and the data buffer can be dynamically cast to the corresponding data type internally.
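As a rough illustration of the placeholder idea (not a cudf API), the sketch below keeps a column as an opaque byte buffer tagged with a logical type name and reinterprets it only when a function needs the typed view. The type names `point2d` and `timed_point`, their field layouts, and the helpers `pack_records` and `view_as` are all hypothetical.

```python
import struct

# Hypothetical logical types: name -> (struct format, field names).
# "<" selects little-endian with no padding, so record sizes are exact.
LAYOUTS = {
    "point2d": ("<dd", ("x", "y")),
    "timed_point": ("<ddq", ("x", "y", "t")),
}


def pack_records(kind, records):
    """Serialize records into one opaque byte buffer (the placeholder column)."""
    fmt, _ = LAYOUTS[kind]
    return b"".join(struct.pack(fmt, *rec) for rec in records)


def view_as(kind, buffer):
    """Reinterpret the opaque buffer as records of the given logical type."""
    fmt, _ = LAYOUTS[kind]
    return list(struct.iter_unpack(fmt, buffer))


buf = pack_records("timed_point", [(1.0, 2.0, 3), (4.0, 5.0, 6)])
points = view_as("timed_point", buf)
```

The trade-off is that the framework cannot validate or optimize for the real type; only the functions that know the layout can interpret the bytes.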
Additional context The motivating domain application is traffic surveillance, where huge sets of point location data are generated by detecting vehicles from a set of calibrated and synchronized cameras using deep learning algorithms. With a frame rate of ~10 fps and dozens of vehicles or more in a single frame, these point locations, when aligned with urban infrastructure (especially road networks), can provide rich information for many applications. GPU-accelerated spatial and trajectory data management techniques are essential for processing such data, which have significant volume, velocity, and variety.