rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[FEA] GPU-accelerated spatial and trajectory data management techniques #2093

Closed: zhangjianting closed this issue 4 years ago

zhangjianting commented 5 years ago

Is your feature request related to a problem? Please describe.
The request is to integrate GPU-accelerated spatial and trajectory data management techniques, including indexing, query processing, and certain analytical functionality, into cudf. This would also enable cudf to process relational and spatial/spatiotemporal data in an integrated way.

Describe the solution you'd like
A set of GPU-accelerated spatial/temporal data management techniques has been developed outside of cudf, including the following (with more coming):

  1. spatial/temporal window (or range) query
  2. point-in-polygon test (vector of points and vector of multi-ring polygons)
  3. point-to-polyline shortest distance and nearest neighbor query
  4. quadtree-based point indexing
  5. computing length/speed of trajectories from a set of unordered point locations.
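The fifth item can be illustrated with a minimal, sequential Python sketch. It assumes each point carries an object id, a numeric timestamp, and planar x/y coordinates (for example, calibrated camera or projected map coordinates); the function name `trajectory_stats` is hypothetical and not part of cudf. Points are grouped by id, each group is sorted by timestamp, and the Euclidean distances between consecutive points are summed. A GPU version would more likely sort by (id, timestamp) and then run a segmented reduction, but the logic is the same.

```python
from collections import defaultdict
from math import hypot


def trajectory_stats(ids, ts, xs, ys):
    """Hypothetical helper: return {object_id: (length, duration, avg_speed)}.

    Inputs are parallel sequences of unordered point observations.
    """
    # Group observations by object id.
    groups = defaultdict(list)
    for oid, t, x, y in zip(ids, ts, xs, ys):
        groups[oid].append((t, x, y))

    stats = {}
    for oid, pts in groups.items():
        pts.sort()  # order each trajectory by timestamp (ties broken by x, y)
        # Sum planar distances between consecutive points.
        length = sum(
            hypot(x1 - x0, y1 - y0)
            for (_, x0, y0), (_, x1, y1) in zip(pts, pts[1:])
        )
        duration = pts[-1][0] - pts[0][0]
        # A single-point trajectory has no duration; report zero speed.
        speed = length / duration if duration > 0 else 0.0
        stats[oid] = (length, duration, speed)
    return stats
```

For example, `trajectory_stats([1, 2, 1, 1, 2], [2.0, 0.0, 0.0, 1.0, 1.0], [6.0, 0.0, 0.0, 3.0, 0.0], [8.0, 0.0, 0.0, 4.0, 5.0])` returns `{1: (10.0, 2.0, 5.0), 2: (5.0, 1.0, 5.0)}`, even though the input points are not in time order.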

Experiments on real datasets have shown impressive speedups when these modules are run as standalone programs. They can be integrated into cudf by extending cudf's type system to include one or more spatiotemporal data types, using RMM for more efficient GPU memory management, sharing the Python wrapping and Dask distribution infrastructure, and interoperating with other RAPIDS modules to bring the functionality to broader user communities.

Describe alternatives you've considered
If defining a comprehensive spatiotemporal type system is difficult, a placeholder data type can be defined, and the data buffer can be dynamically cast to the corresponding data type internally.
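One way to read the placeholder-type idea is sketched below in plain Python: a column stores untyped bytes plus a tag, and the buffer is reinterpreted with the standard `struct` module only when a spatial operation needs it. The class `OpaqueColumn`, the `ELEMENT_FORMATS` tags, and the helper functions are hypothetical illustrations, not cudf APIs; inside cudf the equivalent step would presumably be a cast of the device buffer pointer in C++.

```python
import struct


class OpaqueColumn:
    """Hypothetical placeholder column: raw bytes plus a tag naming the real type."""

    def __init__(self, raw: bytes, type_tag: str):
        self.raw = raw
        self.type_tag = type_tag


# Hypothetical mapping from spatial type tags to the struct layout of one element:
# "dd" = (x: float64, y: float64), "ddq" = (x, y, timestamp: int64).
ELEMENT_FORMATS = {"point2d": "dd", "timestamped_point": "ddq"}


def pack_points(points, type_tag):
    """Serialize tuples into an untyped little-endian buffer (no padding)."""
    fmt = "<" + ELEMENT_FORMATS[type_tag]
    return OpaqueColumn(b"".join(struct.pack(fmt, *p) for p in points), type_tag)


def unpack(column):
    """'Dynamically cast' the buffer back to typed tuples using its tag."""
    fmt = "<" + ELEMENT_FORMATS[column.type_tag]
    return list(struct.iter_unpack(fmt, column.raw))
```

With this layout, two `point2d` values occupy 32 bytes and unpack back to the original tuples; adding a new spatial type only requires a new tag, not a change to the column class.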

Additional context
The motivating domain application is traffic surveillance, where huge sets of point location data are generated by detecting vehicles from a set of calibrated and synchronized cameras using deep learning algorithms. With a frame rate of ~10 fps and dozens of vehicles or more in a single frame, these point locations, when aligned with urban infrastructure (especially road networks), can provide rich information for many applications. GPU-accelerated spatial and trajectory data management techniques are essential for processing such data, which have significant volume, velocity, and variety.

Christian8491 commented 5 years ago

Issue https://github.com/rapidsai/cudf/issues/1663 covers some GIS functions. A point-in-polygon test (a vector of query points against a single static polygon) is already implemented using the GDF DataFrame. The header https://github.com/rapidsai/cudf/blob/branch-0.9/cpp/include/cudf/gis.hpp shows the style in which those GIS functions will work. A function for computing perimeters (or path lengths) on the Earth's surface is being developed.
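For the Earth-surface lengths mentioned above, a common approach is to sum haversine great-circle distances over consecutive vertices. The sketch below assumes longitude/latitude in degrees and a spherical Earth with mean radius 6371.0088 km; it is not necessarily the approach taken in #1663, and the function names are hypothetical. An ellipsoidal method (e.g., Vincenty's or Karney's) would be more accurate for geodetic work. For a perimeter, the ring's first vertex can be repeated at the end so the closing edge is counted.

```python
from math import asin, cos, radians, sin, sqrt

# Mean Earth radius in km (spherical approximation).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two lon/lat points (degrees) on a sphere."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def path_length_km(lons, lats):
    """Hypothetical helper: total length of a polyline given as lon/lat columns."""
    return sum(
        haversine_km(lons[i], lats[i], lons[i + 1], lats[i + 1])
        for i in range(len(lons) - 1)
    )
```

As a sanity check, one degree of latitude along a meridian comes out to `EARTH_RADIUS_KM * pi / 180` (about 111.2 km), and a half-circuit of the equator via longitudes 0, 90, and 180 comes out to `EARTH_RADIUS_KM * pi`.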

zhangjianting commented 5 years ago

The new PIP module that I have implemented in cudf (currently on a local development box with a Titan V) supports a vector of complex polygons with multiple rings (e.g., polygons with holes). Many real-world polygon datasets, e.g., census blocks/tracts, are complex. Testing against a vector of polygons (e.g., the candidates left after spatial filtering in a spatial join query) in a single API call is likely to be more efficient than invoking the API repeatedly, at least conceptually. In experiments testing 1.3 million points against 27 polygons from a traffic monitoring application, the kernel took only 0.92 ms on a Titan V, while GDAL's Geometry.contains(Geometry) API took about 87 s, i.e., a ~10^5 speedup.
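The quoted speedup follows from converting both timings to the same unit; note that the GPU figure is kernel time only, as stated above:

```latex
\frac{87\ \text{s}}{0.92\ \text{ms}}
  = \frac{87\,000\ \text{ms}}{0.92\ \text{ms}}
  \approx 9.5 \times 10^{4} \approx 10^{5}
```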

On the other hand, I think multiple modules implementing the same function can coexist, as they might suit different applications. For example, the current PIP module requires only two gdf_columns (for x/y), while my module requires two additional gdf_columns for indexing the multi-polygon/polygon/ring hierarchy. I am working on how to represent complex spatial/trajectory data structures in a column-store framework.
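The two extra index columns described above can be pictured as offset arrays: one maps each polygon to its first ring, and the other maps each ring to its first vertex in the vertex x/y columns. The Python sketch below assumes exactly that layout (the names `poly_offsets`, `ring_offsets`, `vx`, and `vy` are hypothetical) and tests every point against every polygon with the even-odd (crossing-number) rule, which treats holes correctly because a point inside a hole crosses both the outer ring and the hole ring. It is a sequential illustration, not the author's CUDA kernel; a GPU version would typically assign one thread per point. Points exactly on an edge are not handled specially.

```python
def point_in_polygons(px, py, poly_offsets, ring_offsets, vx, vy):
    """Return, for each point, a list of bools (one per polygon).

    poly_offsets[i]: index in ring_offsets of polygon i's first ring
    ring_offsets[j]: index in vx/vy of ring j's first vertex
    Rings are closed implicitly (the last vertex connects back to the first).
    """
    n_polys, n_rings, n_verts = len(poly_offsets), len(ring_offsets), len(vx)
    results = []
    for x, y in zip(px, py):
        row = []
        for p in range(n_polys):
            r_begin = poly_offsets[p]
            r_end = poly_offsets[p + 1] if p + 1 < n_polys else n_rings
            inside = False
            # Even-odd rule across all rings of the polygon (outer ring and holes).
            for r in range(r_begin, r_end):
                v_begin = ring_offsets[r]
                v_end = ring_offsets[r + 1] if r + 1 < n_rings else n_verts
                j = v_end - 1  # previous vertex; starts at the last one to close the ring
                for i in range(v_begin, v_end):
                    xi, yi, xj, yj = vx[i], vy[i], vx[j], vy[j]
                    # Toggle when edge (j, i) straddles the horizontal ray going right
                    # from (x, y); the straddle test guarantees yi != yj here.
                    if (yi > y) != (yj > y) and x < (xj - xi) * (y - yi) / (yj - yi) + xi:
                        inside = not inside
                    j = i
            row.append(inside)
        results.append(row)
    return results
```

For instance, with polygon 0 as the square (0,0)-(4,4) with a hole (1,1)-(3,3) and polygon 1 as the square (5,0)-(7,2), the point (0.5, 0.5) falls in polygon 0 only, (2, 2) falls in the hole and therefore in neither polygon, and (6, 1) falls in polygon 1 only.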

@Christian8491 Are you aware of the application/performance of the PIP module (under GIS dir) currently in cudf?

Christian8491 commented 5 years ago

Hi @zhangjianting, what you are doing is great. I was thinking of adding more complex functionality to the GIS functions in later releases. The idea of coexistence is awesome. I have been working on some GIS functions focused on how certain spatial (and geospatial) operations work and what they return in the context of spatial SQL. But as you say, they might all be suitable for different applications.

kkraus14 commented 4 years ago

Closing as we now have https://github.com/rapidsai/cuspatial :)