locationtech-labs / geopyspark

GeoTrellis for PySpark

Support for spatial joins #701

Open bflammers opened 5 years ago

bflammers commented 5 years ago

Hi there,

I have searched the docs for how to do simple spatial operations between geometries, such as checking whether a point falls within a polygon. For my use case, I want to do this for a large collection of points and a large collection of polygons. In other geo libraries, this is sometimes referred to as a spatial join.
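For illustration, here is a minimal pure-Python sketch of what such a point-in-polygon spatial join computes. It is not GeoPySpark or GeoTrellis API; the function names `point_in_polygon` and `spatial_join`, the type aliases, and the sample data are all hypothetical, and it assumes simple polygons without holes.

```python
from typing import Dict, List, Tuple

# Hypothetical type aliases for this sketch only.
Point = Tuple[float, float]
Polygon = List[Point]  # vertices in order; the ring is closed implicitly


def point_in_polygon(point: Point, polygon: Polygon) -> bool:
    """Ray-casting test: cast a ray to the right and count edge crossings."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing flips inside/outside
    return inside


def spatial_join(
    points: Dict[str, Point], polygons: Dict[str, Polygon]
) -> List[Tuple[str, str]]:
    """Naive join: pair each point id with every polygon id that contains it."""
    return [
        (point_id, polygon_id)
        for point_id, pt in points.items()
        for polygon_id, poly in polygons.items()
        if point_in_polygon(pt, poly)
    ]


# Made-up sample data.
points = {"a": (1.0, 1.0), "b": (5.0, 5.0), "c": (11.0, 1.0)}
polygons = {
    "square": [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)],
    "triangle": [(10.0, 0.0), (14.0, 0.0), (12.0, 4.0)],
}

matches = spatial_join(points, polygons)
print(matches)
```

This naive version compares every point with every polygon, which is why a distributed implementation would be needed for the large collections described above.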

Unfortunately, I have not been able to find anything on either the simple operations or spatial joins. Based on a quick read of the GeoTrellis documentation, it seems that these things are supported in the Scala library.

I believe this implies one of the following: 1) GeoPySpark is a limited interface to GeoTrellis; 2) the GeoPySpark docs are not complete; or 3) I have missed the relevant sections in the docs completely.

In case of 1): Will this functionality be added in the future? In case of 2): Will the documentation be updated in the future? In case of 3): Could you please point me in the right direction?

Thanks

jbouffard commented 5 years ago

@bflammers I am very sorry for just responding to your issue now. I somehow missed being notified about it.

Case 1 is correct. Vector operations are supported in GeoTrellis but not in GeoPySpark. This is because, while GeoPySpark is a Python binding of GeoTrellis, it fills a slightly different niche in the Python ecosystem than GeoTrellis does in Scala. The Python community already has various vector libraries (shapely, fiona, etc.), so the focus of GeoPySpark is mainly processing, formatting, and analyzing large amounts of raster data at scale.

So to answer your question: operations like spatial joins for vectors will probably not be supported in GeoPySpark. However, if there's a need for vector processing at scale in Python, then that's something we may end up implementing.

bflammers commented 5 years ago

@jbouffard Thank you for your answer.

I think there is a need for this. I have been searching for some time for a library that can perform spatial joins on vectors using PySpark, but there is no such thing at the moment. Please correct me if I am wrong! And I am not the only one looking for this: link. It would be great if this were implemented.