locationtech-labs / geopyspark

GeoTrellis for PySpark

Expose Partitioners in the API #568

Closed jbouffard closed 6 years ago

jbouffard commented 6 years ago

This PR lets the user choose a Partitioner when performing operations in GPS. Two have been added: HashPartitioner and SpatialPartitioner, the latter of which was created by @echeipesh. As of this PR, the methods that expose Partitioners are: repartition, merge, pyramid, and rasterize.

Note: In PySpark, a custom partitioning strategy is supplied by passing a partitionFunc parameter rather than a Partitioner instance (see this as an example). Because of this, there may be no way to preserve the partitioning strategy a user has set once the layer is handled in Scala.
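To illustrate the partitionFunc approach mentioned in the note, here is a minimal sketch in plain Python. It is not the SpatialPartitioner from this PR; `block_partition_func`, `simulate_partition_by`, and the block sizes are hypothetical names and values chosen for illustration. The sketch assumes keys are (col, row) tuples and mimics how PySpark's `RDD.partitionBy(numPartitions, partitionFunc)` takes the function's result modulo the number of partitions.

```python
from collections import defaultdict


def block_partition_func(key, block_size=4, blocks_per_row=1024):
    """Hypothetical partition function: map a (col, row) key to an integer.

    Keys that fall in the same block_size x block_size block get the same
    integer, so neighbouring tiles tend to land in the same partition.
    """
    col, row = key
    return (row // block_size) * blocks_per_row + (col // block_size)


def simulate_partition_by(records, num_partitions, partition_func):
    """Stand-in for RDD.partitionBy: bucket records by func(key) % num_partitions."""
    partitions = defaultdict(list)
    for key, value in records:
        partitions[partition_func(key) % num_partitions].append((key, value))
    return dict(partitions)


records = [((0, 0), "a"), ((1, 1), "b"), ((4, 0), "c"), ((5, 1), "d")]
layout = simulate_partition_by(records, 2, block_partition_func)

# With a real pair RDD the equivalent call would be:
#   rdd.partitionBy(2, block_partition_func)
```

Since the strategy here is only a Python function, there is no Partitioner object that the JVM side could inspect, which is the limitation described above.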

jbouffard commented 6 years ago

@jamesmcclain We have never supported Python partitioners, mainly because PySpark doesn't have them. Instead, the user passes a partitioning strategy function to certain methods. I'm not really sure how we could support those at the moment, though.