harsha2010 / magellan

Geo Spatial Data Analytics on Spark
Apache License 2.0

Issues Using Indexed Columns While Doing a Spatial Join #180

Closed: mdbuck closed this issue 6 years ago

mdbuck commented 6 years ago

Attached is a driver application that demonstrates a problem that occurs when doing a spatial join between a point DataFrame and a polygon DataFrame using the new indexing feature in Magellan 1.0.5:

magellan-wkt-within.zip

harsha2010 commented 6 years ago

How many nodes are you using? A single driver and no workers? How big is your driver node? And how much data are we talking about (polygons and points)?

mdbuck commented 6 years ago

The driver application is a command-line application that starts Spark with spark.master set to local[1].

The data is small: the polygon table contains 5 rows with the largest polygon containing 9 nodes; the point table contains 8 rows.
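
With data this small, the expected join result can be cross-checked by brute force outside Spark and compared against what the indexed join returns. Below is a minimal sketch in plain Python using ray casting. It is not Magellan's implementation, and the polygon and point coordinates are hypothetical placeholders, not the contents of the attached files:

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test. ring is a list of (x, y) vertices, open or closed."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def brute_force_join(points, polygons):
    """Return (point_id, polygon_id) pairs where the point lies inside the polygon."""
    return [
        (pid, gid)
        for pid, (x, y) in points.items()
        for gid, ring in polygons.items()
        if point_in_polygon(x, y, ring)
    ]


# Hypothetical sample data, for illustration only.
polygons = {
    "square": [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)],
    "triangle": [(10.0, 0.0), (14.0, 0.0), (12.0, 4.0)],
}
points = {
    "p1": (1.0, 1.0),    # inside the square
    "p2": (12.0, 1.0),   # inside the triangle
    "p3": (20.0, 20.0),  # inside neither
}

expected = brute_force_join(points, polygons)
print(expected)
```

Any pair that appears in this list but not in the output of the indexed join (or the reverse) points at the index rather than at the data. Points that lie exactly on a polygon boundary are ambiguous under ray casting, so they are best left out of such a comparison.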

I have simplified the driver application. Please see attached.

PolygonDriver2.zip

mdbuck commented 6 years ago

Any more news on this?

Thanks.