harsha2010 / magellan

Geo Spatial Data Analytics on Spark
Apache License 2.0

usage of Spatial joins #182

Closed: khajaasmath786 closed this issue 6 years ago

khajaasmath786 commented 6 years ago

Hi,

I am a bit confused by the posts about where to apply the index on the dataframes.

  1. When creating the dataframe for polygons from GeoJSON files, do I need to enable indexing at load time, as below, or can I just add the index afterwards?

spark.read.format("magellan").option("magellan.index", "true").option("magellan.index.precision", "25").load(s"$path")

OR PolygonDataframe.index(30) after loading the dataframe from the GeoJSON file?

  2. points.join(polygons).where($"point" within $"polygon") // or
    points.join(polygons index 30).where($"point" within $"polygon")

    Do I still need to add index 30 in the join, as above, after specifying the index in step 1 at load time?

    Can I simply use points.join(polygons).where($"point" within $"polygon") after loading the initial dataframe as PolygonDataframe.index(30)? Will it still use the indexes in this case?

harsha2010 commented 6 years ago

The simplest way to hint the usage of indices is to either A) create the dataframe using df.index(precision), or B) specify the index directly in the join as x.join(y index precision).

In both A and B, the index will be created on the fly for the join.
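As a rough illustration of options A and B, the two forms might look like the sketch below. It assumes that `points` has a `point` column and `polygons` has a `polygon` column, as in the question. The object and method names are hypothetical. The call to `magellan.Utils.injectRules` is not mentioned in this thread; it is included on the assumption that the spatial join rule has to be registered before the hint takes effect, so check it against the Magellan version in use. In Scala, `polygons.index(30)` and `polygons index 30` are the same method call written in dot and infix notation, so the two options differ only in where the hint is attached.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.magellan.dsl.expressions._

// Hypothetical helper object; the names here are illustrative only.
object SpatialJoinHints {

  // Assumption: Magellan's spatial join rule must be injected once per session
  // before index hints are picked up by the optimizer.
  def enableSpatialJoins(spark: SparkSession): Unit =
    magellan.Utils.injectRules(spark)

  // Option A: attach the index hint to the dataframe first, then join as usual.
  def joinWithDataFrameHint(spark: SparkSession,
                            points: DataFrame,
                            polygons: DataFrame,
                            precision: Int): DataFrame = {
    import spark.implicits._
    val indexedPolygons = polygons.index(precision)
    points.join(indexedPolygons).where($"point" within $"polygon")
  }

  // Option B: attach the index hint inline, inside the join expression.
  def joinWithInlineHint(spark: SparkSession,
                         points: DataFrame,
                         polygons: DataFrame,
                         precision: Int): DataFrame = {
    import spark.implicits._
    points.join(polygons index precision).where($"point" within $"polygon")
  }
}
```

Under these assumptions, a dataframe that already carries the hint from option A would not need `index 30` repeated inside the join.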