harsha2010 / magellan

Geo Spatial Data Analytics on Spark
Apache License 2.0

Point not found when injectRules used #237

Open bdgeise opened 5 years ago

bdgeise commented 5 years ago
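import magellan.Utils.injectRules // injectRules as documented in the Magellan README
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.magellan.dsl.expressions._ // point(...) and within
import org.apache.spark.sql.types.TimestampType

val spark = SparkSession.builder
    .appName("Testing Spark DSL")
    .master("local[1]") // run Spark locally with a single worker thread
    .getOrCreate()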

  // Uncomment to reproduce the issue: with the rules injected, the join below returns 0 rows.
  // injectRules(spark)

  import spark.implicits._

  val data = Array(
    ("US", "TX", "2018-12-08 00:00:00", 12.0123, "ios", 2, 32.813548, -96.835159),
    ("US", "PA", "2018-12-08 00:00:00", 12.0123, "ios", 183, 32.813548, -96.835159),
    ("CA", null, "2018-12-08 00:00:00", 12.0123, "android", 183, 32.813548, -96.835159),
    ("GB", null, "2018-12-08 00:00:00", 12.0123, "ios", 2, 32.813548, -96.835159),
    ("US", "NC", "2018-12-08 00:00:00", 12.0123, "android", 35, 32.813548, -96.835159),
    ("US", "CA", "2018-12-08 00:00:00", 12.0123, null, 2, 32.813548, -96.835159),
    ("A", null, "2018-12-08 00:00:00", 12.0123, "android", 183, 32.813548, -96.835159),
    ("US", "NY", "2018-12-08 00:00:00", 12.0123, "ios", 2, 32.813548, -96.835159))

  val df1 = spark.sparkContext.parallelize(data).toDF("country", "state", "location_at",
    "horizontal_accuracy", "platform", "app_id", "latitude", "longitude")
    .withColumn("location_at", col("location_at").cast(TimestampType))
  df1.show()
  df1.printSchema() // printSchema prints directly and returns Unit, so no println is needed

  val filterFilePath = path_to_geojson // placeholder: path to the GeoJSON file attached below

  val filteringDS = spark.sqlContext.read.format("magellan")
    .option("magellan.index", "true")
    .option("magellan.index.precision", "15")
    .option("type", "geojson").load(filterFilePath)
    .cache()

  filteringDS.count()
  filteringDS.show(false)

  val filtered = df1
    .withColumn("locationPoint", point(col("longitude"), col("latitude")))
    .join(filteringDS)
    .where(col("locationPoint") within col("polygon"))

  filtered.show()

Using the example above, if I call injectRules(spark) I get 0 results. If I leave injectRules out, I get the expected results.

Note that I have also tried different index precision levels, but the issue persists whenever the rules are injected.

The GeoJSON file used for testing is attached: TX.geojson.txt

bdgeise commented 5 years ago

@harsha2010 - Any luck looking at this one?

bdgeise commented 5 years ago

I was able to do some more testing and debugging today. If I do a true cross join and test for point within polygon using a withColumn, it returns true. However, when I apply the same predicate in the where clause with injectRules enabled, I still get an empty DataFrame. @harsha2010
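
A minimal sketch of the cross-join check described above, assuming the `longitude`, `latitude`, and `polygon` column names from the original snippet. The object name `WithinDiagnostic`, the helper `diagnoseWithin`, and the output column `isWithin` are hypothetical names introduced only for illustration. It evaluates the predicate as a column rather than as a filter, so each point/polygon pair stays visible together with its result:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.magellan.dsl.expressions._

object WithinDiagnostic {
  // Hypothetical helper: keeps every point/polygon pair and records the
  // outcome of the within test in a boolean column instead of filtering on it.
  def diagnoseWithin(points: DataFrame, polygons: DataFrame): DataFrame =
    points
      .withColumn("locationPoint", point(col("longitude"), col("latitude")))
      .crossJoin(polygons) // explicit Cartesian product, no join condition
      .withColumn("isWithin", col("locationPoint") within col("polygon"))
}
```

With the DataFrames from the original snippet, this could be called as `WithinDiagnostic.diagnoseWithin(df1, filteringDS).select("state", "isWithin").show()`. Comparing that output with the result of the `where` version, once with and once without `injectRules(spark)`, should show whether the predicate itself or the rewritten join is dropping the rows.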

bdgeise commented 5 years ago

Another update: this seems to work fine with the master branch and Spark 2.3.2. Are you aware of any changes since the 1.0.5 release that I could look at and test against? @harsha2010

harsha2010 commented 5 years ago

@bdgeise There is a bug I noticed and fixed a while back: https://github.com/harsha2010/magellan/commit/aa9021eec14ccbdab4c90316ff9a7bf129873f8e

I'm not sure whether it is related. Let me try this on the 1.0.5 branch and check today.