harsha2010 / magellan

Geo Spatial Data Analytics on Spark
Apache License 2.0

Incompatibility: Databricks Runtime 3.1+ on Spatial Join #153

Closed john-min closed 7 years ago

john-min commented 7 years ago

I am using Databricks Runtime 3.0+ with Spark 2.2 and Scala 2.11, attempting to do a spatial join using the within predicate, i.e. point in polygon.

I add the spatial join rule to Spark following the documentation. When I try to retrieve the DataFrame with the results, I get the following error:

java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.expressions.AttributeReference$.apply$default$7(Ljava/lang/String;Lorg/apache/spark/sql/types/DataType;ZLorg/apache/spark/sql/types/Metadata;)Ljava/lang/Boolean;

    at magellan.catalyst.SpatialJoin.magellan$catalyst$SpatialJoin$$attr(SpatialJoin.scala:105)
    at magellan.catalyst.SpatialJoin$$anonfun$apply$1.applyOrElse(SpatialJoin.scala:72)
    at magellan.catalyst.SpatialJoin$$anonfun$apply$1.applyOrElse(SpatialJoin.scala:35)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$2.apply(TreeNode.scala:293)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$2.apply(TreeNode.scala:293)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:292)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
    at magellan.catalyst.SpatialJoin.apply(SpatialJoin.scala:35)
    at magellan.catalyst.SpatialJoin.apply(SpatialJoin.scala:26)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:85)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:82)
    at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
    at scala.collection.immutable.List.foldLeft(List.scala:84)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:82)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:74)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:74)
    at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:80)
    at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:80)
    at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:86)
    at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:82)
    at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:91)
    at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:91)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:2839)
    at org.apache.spark.sql.Dataset.count(Dataset.scala:2426)
    at linec5d77947888f472bbacef4aa690e6df151.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-120205:1)
    at linec5d77947888f472bbacef4aa690e6df151.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-120205:69)
    at linec5d77947888f472bbacef4aa690e6df151.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-120205:71)
    at linec5d77947888f472bbacef4aa690e6df151.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-120205:73)
    at linec5d77947888f472bbacef4aa690e6df151.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-120205:75)
    at linec5d77947888f472bbacef4aa690e6df151.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-120205:77)
    at linec5d77947888f472bbacef4aa690e6df151.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-120205:79)
    at linec5d77947888f472bbacef4aa690e6df151.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-120205:81)
    at linec5d77947888f472bbacef4aa690e6df151.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-120205:83)
    at linec5d77947888f472bbacef4aa690e6df151.$read$$iw$$iw$$iw$$iw$$iw.<init>(command-120205:85)
    at linec5d77947888f472bbacef4aa690e6df151.$read$$iw$$iw$$iw$$iw.<init>(command-120205:87)
    at linec5d77947888f472bbacef4aa690e6df151.$read$$iw$$iw$$iw.<init>(command-120205:89)
    at linec5d77947888f472bbacef4aa690e6df151.$read$$iw$$iw.<init>(command-120205:91)
    at linec5d77947888f472bbacef4aa690e6df151.$read$$iw.<init>(command-120205:93)
    at linec5d77947888f472bbacef4aa690e6df151.$read.<init>(command-120205:95)
    at linec5d77947888f472bbacef4aa690e6df151.$read$.<init>(command-120205:99)
    at linec5d77947888f472bbacef4aa690e6df151.$read$.<clinit>(command-120205)
    at linec5d77947888f472bbacef4aa690e6df151.$eval$.$print$lzycompute(<notebook>:7)
    at linec5d77947888f472bbacef4aa690e6df151.$eval$.$print(<notebook>:6)
    at linec5d77947888f472bbacef4aa690e6df151.$eval.$print(<notebook>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
    at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)
    at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)
    at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)
    at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
    at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
    at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:637)
    at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:569)
    at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
    at com.databricks.backend.daemon.driver.DriverILoop.execute(DriverILoop.scala:186)
    at com.databricks.backend.daemon.driver.ScalaDriverLocal$$anonfun$repl$1.apply$mcV$sp(ScalaDriverLocal.scala:184)
    at com.databricks.backend.daemon.driver.ScalaDriverLocal$$anonfun$repl$1.apply(ScalaDriverLocal.scala:184)
    at com.databricks.backend.daemon.driver.ScalaDriverLocal$$anonfun$repl$1.apply(ScalaDriverLocal.scala:184)
    at com.databricks.backend.daemon.driver.DriverLocal$TrapExitInternal$.trapExit(DriverLocal.scala:456)
    at com.databricks.backend.daemon.driver.DriverLocal$TrapExit$.apply(DriverLocal.scala:410)
    at com.databricks.backend.daemon.driver.ScalaDriverLocal.repl(ScalaDriverLocal.scala:184)
    at com.databricks.backend.daemon.driver.DriverLocal$$anonfun$execute$3.apply(DriverLocal.scala:234)
    at com.databricks.backend.daemon.driver.DriverLocal$$anonfun$execute$3.apply(DriverLocal.scala:215)
    at com.databricks.logging.UsageLogging$$anonfun$withAttributionContext$1.apply(UsageLogging.scala:188)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
    at com.databricks.logging.UsageLogging$class.withAttributionContext(UsageLogging.scala:183)
    at com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:39)
    at com.databricks.logging.UsageLogging$class.withAttributionTags(UsageLogging.scala:221)
    at com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:39)
    at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:215)
    at com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$tryExecutingCommand$2.apply(DriverWrapper.scala:589)
    at com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$tryExecutingCommand$2.apply(DriverWrapper.scala:589)
    at scala.util.Try$.apply(Try.scala:192)
    at com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:584)
    at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:488)
    at com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:391)
    at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:348)
    at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:215)
    at java.lang.Thread.run(Thread.java:748)

My simplified code is as follows, just following along with the documentation:


// Needed for the point() constructor and the `within` predicate DSL.
import org.apache.spark.sql.magellan.dsl.expressions._

val pointData = sqlContext.read.parquet(pointPath)
val points = pointData.select(
    $"lat",
    $"lon",
    point($"lon", $"lat").as("point")
)

// Read the polygons through the magellan data source, building a spatial index.
val polygons = sqlContext.read
    .format("magellan")
    .option("magellan.index", "true")
    .option("magellan.index.precision", "30")
    .load(polygonPath)

// Register magellan's optimizer rules (including the spatial join rule).
magellan.Utils.injectRules(spark)

// Point-in-polygon join; calling an action such as count on joinDF is what triggers the error above.
val joinDF = points.join(polygons).where($"point" within $"polygon")
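
For context on the error itself: apply$default$7 is the synthetic accessor Scala generates on a companion object for the seventh default argument of apply(). A NoSuchMethodError on it means the magellan jar was compiled against an AttributeReference.apply whose defaulted parameters differ from those in the cluster's Spark build. The following minimal, self-contained sketch (not magellan or Spark code; the Attr case class and DefaultAccessorDemo object are made up for illustration) shows how these accessors come into being:

// Illustration only: Attr stands in for a catalyst case class such as
// AttributeReference. Scala emits one synthetic method apply$default$N on the
// companion for every defaulted parameter of apply(); code compiled against
// one shape of the companion fails with NoSuchMethodError if that shape changes.
case class Attr(name: String, nullable: Boolean = true, qualifier: Option[String] = None)

object DefaultAccessorDemo extends App {
  Class.forName("Attr$")                       // the companion object's class
    .getMethods
    .filter(_.getName.startsWith("apply$default"))
    .foreach(m => println(s"${m.getName} -> ${m.getReturnType.getName}"))
  // Prints apply$default$2 -> boolean and apply$default$3 -> scala.Option:
  // exactly the kind of method the stack trace above fails to find.
}
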
harsha2010 commented 7 years ago

@john-min let me know if this is still an issue. Closing it after testing on DB Runtime 3.1 and 3.2 with #157.

hectormauer commented 7 years ago

I am having the same error even though I'm running on DB Runtime 3.2, with Spark 2.2.0 and Scala 2.11. I am basically trying to reproduce the code from your example in the blog post: https://magellan.ghost.io/magellan-geospatial-processing-made-easy/

Error: NoSuchMethodException: java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.expressions.AttributeReference$.apply$default$7(Ljava/lang/String;Lorg/apache/spark/sql/types/DataType;ZLorg/apache/spark/sql/types/Metadata;)Ljava/lang/Boolean;

    at magellan.catalyst.SpatialJoin.magellan$catalyst$SpatialJoin$$attr(SpatialJoin.scala:105)
    at magellan.catalyst.SpatialJoin$$anonfun$apply$1.applyOrElse(SpatialJoin.scala:72)
    at magellan.catalyst.SpatialJoin$$anonfun$apply$1.applyOrElse(SpatialJoin.scala:35)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
    ...
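
One way to see what this runtime actually provides is the hedged diagnostic sketch below, runnable in a notebook cell; it uses only plain Java reflection and compares against the signature in the error, which expects apply$default$7 returning java.lang.Boolean:

// List the default-argument accessors the cluster's catalyst actually exposes.
// If apply$default$7 with return type java.lang.Boolean is missing, the
// attached magellan build is binary-incompatible with this runtime.
val companion = Class.forName(
  "org.apache.spark.sql.catalyst.expressions.AttributeReference$")

companion.getMethods
  .filter(_.getName.startsWith("apply$default"))
  .sortBy(_.getName)
  .foreach(m => println(s"${m.getName}: ${m.getReturnType.getName}"))
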

ganeshharugeri commented 6 years ago

@harsha2010, attached: error.txt

Any solution to this issue?
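
One possible workaround to experiment with, untested and offered only as a sketch inferred from the stack trace (the failure happens inside the injected SpatialJoin optimizer rule): skip magellan.Utils.injectRules entirely and let the within predicate run as a plain filter over a cross join. This gives up the index-accelerated join the rule provides and can be slow on large inputs.

// Untested sketch: do NOT call magellan.Utils.injectRules(spark), so the
// failing SpatialJoin rule never runs; the predicate is then evaluated as an
// ordinary filter over a cross join.
import org.apache.spark.sql.magellan.dsl.expressions._

// Spark 2.2 rejects implicit cross joins unless this is enabled.
spark.conf.set("spark.sql.crossJoin.enabled", "true")

// points and polygons as defined in the original report above.
val joinDF = points.join(polygons).where($"point" within $"polygon")
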