hortonworks-spark / spark-atlas-connector

A Spark Atlas connector to track data lineage in Apache Atlas
Apache License 2.0

How could we make this connector work with HDP distributions with Spark 2.3.0? #307

Open nicolaszhang opened 3 years ago

nicolaszhang commented 3 years ago

We tried to build the master branch and use the connector with Spark 2.3.0 in an HDP distribution, but got an error like the one below:

Exception in thread "SparkExecutionPlanProcessor-thread" java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/catalog/ExternalCatalogWithListener
    at com.hortonworks.spark.atlas.sql.CommandsHarvester$.com$hortonworks$spark$atlas$sql$CommandsHarvester$$getPlanInfo(CommandsHarvester.scala:213)
    at com.hortonworks.spark.atlas.sql.CommandsHarvester$.com$hortonworks$spark$atlas$sql$CommandsHarvester$$makeProcessEntities(CommandsHarvester.scala:222)
    at com.hortonworks.spark.atlas.sql.CommandsHarvester$InsertIntoHadoopFsRelationHarvester$.harvest(CommandsHarvester.scala:73)
    at com.hortonworks.spark.atlas.sql.SparkExecutionPlanProcessor$$anonfun$2.apply(SparkExecutionPlanProcessor.scala:130)
    at com.hortonworks.spark.atlas.sql.SparkExecutionPlanProcessor$$anonfun$2.apply(SparkExecutionPlanProcessor.scala:89)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
    at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
    at com.hortonworks.spark.atlas.sql.SparkExecutionPlanProcessor.process(SparkExecutionPlanProcessor.scala:89)
    at com.hortonworks.spark.atlas.sql.SparkExecutionPlanProcessor.process(SparkExecutionPlanProcessor.scala:63)
    at
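
For context, org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener was only added in Spark 2.4, so a connector jar built from master will not find it on a Spark 2.3.0 classpath. Below is a minimal diagnostic sketch, not connector code; it only assumes spark-core is on the classpath (for example, pasted into spark-shell) and the object name is made up for illustration:

```scala
// Diagnostic sketch: reports whether the class the connector's master branch
// relies on is present in the running Spark distribution.
object CheckCatalogListenerClass {
  def main(args: Array[String]): Unit = {
    val className = "org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener"
    val present =
      try { Class.forName(className); true }
      catch { case _: ClassNotFoundException => false }
    // SPARK_VERSION is provided by the org.apache.spark package object.
    println(s"Spark ${org.apache.spark.SPARK_VERSION}: $className present = $present")
  }
}
```

On HDP Spark 2.3.0 this should print `present = false`, which matches the NoClassDefFoundError above.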

And when we tried to rebuild the Spark Atlas connector after changing pom.xml to the Spark 2.3.0 dependency, we got build errors. How could we make it work with the HDP Spark distribution?
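
Not a fix for the 2.3.0 build itself, but as a stopgap one could register the connector only when the running Spark is new enough, so jobs on Spark 2.3 do not spawn the failing processor thread at all. This is a hedged sketch: the listener class name is the one given in this repository's README (please verify for your build), the version check and the rest are illustrative glue, and on Spark 2.3 it simply skips lineage tracking rather than enabling it:

```scala
import org.apache.spark.SPARK_VERSION
import org.apache.spark.sql.SparkSession

object ConditionalAtlasSession {
  // True when the running Spark is 2.4 or newer; only the first two version
  // components are inspected, so HDP-style strings like "2.3.0.<hdp-build>" work too.
  def sparkAtLeast24(version: String = SPARK_VERSION): Boolean = {
    val parts = version.split("\\.").take(2).map(_.toInt)
    parts(0) > 2 || (parts(0) == 2 && parts(1) >= 4)
  }

  def build(appName: String): SparkSession = {
    val builder = SparkSession.builder().appName(appName)
    if (sparkAtLeast24()) {
      builder
        .config("spark.extraListeners", "com.hortonworks.spark.atlas.SparkAtlasEventTracker")
        .config("spark.sql.queryExecutionListeners", "com.hortonworks.spark.atlas.SparkAtlasEventTracker")
        .getOrCreate()
    } else {
      // Spark 2.3: skip registration, so the connector never loads
      // ExternalCatalogWithListener and no lineage is captured.
      builder.getOrCreate()
    }
  }
}
```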