hortonworks-spark / spark-atlas-connector

A Spark Atlas connector to track data lineage in Apache Atlas
Apache License 2.0
263 stars, 149 forks

Spark Atlas Connector not Capturing DDL Operations From spark-sql #290

Open har5havardhan opened 4 years ago

har5havardhan commented 4 years ago

Hi,

I've set up a basic version of Atlas, and it works perfectly with Hive: all DDL operations and lineage are captured by Atlas.

Changes made in hive-site.xml:

<property>
  <name>hive.exec.post.hooks</name>
  <value>org.apache.atlas.hive.hook.HiveHook</value>
</property>

But when I create a table using spark-sql or spark-shell, the DDL commands and lineage are not captured by Atlas.

Please help me figure out what I am doing wrong.

I launch spark-sql using the command below:

spark-sql \
  --jars /home/hadoop/harsha/spark-atlas-connector/spark-atlas-connector-assembly/target/spark-atlas-connector-assembly-0.1.0-SNAPSHOT.jar \
  --conf spark.extraListeners=com.hortonworks.spark.atlas.SparkAtlasEventTracker \
  --conf spark.sql.queryExecutionListeners=com.hortonworks.spark.atlas.SparkAtlasEventTracker \
  --conf spark.sql.streaming.streamingQueryListeners=com.hortonworks.spark.atlas.SparkAtlasStreamingQueryEventTracker
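
The three listener settings in the command above are easy to mistype on one long line, so it can help to assemble the invocation from a small table of keys and values. The following is only a minimal sketch, assuming Python 3.8 or later: the helper `build_spark_sql_command` is hypothetical and not part of Spark or the connector, while the jar path, configuration keys, and class names are copied from the command in this post.

```python
import shlex

# Jar path and listener classes copied from the command in this issue.
CONNECTOR_JAR = (
    "/home/hadoop/harsha/spark-atlas-connector/spark-atlas-connector-assembly/"
    "target/spark-atlas-connector-assembly-0.1.0-SNAPSHOT.jar"
)
TRACKER = "com.hortonworks.spark.atlas.SparkAtlasEventTracker"
STREAMING_TRACKER = "com.hortonworks.spark.atlas.SparkAtlasStreamingQueryEventTracker"

# One entry per --conf flag, so each listener key is visible on its own line.
LISTENER_CONF = {
    "spark.extraListeners": TRACKER,
    "spark.sql.queryExecutionListeners": TRACKER,
    "spark.sql.streaming.streamingQueryListeners": STREAMING_TRACKER,
}


def build_spark_sql_command(jar, conf):
    """Hypothetical helper: return the spark-sql argument list for a jar and conf dict."""
    args = ["spark-sql", "--jars", jar]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


command = build_spark_sql_command(CONNECTOR_JAR, LISTENER_CONF)
# Print a shell-quoted form that can be pasted into a terminal.
print(shlex.join(command))
```

This only rebuilds the same command in a more readable form; it does not change which events the connector captures.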

wForget commented 4 years ago

I have made some modifications to capture the lineage of Spark SQL operations on Hive. They can be used as a reference, but I am not sure whether they introduce other problems. https://github.com/wForget/spark-atlas-connector/tree/dev-hive