Lineage gets tracked with the following code:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('test').enableHiveSupport().getOrCreate()
df = spark.sql("select * from table limit 10")
df.write.mode("overwrite").saveAsTable("dummydb.test")
df.write.mode("overwrite").saveAsTable("dummydb.test")
Lineage does not get tracked with the following code:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('test').enableHiveSupport().getOrCreate()
df = spark.sql("select * from table limit 10")
df.write.mode("overwrite").saveAsTable("dummydb.test")
The only difference between the two snippets is the number and position of the write actions: the first calls saveAsTable twice, the second only once.
This is my spark-submit command:

spark-submit \
  --files file:///atlas-application.properties \
  --master yarn \
  --deploy-mode cluster \
  --jars file:///spark-atlas-connector-assembly-0.1.0-SNAPSHOT.jar \
  --conf spark.extraListeners=com.hortonworks.spark.atlas.SparkAtlasEventTracker \
  --conf spark.sql.queryExecutionListeners=com.hortonworks.spark.atlas.SparkAtlasEventTracker \
  --conf spark.sql.streaming.streamingQueryListeners=com.hortonworks.spark.atlas.SparkAtlasStreamingQueryEventTracker \
  a.py