hortonworks-spark / spark-atlas-connector

A Spark Atlas connector to track data lineage in Apache Atlas
Apache License 2.0

Creates an entity for each file in the source directory. Is this normal behaviour? #291

Open alexey-artemov opened 4 years ago

alexey-artemov commented 4 years ago

I used SAC for a select -> insert statement in Spark, like:

val df = spark.read.parquet(source_path)
df.write.parquet(target_path)

As a result, Atlas contains a large number of hdfs_path entities (one for each file under "source_path", which is a folder) but just one entity for "target_path" (the folder only). Is this normal behaviour?

Spark version: 2.4.4
Scala version: 2.11.12
Atlas version: 1.1.0.3.1.0.0-78
SAC: spark-atlas-connector-assembly-0.1.0-SNAPSHOT.jar
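To check whether the number of hdfs_path entities matches the number of files Spark actually reads from the source folder, the input files of the DataFrame can be listed. This is a minimal sketch, assuming it runs in spark-shell or a Spark application; `sourcePath` is a placeholder value, not the real path, and `Dataset.inputFiles` is the standard Spark call that returns a best-effort snapshot of the files backing the DataFrame. It does not touch SAC or Atlas, so it only shows what Spark sees, not how SAC decides which entities to create.

```scala
import org.apache.spark.sql.SparkSession

// Reuses the existing session when run inside spark-shell.
val spark = SparkSession.builder().appName("sac-input-files-check").getOrCreate()

// Placeholder (assumption): replace with the real source folder.
val sourcePath = "/tmp/source_path"

val df = spark.read.parquet(sourcePath)

// Best-effort snapshot of the files that make up this DataFrame.
val files: Array[String] = df.inputFiles

println(s"${files.length} input files under $sourcePath")
// Print a small sample to compare against the hdfs_path entities in Atlas.
files.take(10).foreach(println)
```

If the printed count is close to the number of hdfs_path entities in Atlas, that would be consistent with one entity being created per input file rather than per folder.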