hortonworks-spark / spark-atlas-connector

A Spark Atlas connector to track data lineage in Apache Atlas
Apache License 2.0

Temporary tables stored in Atlas #258

Open whazor opened 5 years ago

whazor commented 5 years ago

Currently the Spark Atlas Connector reports temporary thrift tables to Atlas as spark_table. Below you can find an example lineage report. The questions we have about these temporary tables:

Example lineage reported:

createTime: 1553897909000
database: DBNAME
description: [empty]
lastAccessTime: 0
name: o_TABLENAME_xref_20190328
owner: [owner of task]
paritionColumnNames: [empty]
properties: transient_lastDdlTime: 1553897909, bucketing_version: 2
provider: parquet
qualifiedName: thrift://node1:9083,thrift://node2:9083,thrift://node3:9083.DBNAME.o_TABLENAME_xref_20190328
schema: [empty]
storage: thrift://node1:9083,thrift://node2:9083,thrift://node3:9083.DBNAME.o_TABLENAME_xref_20190328.storageFormat
tableType: MANAGED
unsupportedFeatures: [empty]
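
For anyone checking these entities by script: judging only from the example above, the `qualifiedName` looks like `<comma-separated metastore URIs>.<database>.<table>`. The following is a minimal sketch for splitting it under that assumption. The helper name `split_qualified_name` is hypothetical and not part of SAC or Atlas, and the layout is inferred from this single report rather than from the connector's code.

```python
def split_qualified_name(qualified_name: str) -> dict:
    """Split '<metastore URIs>.<database>.<table>' from the right.

    Assumption (inferred from one example, not from SAC's code):
    database and table names contain no dots, while the metastore
    URI prefix may contain them (e.g. fully qualified host names).
    """
    # rsplit from the right so dots inside host names stay in the prefix.
    prefix, database, table = qualified_name.rsplit(".", 2)
    return {
        "metastore_uris": prefix.split(","),
        "database": database,
        "table": table,
    }


parts = split_qualified_name(
    "thrift://node1:9083,thrift://node2:9083,thrift://node3:9083"
    ".DBNAME.o_TABLENAME_xref_20190328"
)
print(parts["database"], parts["table"])
```

A name with fewer than two dots raises `ValueError` on unpacking, which may be preferable to silently returning a partial result.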

HeartSaVioR commented 5 years ago

Thanks for reporting! It would be very helpful if you could also include steps to reproduce, as well as which branch/commit you used to reproduce the issue.

SAC is kind of a "moving target" and we don't have plans for official releases yet, so if the issue doesn't reproduce on current master, we may not address it for previous versions/branches.

Thanks again!