maropu / spark-sql-flow-plugin

Visualize column-level data lineage in Spark SQL
Apache License 2.0
83 stars, 15 forks

Can lineage data be output directly in JSON format? #3

Open melin opened 2 years ago

melin commented 2 years ago

Can lineage data be output directly in JSON format? That would make it convenient to store in a graph database.

maropu commented 2 years ago

Ah, that might be worth doing, but is there a common JSON format for representing a directed acyclic graph?

melin commented 2 years ago

The format is the Apache Atlas lineage JSON (AtlasLineageInfo): https://atlas.apache.org/api/v2/json_AtlasLineageInfo.html
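
As a rough illustration of what such an export could look like, here is a minimal Python sketch that turns a list of column-level edges into a dict shaped like AtlasLineageInfo. The field names (`baseEntityGuid`, `lineageDirection`, `guidEntityMap`, `relations`, `fromEntityId`, `toEntityId`) are taken from my reading of the linked Atlas model and should be checked against it. The function name, the `spark_column` type name, the sample edges, and the use of column names as GUIDs are assumptions made only for this example; none of this is part of the plugin.

```python
import json


def to_atlas_style_lineage(edges, base_node):
    """Convert (source, target) column edges into an AtlasLineageInfo-like dict.

    This is an illustrative sketch, not the plugin's output format.
    """
    # Collect every column that appears on either side of an edge.
    nodes = sorted({name for edge in edges for name in edge})

    # Assumption: the column name doubles as the GUID. A real Atlas
    # deployment assigns its own GUIDs and entity types.
    guid_entity_map = {
        name: {"guid": name, "typeName": "spark_column", "displayText": name}
        for name in nodes
    }

    # One relation per directed edge in the lineage graph.
    relations = [
        {"fromEntityId": src, "toEntityId": dst}
        for src, dst in edges
    ]

    return {
        "baseEntityGuid": base_node,
        "lineageDirection": "BOTH",
        "guidEntityMap": guid_entity_map,
        "relations": relations,
    }


# Hypothetical lineage: two table columns feed a view column,
# which in turn feeds a column of a second view.
edges = [("t1.a", "v1.x"), ("t1.b", "v1.x"), ("v1.x", "v2.y")]
lineage = to_atlas_style_lineage(edges, "v1.x")
print(json.dumps(lineage, indent=2))
```

Because the result is a plain dict of nodes and directed relations, it could be loaded into a graph database by creating one vertex per `guidEntityMap` entry and one edge per `relations` entry.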

maropu commented 2 years ago

INFO: I added an interface for defining an output format, so that a generated graph can be exported to other systems, in https://github.com/maropu/spark-sql-flow-plugin/commit/abb1f0f2f9591d3f511948a57ae8fde0032e25ff. See this test for an example: https://github.com/maropu/spark-sql-flow-plugin/blob/abb1f0f2f9591d3f511948a57ae8fde0032e25ff/src/test/scala/org/apache/spark/sql/SQLFlowSuite.scala#L175-L190
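
For readers who want the general idea without opening the commit, the following Python sketch shows one way a pluggable output format can be structured. It is not the plugin's actual interface (the plugin is written in Scala, and the linked test is the authoritative example). The names `GraphFormat`, `AdjacencyJsonFormat`, `DotFormat`, and `export_graph` are hypothetical and chosen only for this illustration.

```python
import json
from abc import ABC, abstractmethod


class GraphFormat(ABC):
    """Hypothetical interface: turns a lineage graph into a string."""

    @abstractmethod
    def to_string(self, nodes, edges):
        ...


class AdjacencyJsonFormat(GraphFormat):
    """Serializes the graph as a JSON adjacency list."""

    def to_string(self, nodes, edges):
        adjacency = {name: [] for name in nodes}
        for src, dst in edges:
            adjacency[src].append(dst)
        return json.dumps(adjacency, sort_keys=True)


class DotFormat(GraphFormat):
    """Serializes the graph as Graphviz DOT text."""

    def to_string(self, nodes, edges):
        lines = ["digraph lineage {"]
        lines += [f'  "{name}";' for name in nodes]
        lines += [f'  "{src}" -> "{dst}";' for src, dst in edges]
        lines.append("}")
        return "\n".join(lines)


def export_graph(nodes, edges, fmt):
    """Export the same graph through whichever format object is passed in."""
    return fmt.to_string(nodes, edges)


# Hypothetical two-edge lineage graph.
nodes = ["t1.a", "v1.x", "v2.y"]
edges = [("t1.a", "v1.x"), ("v1.x", "v2.y")]

print(export_graph(nodes, edges, AdjacencyJsonFormat()))
print(export_graph(nodes, edges, DotFormat()))
```

The point of the design is that graph generation stays the same while the serialization is swapped, so a new target system only needs a new format class.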