maropu / spark-sql-flow-plugin

Visualize column-level data lineage in Spark SQL
Apache License 2.0
86 stars 15 forks source link

Supported Insert as select sql #5

Open melin opened 2 years ago

melin commented 2 years ago

example: insert into table bigdata.test_child_lineage select * from bigdata.test_child_dt where ds='20211207'

There is a lineage relationship between test_child_lineage and test_child_dt

maropu commented 2 years ago

Yea, but it is not easy to implement it because Spark does not hold the relationship in its catalog. Probably, we need a custom SQL listener to track INSERT query logs.

melin commented 2 years ago

Have a plan to finish? Thanks!

maropu commented 2 years ago

I don't have it because I don't have a smart idea for implementing it. Any plan to use the feature if it's implemented?

melin commented 2 years ago

Will use, most of the scheduling job that run every day are insert sql job

melin commented 2 years ago

https://zhuanlan.zhihu.com/p/502927331?utm_source=wechat_session&utm_medium=social&utm_oi=27427107504128&utm_campaign=shareopn&s_r=0 The article has description to solve this problem