open-metadata / OpenMetadata

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.
https://open-metadata.org
Apache License 2.0
5.28k stars 997 forks source link

Update Lineage for updated Tables/Views #11302

Closed adarshpansari closed 2 months ago

adarshpansari commented 1 year ago

Problem: Currently, the lineage in the application does not get updated automatically when there is an update in the query. Due to this, the lineage from the old query and the new query gets appended, hence the lineage generated is not updated according to the change, leading to incorrect lineage. This can create confusion and issues in data management and analysis, and can also result in wrong decisions being made based on the data lineage.

Solution Requested: I would like to request a feature enhancement to address this issue. The proposed solution is to consider the latest query for the table or view when creating the lineage. This will ensure that the lineage is always up to date and reflects the current state of the data. This enhancement will significantly improve the accuracy and reliability on data lineage.

Alternative Solution: As a workaround, We can manually update the lineage to reflect the latest query. However, this is a time-consuming and error-prone process.

Issue Example - I updated an view with a new query previously I used 4 tables and after the update we are using 4 tables and with some tables changes from the previous query.

QUERY 1 create view openmetadata_poc.vw_ProductOrderInventory as select * from purchasing.purchase_order_header_silver a inner join purchasing.purchase_order_detail_silver b on a.PurchaseOrderID = b.PurchaseOrderID inner join production.product_inventory_silver c on b.productid = c.productid inner join production.location_silver d on c.locationid = d.locationid

QUERY 2 create view openmetadata_poc.vw_ProductOrderInventory as select * from production.product_silver a INNER JOIN production.product_subcategory_silver b on a.ProductSubcategoryID = b.ProductSubcategoryID INNER JOIN production.product_inventory_silver c on a.ProductID = c.ProductID INNER JOIN production.location_silver d on c.LocationID = d.LocationID

But the lineage generated was from combination of both the queries. Lineage

JahJajaka commented 1 year ago

This is very confusing behaviour. Voted for this issue :)

ulixius9 commented 2 months ago

closing by https://github.com/open-metadata/OpenMetadata/pull/17064