open-metadata / OpenMetadata

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.
https://open-metadata.org
Apache License 2.0

fromColumns having list type in column lineage json even though it holds one value #11528

Closed Sai7656 closed 1 year ago

Sai7656 commented 1 year ago

Affected module - Backend

Describe the bug - The "fromColumns" field in the column lineage JSON is a list, as shown below, even though it holds a single value. Even for many-to-one column mappings, every lineage entry appears as a separate single-value entity.

"fromColumns": [ "AzureDatabricks_DataFabric.hive_metastore.humanresources.employee_department_history_silver.EndDate" ]

To Reproduce - NA

Expected behavior - Since all column-level lineage entries are one-to-one, it would be better to make this field a string, just like "toColumn", and rename it from "fromColumns" to "fromColumn".

sureshms commented 1 year ago

We capture column-level lineage on the relationship edge from one table to another. Because this edge can carry column lineage for multiple columns from table 1 to table 2, fromColumns is an array.

Closing this as not a bug. @Sai7656, it would be great if you could confirm in the OpenMetadata support channel that an issue is indeed a bug before opening it. Thank you.
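
To make the shape described above concrete, here is a minimal Python sketch of one table-to-table edge carrying several column mappings. The class names `ColumnLineage` and `TableEdge`, and the table and column names, are hypothetical and used only for illustration; they are not OpenMetadata's actual classes. Only the `fromColumns` and `toColumn` field names come from this thread.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ColumnLineage:
    # Mirrors the JSON fields quoted in this thread: a list of sources, one target.
    fromColumns: List[str]
    toColumn: str


@dataclass
class TableEdge:
    # Hypothetical container for one table-to-table relationship edge.
    from_table: str
    to_table: str
    columns: List[ColumnLineage] = field(default_factory=list)


edge = TableEdge(
    from_table="table1",
    to_table="table2",
    columns=[
        # One-to-one mapping: the list holds a single source column.
        ColumnLineage(fromColumns=["table1.a"], toColumn="table2.x"),
        # Many-to-one mapping: two source columns feed one target column.
        ColumnLineage(fromColumns=["table1.b", "table1.c"], toColumn="table2.y"),
    ],
)

for mapping in edge.columns:
    print(f"{', '.join(mapping.fromColumns)} -> {mapping.toColumn}")
```

Keeping the source side as a list lets the same structure describe both the one-to-one and the many-to-one case without a schema change.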

Sai7656 commented 1 year ago

Hi @sureshms, I spoke with @ulixius9 and raised the issue here: https://openmetadata.slack.com/archives/C02B6955S4S/p1683719317889769

Also, I have a lineage in the OM UI, as shown in the picture, where two columns (StartDate and EndDate) are used to populate one column (TotalWorkingDays). But when I fetch the lineage through the API, fromColumns doesn't hold an array of both values. Instead, the response contains two separate blocks, as below.

[image: screenshot of the lineage in the OM UI]

{
    "fromColumns": [
        "AzureDatabricks_DataFabric.hive_metastore.humanresources.employee_department_history_silver.EndDate"
    ],
    "toColumn": "AzureDatabricks_DataFabric.hive_metastore.openmetadata_poc.employee_shift_vw.TotalWorkingDays"
},
{
    "fromColumns": [
        "AzureDatabricks_DataFabric.hive_metastore.humanresources.employee_department_history_silver.StartDate"
    ],
    "toColumn": "AzureDatabricks_DataFabric.hive_metastore.openmetadata_poc.employee_shift_vw.TotalWorkingDays"
},

sureshms commented 1 year ago

@Sai7656, this is indeed a bug. Thank you for adding details. @ulixius9, in this case we should include both StartDate and EndDate in fromColumns and TotalWorkingDays in toColumn. We need to make sure the UI shows it correctly, with two lines starting from the upstream table and merging together before connecting to the destination table.
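
As a rough illustration of the expected result, the sketch below groups column lineage entries that share a `toColumn` into a single entry whose `fromColumns` lists every source. The function name `merge_by_to_column` is hypothetical, and the column names are shortened stand-ins for the fully qualified names in the API response; this is not how OpenMetadata implements the fix.

```python
from typing import Dict, List


def merge_by_to_column(entries: List[dict]) -> List[dict]:
    """Combine entries that share a toColumn into one entry with all sources."""
    merged: Dict[str, List[str]] = {}
    for entry in entries:
        sources = merged.setdefault(entry["toColumn"], [])
        for column in entry["fromColumns"]:
            if column not in sources:  # keep first-seen order, skip duplicates
                sources.append(column)
    return [{"fromColumns": cols, "toColumn": to} for to, cols in merged.items()]


# The two blocks returned by the API, with shortened (hypothetical) names.
api_entries = [
    {"fromColumns": ["silver.EndDate"], "toColumn": "employee_shift_vw.TotalWorkingDays"},
    {"fromColumns": ["silver.StartDate"], "toColumn": "employee_shift_vw.TotalWorkingDays"},
]

merged_entries = merge_by_to_column(api_entries)
print(merged_entries)
```

With the two blocks from this thread as input, the result is a single entry for TotalWorkingDays that carries both EndDate and StartDate.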

@Sai7656 and @ulixius9, there is also another case. Say table1.column1 is used together with table2.column2 to create table3.column3. In this case:

  1. The edge between table1 and table3 has fromColumns table1.column1 and toColumn table3.column3.
  2. The edge between table2 and table3 has fromColumns table2.column2 and toColumn table3.column3.
  3. The UI has to show lines coming from table1 and table2 merging together before connecting to table3.

Ping me on Slack if this is not clear.
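
The cross-table case can be sketched in the same spirit. Taking the destination column to be table3.column3, the hypothetical helper `sources_per_target` below walks every edge and collects, per destination column, all upstream columns that feed it, which is the information a UI would need to draw lines that merge before reaching the target. The edge dictionary keys (`fromTable`, `toTable`, `columnsLineage`) are assumptions made for this example, not a confirmed API shape.

```python
from typing import Dict, List


def sources_per_target(edges: List[dict]) -> Dict[str, List[str]]:
    """Map each destination column to every upstream column, across all edges."""
    result: Dict[str, List[str]] = {}
    for edge in edges:
        for mapping in edge["columnsLineage"]:
            sources = result.setdefault(mapping["toColumn"], [])
            for column in mapping["fromColumns"]:
                if column not in sources:  # avoid listing a source twice
                    sources.append(column)
    return result


# Two separate edges that both feed the same destination column.
edges = [
    {"fromTable": "table1", "toTable": "table3",
     "columnsLineage": [{"fromColumns": ["table1.column1"], "toColumn": "table3.column3"}]},
    {"fromTable": "table2", "toTable": "table3",
     "columnsLineage": [{"fromColumns": ["table2.column2"], "toColumn": "table3.column3"}]},
]

targets = sources_per_target(edges)
print(targets)
```

Here each edge still holds only its own table's column in fromColumns; the merge happens when reading across edges, not inside any single edge.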