open-metadata / OpenMetadata

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.
https://open-metadata.org
Apache License 2.0
5.6k stars 1.05k forks source link

Data Lineage Not Reflected for MSSQL Stored Procedure #16737

Open jonathanjordy opened 5 months ago

jonathanjordy commented 5 months ago

Affected module UI and Ingestion Framework

Describe the bug I have a stored procedure on my mssql database. I've created and run a metadata and lineage ingestion on the OM but the lineage between the database affected by the stored procedure does not shows up. The ingestion lineage logs does not show any available lineage to be ingested, however the stored procedure are succesfullly ingested into the OM.

Here are the screenshot of the lineage ingestion debug log: [2024-06-20T10:23:52.634+0000] {ingestion_pipeline_mixin.py:52} DEBUG - Created Pipeline Status for pipeline XXX-dummy-database.c26baaaa-aa4e-4e1f-b28b-b613d8c5a667: {'id': '89ba7f25-e87e-45e7-9ccc-d9ee65f81168', 'eventType': 'entityUpdated', 'entityType': 'ingestionPipeline', 'entityId': '440ccd9b-9a1e-42e7-a78e-39833aec29e2', 'entityFullyQualifiedName': 'XXX-dummy-database.c26baaaa-aa4e-4e1f-b28b-b613d8c5a667', 'previousVersion': 0.3, 'currentVersion': 0.3, 'userName': 'ingestion-bot', 'timestamp': 1718879032630, 'changeDescription': {'fieldsAdded': [], 'fieldsUpdated': [{'name': 'pipelineStatus', 'oldValue': {'runId': 'bbed7a6e-79d7-40c6-bd02-4bf04c74ef11', 'pipelineState': 'running', 'startDate': 1718879028991, 'timestamp': 1718879028991, 'status': []}, 'newValue': {'runId': 'bbed7a6e-79d7-40c6-bd02-4bf04c74ef11', 'pipelineState': 'success', 'startDate': 1718879028991, 'timestamp': 1718879028991, 'endDate': 1718879032552, 'status': [{'name': 'Mssql', 'records': 0, 'updated_records': 0, 'warnings': 0, 'errors': 0, 'filtered': 0}, {'name': 'OpenMetadata', 'records': 0, 'updated_records': 0, 'warnings': 0, 'errors': 0, 'filtered': 0}]}}], 'fieldsDeleted': [], 'previousVersion': 0.3}, 'entity': {'id': '440ccd9b-9a1e-42e7-a78e-39833aec29e2', 'name': 'c26baaaa-aa4e-4e1f-b28b-b613d8c5a667', 'displayName': 'XXX-dummy-database_lineage_Cs1uNj9z', 'pipelineType': 'lineage', 'fullyQualifiedName': 'XXX-dummy-database.c26baaaa-aa4e-4e1f-b28b-b613d8c5a667', 'sourceConfig': {'config': {'type': 'DatabaseLineage', 'resultLimit': 1000, 'queryLogDuration': 1, 'parsingTimeoutLimit': 300}}, 'airflowConfig': {'pausePipeline': False, 'concurrency': 1, 'startDate': 1718755200000, 'pipelineTimezone': 'UTC', 'retries': 0, 'retryDelay': 300, 'pipelineCatchup': False, 'scheduleInterval': '0 7 * * 1', 'maxActiveRuns': 1, 'workflowDefaultView': 'tree', 'workflowDefaultViewOrientation': 'LR'}, 'service': {'id': 'a14d3a9d-2d37-44b9-951c-7e1774f778f1', 'type': 'databaseService', 'name': 'XXX-dummy-database', 'fullyQualifiedName': 'XXX-dummy-database', 'description': '', 'displayName': 'XXX-dummy-database', 'deleted': False}, 'pipelineStatuses': {'runId': 'bbed7a6e-79d7-40c6-bd02-4bf04c74ef11', 'pipelineState': 'success', 'startDate': 1718879028991, 'timestamp': 1718879028991, 'endDate': 1718879032552, 'status': [{'name': 'Mssql', 'records': 0, 'updated_records': 0, 'warnings': 0, 'errors': 0, 'filtered': 0}, {'name': 'OpenMetadata', 'records': 0, 'updated_records': 0, 'warnings': 0, 'errors': 0, 'filtered': 0}]}, 'loggerLevel': 'DEBUG', 'deployed': True, 'enabled': True, 'href': 'http://openmetadata-server:8585/api/v1/services/ingestionPipelines/440ccd9b-9a1e-42e7-a78e-39833aec29e2', 'version': 0.3, 'updatedAt': 1718874878467, 'updatedBy': 'admin', 'changeDescription': {'fieldsAdded': [], 'fieldsUpdated': [{'name': 'sourceConfig', 'oldValue': '"old-encrypted-value"', 'newValue': '"new-encrypted-value"'}, {'name': 'loggerLevel', 'oldValue': 'INFO', 'newValue': 'DEBUG'}], 'fieldsDeleted': [], 'previousVersion': 0.2}, 'deleted': False, 'provider': 'user'}} [2024-06-20T10:23:52.635+0000] {logger.py:184} INFO - Statuses detailed info: [2024-06-20T10:23:52.635+0000] {logger.py:184} INFO - Mssql Status: [2024-06-20T10:23:52.636+0000] {logger.py:184} INFO - {'failures': [], 'filtered': [], 'records': [], 'source_start_time': 1718879029.196479, 'updated_records': [], 'warnings': []} [2024-06-20T10:23:52.636+0000] {logger.py:184} INFO - OpenMetadata Status: [2024-06-20T10:23:52.636+0000] {logger.py:184} INFO - {'failures': [], 'filtered': [], 'records': [], 'source_start_time': 1718879029.3635197, 'updated_records': [], 'warnings': []} [2024-06-20T10:23:52.636+0000] {logger.py:184} INFO - Execution Time Summary [2024-06-20T10:23:52.639+0000] {logger.py:184} INFO - [2024-06-20T10:23:52.639+0000] {logger.py:184} INFO - Workflow Mssql Summary: [2024-06-20T10:23:52.639+0000] {logger.py:184} INFO - Processed records: 0 [2024-06-20T10:23:52.640+0000] {logger.py:184} INFO - Updated records: 0 [2024-06-20T10:23:52.640+0000] {logger.py:184} INFO - Warnings: 0 [2024-06-20T10:23:52.640+0000] {logger.py:184} INFO - Errors: 0 [2024-06-20T10:23:52.640+0000] {logger.py:184} INFO - Workflow OpenMetadata Summary: [2024-06-20T10:23:52.640+0000] {logger.py:184} INFO - Processed records: 0 [2024-06-20T10:23:52.640+0000] {logger.py:184} INFO - Updated records: 0 [2024-06-20T10:23:52.640+0000] {logger.py:184} INFO - Warnings: 0 [2024-06-20T10:23:52.641+0000] {logger.py:184} INFO - Errors: 0 [2024-06-20T10:23:52.641+0000] {logger.py:184} INFO - Success %: 100.0 [2024-06-20T10:23:52.641+0000] {logger.py:184} INFO - Workflow finished in time: 3.45s [2024-06-20T10:23:52.642+0000] {python.py:194} INFO - Done. Returned value was: None [2024-06-20T10:23:52.652+0000] {taskinstance.py:1400} INFO - Marking task as SUCCESS. dag_id=c26baaaa-aa4e-4e1f-b28b-b613d8c5a667, task_id=lineage_task, execution_date=20240620T102347, start_date=20240620T102348, end_date=20240620T102352 [2024-06-20T10:23:52.700+0000] {local_task_job_runner.py:228} INFO - Task exited with return code 0 [2024-06-20T10:23:52.721+0000] {taskinstance.py:2778} INFO - 0 downstream tasks scheduled from follow-on schedule check

To Reproduce

  1. Create a new database service with MSSQL connector
  2. Create a metadata ingestion
  3. Create a lineage ingestion
  4. The stored procedure from the MSSQL will be successfully ingested
  5. The data lineage from the MSSQL will not show up anywhere.

Expected behavior There should be a data lineage from table A to stored procedure X to table B.

Version:

Additional context Add any other context about the problem here.

k3agan commented 4 months ago

@ulixius9 seems like this is related to:

https://github.com/open-metadata/OpenMetadata/issues/16424 https://github.com/open-metadata/OpenMetadata/pull/16876

k3agan commented 3 months ago

@ulixius9 would it be possible to fix the [] (square brackets) parsing issue as advised by Teddy in #16876 ?

It seems like you have the best understanding of the cause for this one: https://github.com/open-metadata/OpenMetadata/issues/7427#issuecomment-2129407993

RabochiyYan commented 3 months ago

We had faced the same problem. Also we found out that commenting commands like "CREATE PROCEDURE", "BEGIN", "END", "TRUNCATE", "DECLARE", "AS" gives correct lineage using SQL LINEAGE tool (which OM use for parsing lineages as i got). Screenshots of the example stored procedure below. https://reata.github.io/sqllineage/ - sql lineage web demo image (1) image

bharathkumar78 commented 3 months ago

RabochiyYan

How will commenting out the script of stored procedures address the issue of OpenMetadata lineage not working for stored procedures? Stored procedures are built once and saved. so how does this approach help with capturing the lineage for stored procedures?

RabochiyYan commented 2 months ago

@bharathkumar78

How will commenting out the script of stored procedures address the issue of OpenMetadata lineage not working for stored procedures? Stored procedures are built once and saved. so how does this approach help with capturing the lineage for stored procedures?

May be this will help to understand what cause the problem? I do not suggest users to comment on their procedures. I propose to refine the stored procedure parser in such a way that it possibly excludes these lines before building lineage.