Open jonathanjordy opened 5 months ago
@ulixius9 seems like this is related to:
https://github.com/open-metadata/OpenMetadata/issues/16424 https://github.com/open-metadata/OpenMetadata/pull/16876
@ulixius9 would it be possible to fix the []
(square brackets) parsing issue as advised by Teddy in #16876 ?
It seems like you have the best understanding of the cause for this one: https://github.com/open-metadata/OpenMetadata/issues/7427#issuecomment-2129407993
We had faced the same problem. Also we found out that commenting commands like "CREATE PROCEDURE", "BEGIN", "END", "TRUNCATE", "DECLARE", "AS" gives correct lineage using SQL LINEAGE tool (which OM use for parsing lineages as i got). Screenshots of the example stored procedure below. https://reata.github.io/sqllineage/ - sql lineage web demo
How will commenting out the script of stored procedures address the issue of OpenMetadata lineage not working for stored procedures? Stored procedures are built once and saved. so how does this approach help with capturing the lineage for stored procedures?
@bharathkumar78
How will commenting out the script of stored procedures address the issue of OpenMetadata lineage not working for stored procedures? Stored procedures are built once and saved. so how does this approach help with capturing the lineage for stored procedures?
May be this will help to understand what cause the problem? I do not suggest users to comment on their procedures. I propose to refine the stored procedure parser in such a way that it possibly excludes these lines before building lineage.
Affected module UI and Ingestion Framework
Describe the bug I have a stored procedure on my mssql database. I've created and run a metadata and lineage ingestion on the OM but the lineage between the database affected by the stored procedure does not shows up. The ingestion lineage logs does not show any available lineage to be ingested, however the stored procedure are succesfullly ingested into the OM.
Here are the screenshot of the lineage ingestion debug log:
[2024-06-20T10:23:52.634+0000] {ingestion_pipeline_mixin.py:52} DEBUG - Created Pipeline Status for pipeline XXX-dummy-database.c26baaaa-aa4e-4e1f-b28b-b613d8c5a667: {'id': '89ba7f25-e87e-45e7-9ccc-d9ee65f81168', 'eventType': 'entityUpdated', 'entityType': 'ingestionPipeline', 'entityId': '440ccd9b-9a1e-42e7-a78e-39833aec29e2', 'entityFullyQualifiedName': 'XXX-dummy-database.c26baaaa-aa4e-4e1f-b28b-b613d8c5a667', 'previousVersion': 0.3, 'currentVersion': 0.3, 'userName': 'ingestion-bot', 'timestamp': 1718879032630, 'changeDescription': {'fieldsAdded': [], 'fieldsUpdated': [{'name': 'pipelineStatus', 'oldValue': {'runId': 'bbed7a6e-79d7-40c6-bd02-4bf04c74ef11', 'pipelineState': 'running', 'startDate': 1718879028991, 'timestamp': 1718879028991, 'status': []}, 'newValue': {'runId': 'bbed7a6e-79d7-40c6-bd02-4bf04c74ef11', 'pipelineState': 'success', 'startDate': 1718879028991, 'timestamp': 1718879028991, 'endDate': 1718879032552, 'status': [{'name': 'Mssql', 'records': 0, 'updated_records': 0, 'warnings': 0, 'errors': 0, 'filtered': 0}, {'name': 'OpenMetadata', 'records': 0, 'updated_records': 0, 'warnings': 0, 'errors': 0, 'filtered': 0}]}}], 'fieldsDeleted': [], 'previousVersion': 0.3}, 'entity': {'id': '440ccd9b-9a1e-42e7-a78e-39833aec29e2', 'name': 'c26baaaa-aa4e-4e1f-b28b-b613d8c5a667', 'displayName': 'XXX-dummy-database_lineage_Cs1uNj9z', 'pipelineType': 'lineage', 'fullyQualifiedName': 'XXX-dummy-database.c26baaaa-aa4e-4e1f-b28b-b613d8c5a667', 'sourceConfig': {'config': {'type': 'DatabaseLineage', 'resultLimit': 1000, 'queryLogDuration': 1, 'parsingTimeoutLimit': 300}}, 'airflowConfig': {'pausePipeline': False, 'concurrency': 1, 'startDate': 1718755200000, 'pipelineTimezone': 'UTC', 'retries': 0, 'retryDelay': 300, 'pipelineCatchup': False, 'scheduleInterval': '0 7 * * 1', 'maxActiveRuns': 1, 'workflowDefaultView': 'tree', 'workflowDefaultViewOrientation': 'LR'}, 'service': {'id': 'a14d3a9d-2d37-44b9-951c-7e1774f778f1', 'type': 'databaseService', 'name': 'XXX-dummy-database', 'fullyQualifiedName': 'XXX-dummy-database', 'description': '', 'displayName': 'XXX-dummy-database', 'deleted': False}, 'pipelineStatuses': {'runId': 'bbed7a6e-79d7-40c6-bd02-4bf04c74ef11', 'pipelineState': 'success', 'startDate': 1718879028991, 'timestamp': 1718879028991, 'endDate': 1718879032552, 'status': [{'name': 'Mssql', 'records': 0, 'updated_records': 0, 'warnings': 0, 'errors': 0, 'filtered': 0}, {'name': 'OpenMetadata', 'records': 0, 'updated_records': 0, 'warnings': 0, 'errors': 0, 'filtered': 0}]}, 'loggerLevel': 'DEBUG', 'deployed': True, 'enabled': True, 'href': 'http://openmetadata-server:8585/api/v1/services/ingestionPipelines/440ccd9b-9a1e-42e7-a78e-39833aec29e2', 'version': 0.3, 'updatedAt': 1718874878467, 'updatedBy': 'admin', 'changeDescription': {'fieldsAdded': [], 'fieldsUpdated': [{'name': 'sourceConfig', 'oldValue': '"old-encrypted-value"', 'newValue': '"new-encrypted-value"'}, {'name': 'loggerLevel', 'oldValue': 'INFO', 'newValue': 'DEBUG'}], 'fieldsDeleted': [], 'previousVersion': 0.2}, 'deleted': False, 'provider': 'user'}} [2024-06-20T10:23:52.635+0000] {logger.py:184} INFO - Statuses detailed info: [2024-06-20T10:23:52.635+0000] {logger.py:184} INFO - Mssql Status: [2024-06-20T10:23:52.636+0000] {logger.py:184} INFO - {'failures': [], 'filtered': [], 'records': [], 'source_start_time': 1718879029.196479, 'updated_records': [], 'warnings': []} [2024-06-20T10:23:52.636+0000] {logger.py:184} INFO - OpenMetadata Status: [2024-06-20T10:23:52.636+0000] {logger.py:184} INFO - {'failures': [], 'filtered': [], 'records': [], 'source_start_time': 1718879029.3635197, 'updated_records': [], 'warnings': []} [2024-06-20T10:23:52.636+0000] {logger.py:184} INFO - Execution Time Summary [2024-06-20T10:23:52.639+0000] {logger.py:184} INFO - [2024-06-20T10:23:52.639+0000] {logger.py:184} INFO - Workflow Mssql Summary: [2024-06-20T10:23:52.639+0000] {logger.py:184} INFO - Processed records: 0 [2024-06-20T10:23:52.640+0000] {logger.py:184} INFO - Updated records: 0 [2024-06-20T10:23:52.640+0000] {logger.py:184} INFO - Warnings: 0 [2024-06-20T10:23:52.640+0000] {logger.py:184} INFO - Errors: 0 [2024-06-20T10:23:52.640+0000] {logger.py:184} INFO - Workflow OpenMetadata Summary: [2024-06-20T10:23:52.640+0000] {logger.py:184} INFO - Processed records: 0 [2024-06-20T10:23:52.640+0000] {logger.py:184} INFO - Updated records: 0 [2024-06-20T10:23:52.640+0000] {logger.py:184} INFO - Warnings: 0 [2024-06-20T10:23:52.641+0000] {logger.py:184} INFO - Errors: 0 [2024-06-20T10:23:52.641+0000] {logger.py:184} INFO - Success %: 100.0 [2024-06-20T10:23:52.641+0000] {logger.py:184} INFO - Workflow finished in time: 3.45s [2024-06-20T10:23:52.642+0000] {python.py:194} INFO - Done. Returned value was: None [2024-06-20T10:23:52.652+0000] {taskinstance.py:1400} INFO - Marking task as SUCCESS. dag_id=c26baaaa-aa4e-4e1f-b28b-b613d8c5a667, task_id=lineage_task, execution_date=20240620T102347, start_date=20240620T102348, end_date=20240620T102352 [2024-06-20T10:23:52.700+0000] {local_task_job_runner.py:228} INFO - Task exited with return code 0 [2024-06-20T10:23:52.721+0000] {taskinstance.py:2778} INFO - 0 downstream tasks scheduled from follow-on schedule check
To Reproduce
Expected behavior There should be a data lineage from table A to stored procedure X to table B.
Version:
Additional context Add any other context about the problem here.