Open datapopcorn opened 2 months ago
I found that the ingestion time increases with the number of partitions only when including DDL. If I disable this option, the ingestion time does not change with the partition count. Can you check why including DDL causes this issue?
Affected module Ingestion
Describe the bug I recently created a schema containing a single table, and the ingestion process completed in less than 3 seconds. However, after I partitioned this table into 1000 partitions, the ingestion time increased significantly, to more than 3 minutes, even though the partition tables themselves are not ingested into OpenMetadata. I am looking for clarification on why the ingestion pipeline slows down and how I can optimize it (perhaps by avoiding the partition table checks).
To Reproduce
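The report does not list exact steps or name the database. As a rough sketch only, assuming a PostgreSQL source with declarative range partitioning, a comparable schema could be generated with something like the following. The table name `events`, its columns, and the helper `partition_ddl` are hypothetical and not taken from the original setup.

```python
from typing import List


def partition_ddl(parent: str, partitions: int, width: int = 1000) -> List[str]:
    """Build DDL for one range-partitioned parent table and its partitions.

    Assumes PostgreSQL declarative partitioning syntax; the column layout
    is illustrative only.
    """
    # Parent table, partitioned by ranges of the id column.
    statements = [
        f"CREATE TABLE {parent} (id bigint NOT NULL, payload text) "
        f"PARTITION BY RANGE (id);"
    ]
    # One child table per contiguous id range [lower, upper).
    for i in range(partitions):
        lower = i * width
        upper = lower + width
        statements.append(
            f"CREATE TABLE {parent}_p{i} PARTITION OF {parent} "
            f"FOR VALUES FROM ({lower}) TO ({upper});"
        )
    return statements
```

Running the statements from `partition_ddl("events", 1000)` against a test database, then running the ingestion once with DDL included and once without, should show whether the slowdown follows the partition count.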
Observed Behavior The ingestion time increases as the number of partitioned tables grows, despite the partition tables themselves not being ingested into OpenMetadata.
Expected behavior Ingestion time should remain relatively consistent regardless of the number of partition tables if they are not being ingested.
Version:
openmetadata-ingestion[docker]==XYZ
Additional context Slack thread: https://openmetadata.slack.com/archives/C02B6955S4S/p1727689640694269