Describe the bug
I recently created a schema containing a single table, and the ingestion process completed in less than 3 seconds. However, after I partitioned this table into 1000 partitions, the ingestion time increased to more than 3 minutes, even though the partition tables themselves are not ingested into OpenMetadata. I would like to understand why the ingestion pipeline slows down and how I can optimize it (perhaps by avoiding partition table checks).
To Reproduce
Create a partitioned table using the following SQL:
CREATE TABLE events (
    event_id SERIAL,
    event_name VARCHAR(100),
    event_date DATE NOT NULL,
    PRIMARY KEY (event_id, event_date)
) PARTITION BY RANGE (event_date);
Create partitions for the table (example for 1000 partitions):
DO $$
DECLARE
    i INT;
BEGIN
    FOR i IN 1..1000 LOOP
        EXECUTE format('
            CREATE TABLE events_partition_%s PARTITION OF events
            FOR VALUES FROM (%L) TO (%L);', i, '2023-01-01'::DATE + (i - 1) * INTERVAL '1 day', '2023-01-01'::DATE + i * INTERVAL '1 day');
    END LOOP;
END $$;
Ingest the schema into OpenMetadata and observe the time taken.
Repeat the ingestion with 2000 partition tables and compare the ingestion times.
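To confirm how many partitions are actually attached to the parent table before each ingestion run, a catalog query along these lines may help. This is only a sketch: it assumes the parent table is named events, as in the SQL above, and that it is reachable through the current search_path.

```sql
-- Count the partitions directly attached to the parent table "events".
-- pg_inherits holds one row per child/parent relationship, so each
-- partition created with PARTITION OF events contributes one row.
SELECT count(*) AS partition_count
FROM pg_inherits
WHERE inhparent = 'events'::regclass;
```

If the loop ran to completion and no other partitions exist, partition_count should match the upper bound used in the FOR loop.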
Observed Behavior
The ingestion time increases as the number of partitioned tables grows, despite the partition tables themselves not being ingested into OpenMetadata.
Expected behavior
Ingestion time should remain relatively consistent regardless of the number of partition tables if they are not being ingested.
I found that the ingestion time increases with the number of partitions only when including DDL is enabled.
If I disable this option, the ingestion time stays the same regardless of the number of partitions.
Can you check why including DDL would cause this issue?
Affected module Ingestion
Version:
openmetadata-ingestion[docker]==XYZ
Additional context Slack thread: https://openmetadata.slack.com/archives/C02B6955S4S/p1727689640694269