Closed joegoldbeck closed 1 year ago
hi @joegoldbeck can you test the same against 1.1.7? We have been doing many performance improvements there. Let us know if the issue persists
cc @harshach
Hi @pmbrull thanks for checking in! I tested again with 1.2.0, and have the same behavior unfortunately.
The # of database calls scales linearly with # columns and # tags, and so for wide tables (our largest has 1500 columns), there are a very large number of db calls. This performance seems slow-but-tolerable when running locally in docker, but in a deployed environment, the roundtrip network latency to the database adds up quickly.
@joegoldbeck we will be fixing this soon.
Fixed with this PR https://github.com/open-metadata/OpenMetadata/pull/13819
Affected module Backend
Describe the bug
When a table has many labeled columns, requests to add a new column label begin to time out on the backend. In fact, even description updates begin to time out.
Additionally, the request to GET the table becomes slow (many seconds).
Effectively, tables with 100s of column labels are currently unsupported. This is an issue for us because a core use case is survey data storing 100s of questions, each in its own column, with labels to aid in categorization and retrieval.
To Reproduce
Create a table with 1000 columns.
Iterate through these columns, adding a label to each one, one at a time
Requests will take longer and longer, and eventually begin timing out
If you GET the table including `tags` in the `fields`, it will be slow
If you GET the table without including `tags`, it will be normal
Expected behavior
Adding tags to columns should not significantly impact performance of GETing or modifying a table, other than a small database performance penalty
Version:
Python version: 3.9.17
OpenMetadata version: 1.1.5
OpenMetadata Ingestion package version: openmetadata-ingestion==1.1.5
Additional context
I believe this happens because each tag is fetched from the database individually multiple times, requiring hundreds of database calls just to get the table, and again to do the write.
For example, each tag description is retrieved independently and serially https://github.com/open-metadata/OpenMetadata/blob/02094179e6ae92abba919f9bc888d57f6389421e/openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java#L2160-L2168
I tested GETing the table with and without `tags` in the `fields=` and it is specifically `tags` that causes the backend to be super slow (4s just for GET table).
I also did some query counting with smaller tables, with the following results
The most immediate solution would be to fetch all the tags, and all of their related information, in a single db call.
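To illustrate the difference, here is a minimal sketch using an in-memory SQLite database. The schema (`tag`, `tag_usage`), the helper names, and the `QueryCounter` wrapper are all hypothetical and are not OpenMetadata's actual tables or DAO methods; the sketch only contrasts per-tag lookups with a single joined query and counts the database calls each one makes.

```python
import sqlite3


def make_db(n_columns):
    """Build a toy database: one tag per column (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tag (fqn TEXT PRIMARY KEY, description TEXT)")
    conn.execute("CREATE TABLE tag_usage (column_name TEXT, tag_fqn TEXT)")
    for i in range(n_columns):
        conn.execute("INSERT INTO tag VALUES (?, ?)", (f"Survey.Q{i}", f"Question {i}"))
        conn.execute("INSERT INTO tag_usage VALUES (?, ?)", (f"col_{i}", f"Survey.Q{i}"))
    return conn


class QueryCounter:
    """Thin wrapper that counts how many queries are issued."""

    def __init__(self, conn):
        self.conn = conn
        self.count = 0

    def query(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params).fetchall()


def tags_one_by_one(db, columns):
    """One query per column, plus one more per tag for its description."""
    result = {}
    for col in columns:
        usages = db.query("SELECT tag_fqn FROM tag_usage WHERE column_name = ?", (col,))
        result[col] = [
            (fqn, db.query("SELECT description FROM tag WHERE fqn = ?", (fqn,))[0][0])
            for (fqn,) in usages
        ]
    return result


def tags_batched(db, columns):
    """Fetch every column's tags and descriptions in a single joined query."""
    # Note: SQLite caps the number of bound parameters per statement, so very
    # wide tables would need chunking here; the demo stays well under the cap.
    placeholders = ",".join("?" * len(columns))
    rows = db.query(
        "SELECT u.column_name, t.fqn, t.description "
        "FROM tag_usage u JOIN tag t ON t.fqn = u.tag_fqn "
        f"WHERE u.column_name IN ({placeholders})",
        tuple(columns),
    )
    result = {col: [] for col in columns}
    for col, fqn, description in rows:
        result[col].append((fqn, description))
    return result


columns = [f"col_{i}" for i in range(100)]
slow = QueryCounter(make_db(100))
fast = QueryCounter(make_db(100))
assert tags_one_by_one(slow, columns) == tags_batched(fast, columns)
print(slow.count, fast.count)  # 200 1
```

With one tag per column, the per-tag version issues two queries per column while the batched version issues one query in total, so in a deployed environment the network roundtrip would be paid once rather than once per call.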
Relates to #12373