Closed john-bodley closed 1 year ago
@hashhar are you aware from where the performance difference comes from? It might be that query to information_schema.tables
do some kind of predicate pushdown due to table_type = 'VIEW'
?
@mdesmet thanks for the review. I've addressed your comments.
@hashhar : PTAL I think we should make this change based on your feedback on Slack
hashhar For views we call metadata.getViews in InformationSchemaPageSource - that methods ends up retrieving view definitions as well. For tables we call listTables + listViews (in InformationSchemaPageSource) - neither of which need view definitions. So actually the tables is faster since it just lists things.
Description
Querying the
information_schema.views
table seems to be sub-performant for determining which entities are views—I speculate this is because all the view definitions need to be extracted/compiled. A more performant approach is to useinformation_schema.tables
with the appropriate type filter.For example for a very large schema at Airbnb (comprising of over 100k entities) the following query,
took ~ 5 seconds, whereas:
was still running after 10 minutes.
Note there were no prior integration tests for the
TrinoDialect.get_view_names
method so I added some to cover this logic and threw in a few extra for free which should cover all the various scenarios.Non-technical explanation
Release notes
( ) This is not user-visible or docs only and no release notes are required. ( ) Release notes are required, please propose a release note for me. (x) Release notes are required, with the following suggested text: