trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0

Parallelize tables retrieval from multiple catalogs #24159

Open piotrrzysko opened 3 days ago

piotrrzysko commented 3 days ago

Description

This PR implements parallelization of table retrieval at the catalog level by generating a split for each catalog.
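The idea of one split per catalog can be illustrated with a minimal, self-contained sketch. This is not the code from this PR and does not use Trino's SPI: `CatalogSplit`, `createSplits`, `listTables`, and the `tableLister` function are hypothetical names, and a plain `ExecutorService` stands in for the engine's split scheduling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class PerCatalogSplits {
    // Hypothetical stand-in for a split: one unit of work per catalog.
    static final class CatalogSplit {
        final String catalogName;

        CatalogSplit(String catalogName) {
            this.catalogName = catalogName;
        }
    }

    // Generate exactly one split for each catalog.
    static List<CatalogSplit> createSplits(List<String> catalogs) {
        List<CatalogSplit> splits = new ArrayList<>();
        for (String catalog : catalogs) {
            splits.add(new CatalogSplit(catalog));
        }
        return splits;
    }

    // Process the splits concurrently and concatenate the results in split order.
    // tableLister is an assumed callback that returns the tables of one catalog.
    static List<String> listTables(
            List<CatalogSplit> splits,
            Function<String, List<String>> tableLister,
            int threads) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<List<String>>> tasks = new ArrayList<>();
            for (CatalogSplit split : splits) {
                tasks.add(() -> tableLister.apply(split.catalogName));
            }
            List<String> tables = new ArrayList<>();
            // invokeAll returns futures in the same order as the submitted tasks.
            for (Future<List<String>> future : executor.invokeAll(tasks)) {
                tables.addAll(future.get());
            }
            return tables;
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<CatalogSplit> splits = createSplits(List.of("hive", "iceberg", "postgresql"));
        // A fake lister; a real one would query the catalog's metadata.
        List<String> tables = listTables(splits, catalog -> List.of(catalog + ".default.t1"), 3);
        tables.forEach(System.out::println);
    }
}
```

With this shape, a slow catalog delays only its own split instead of blocking the listing of every catalog that follows it.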

Benchmark results for the following query (9 catalogs, 822 schemas, 17835 tables):

SELECT * FROM system.jdbc.tables;

[Image: benchmark chart of query execution time]

The chart compares the execution time of the query above on master with its execution time on a build that includes the changes from this PR and from https://github.com/trinodb/trino/pull/24110.

Question

Since this PR requires changes to the SPI (io.trino.spi.connector.SystemTable), I'm wondering how to approach them. Should I introduce a new cursor method to the SystemTable interface? Currently, I have extended an existing one.
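For reference, the "new method" option can be sketched with a default method, which keeps existing implementations source-compatible. The interface below is a hypothetical, simplified stand-in and not `io.trino.spi.connector.SystemTable`; the method shapes and the `Optional<String>` catalog parameter are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

final class SystemTableSpiSketch {
    // Hypothetical stand-in for a system table SPI interface.
    interface SketchSystemTable {
        // Existing method that every implementation already provides.
        List<String> cursor();

        // New overload with a default body: implementations that do not
        // override it ignore the catalog and fall back to the existing method.
        default List<String> cursor(Optional<String> catalog) {
            return cursor();
        }
    }

    // An implementation that opts in to the new, catalog-aware method.
    static final class CatalogAwareTable implements SketchSystemTable {
        private final List<String> tables;

        CatalogAwareTable(List<String> tables) {
            this.tables = tables;
        }

        @Override
        public List<String> cursor() {
            return tables;
        }

        @Override
        public List<String> cursor(Optional<String> catalog) {
            if (catalog.isEmpty()) {
                return tables;
            }
            // Keep only the tables whose name starts with "<catalog>.".
            String prefix = catalog.get() + ".";
            List<String> result = new ArrayList<>();
            for (String table : tables) {
                if (table.startsWith(prefix)) {
                    result.add(table);
                }
            }
            return result;
        }
    }

    public static void main(String[] args) {
        // A legacy implementation supplies only the existing method.
        SketchSystemTable legacy = () -> List.of("a.t1", "b.t1");
        SketchSystemTable aware = new CatalogAwareTable(List.of("a.t1", "b.t1"));
        System.out.println(legacy.cursor(Optional.of("a")));
        System.out.println(aware.cursor(Optional.of("a")));
    }
}
```

The trade-off in this sketch is that a default method avoids breaking existing implementations but lets them silently ignore the new argument, whereas changing an existing signature forces every implementation to be updated.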

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

## Section
* Fix some things. ({issue}`issuenumber`)