prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0

[Design] multiple clusters and dynamic topic discovery support on Kafka Connector #15845

Open yangy0000 opened 3 years ago

yangy0000 commented 3 years ago

Today, the Presto Kafka connector supports only a single Kafka cluster and a static list of tables/schemas.

This approach enables:

Background

At Uber, we run Kafka at scale, with more than two dozen Kafka clusters and tens of thousands of topics. This proposal changes the implementation of the existing Kafka connector to better suit such a large-scale Kafka setup.

Overview

For backward compatibility, the existing static configuration will still be supported and will remain the default. To opt in, users set cluster-description-supplier=DYNAMIC to enable dynamic multi-cluster support and table-description-supplier=DYNAMIC to enable dynamic table descriptors. The dynamic supplier populates metadata from the supplier dir periodically, instead of loading it once during the connector initialization phase.
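
As a rough illustration of the periodic refresh described above, here is a minimal sketch, not the connector's actual code. The class name PeriodicDirectorySupplier, the assumption that the supplier dir holds one `.json` file per description, and the refresh period are all hypothetical. It keeps an immutable snapshot that readers see atomically, and keeps the last good snapshot if a refresh fails.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: reloads description files from a directory on a schedule.
final class PeriodicDirectorySupplier implements AutoCloseable {
    private final Path dir;
    // Readers always see a complete, immutable snapshot (name -> raw file content).
    private final AtomicReference<Map<String, String>> snapshot =
            new AtomicReference<>(Map.of());
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "description-refresh");
                t.setDaemon(true);
                return t;
            });

    PeriodicDirectorySupplier(Path dir) {
        this.dir = dir;
    }

    // Load once immediately, then reload every periodSeconds.
    void start(long periodSeconds) {
        refresh();
        executor.scheduleWithFixedDelay(
                this::refreshQuietly, periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }

    // Re-read every *.json file (assumed layout) and swap in the new snapshot.
    void refresh() {
        Map<String, String> loaded = new HashMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.json")) {
            for (Path file : files) {
                String fileName = file.getFileName().toString();
                String name = fileName.substring(0, fileName.length() - ".json".length());
                loaded.put(name, Files.readString(file));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        snapshot.set(Map.copyOf(loaded));
    }

    // An exception escaping a scheduled task cancels future runs, so swallow it
    // here and keep serving the last good snapshot.
    private void refreshQuietly() {
        try {
            refresh();
        } catch (RuntimeException e) {
            // A real implementation would log this.
        }
    }

    Map<String, String> get() {
        return snapshot.get();
    }

    @Override
    public void close() {
        executor.shutdownNow();
    }
}
```

With this shape, a newly added description file becomes visible after the next refresh without restarting the connector, while the static path can keep loading once at startup.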

Query syntax

To support multiple Kafka clusters, each Kafka cluster will have an alias (clusterName), and the Presto schemaName in the SQL query will specify the Kafka cluster name. For example, the SQL query for cluster:foo on topic:bar will be

select * from kafka.foo.bar
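
To make the mapping concrete, here is a minimal sketch of how a schema name could be resolved to a cluster. It is an illustration under assumptions, not the proposed implementation: SchemaClusterResolver, Target, and the broker address are hypothetical names, and the alias map stands in for whatever the cluster description supplier provides.

```java
import java.util.Map;

// Hypothetical sketch: treats the Presto schema name as a Kafka cluster alias.
final class SchemaClusterResolver {
    // Where a query should read from: which cluster, and which topic on it.
    static final class Target {
        final String bootstrapServers;
        final String topic;

        Target(String bootstrapServers, String topic) {
            this.bootstrapServers = bootstrapServers;
            this.topic = topic;
        }
    }

    // clusterName (alias) -> bootstrap servers; assumed to come from the
    // cluster description supplier.
    private final Map<String, String> clusters;

    SchemaClusterResolver(Map<String, String> clusters) {
        this.clusters = Map.copyOf(clusters);
    }

    // For kafka.foo.bar: schemaName is "foo" (cluster), tableName is "bar" (topic).
    Target resolve(String schemaName, String tableName) {
        String servers = clusters.get(schemaName);
        if (servers == null) {
            throw new IllegalArgumentException("Unknown Kafka cluster alias: " + schemaName);
        }
        return new Target(servers, tableName);
    }
}
```

For the query above, `new SchemaClusterResolver(Map.of("foo", "foo-broker:9092")).resolve("foo", "bar")` would point at topic bar on the cluster aliased foo, and an unknown alias fails fast instead of silently falling back to a default cluster.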

Implementation

image

dik111 commented 3 years ago

I also need this feature. What are the plans for it going forward?