oneonestar commented 1 month ago

Trino Gateway Routing Framework

Summary

Introduce a Trino Gateway Routing Framework that simplifies the extension of the routing function, facilitates testing and verification, and allows routing logic to be implemented as plugins.

Motivation

The routing logic required can vary significantly depending on the use case. The types of workloads that need to be handled and the methods for operating Trino clusters differ for each user.

Although the gateway supports some level of customization of routing logic using MVEL, Java code may be preferred due to its compile-time checking and ease of testing.

Additionally, the current default routing logic is hard-coded, making it difficult to understand or modify according to specific needs. Modularizing the default routing logic would facilitate better testing and customization.

Goals

Extensible - Define a set of APIs that allow routing logics to be implemented as plugins.
Testable - Each routing logic implemented as a plugin can be tested individually. The framework should provide the necessary tools for testing.
Stackable - Multiple plugins can be chained together to accommodate various needs. Each plugin decides either to make a routing decision or to pass it to the next plugin. The framework provides a mechanism that allows data to be passed between plugins.
Configurable - Allow each plugin to be configured individually. The execution order of the plugins is also adjustable.
Efficient - Use lazy evaluation to avoid heavy computation when a routing decision can be made without it.
Logging - The framework provides essential logging and tracing, enabling unified logging and configuration.

Non-goals

Remove or replace MVEL (currently supported by using Easy Rules). Defining routing logics using MVEL will continue to be supported.

Proposal

Architecture

The routing framework defines a set of APIs that allow routing logics to be implemented as plugins. These plugins can be chained in different order according to the configuration. Each plugin decides either to make a routing decision or to pass it to the next plugin.

Information can be passed from earlier plugins to later ones, allowing them to work together to form complex logic.
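To make the chaining idea concrete, here is a minimal sketch of what such an API could look like. Every type and name in it (`RoutingPlugin`, `Result`, `runChain`, and so on) is hypothetical and not part of Trino Gateway. The "pick first" plugin is a deterministic stand-in for a random pick, and the header lookup is case-sensitive for brevity.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: these types are illustrative and are not Trino Gateway APIs.
class RoutingChainSketch {
    record Request(String query, Map<String, String> headers) {}

    record Cluster(String name, String routingGroup, boolean healthy) {}

    // decided == true means the chain stops and remaining holds the chosen cluster.
    record Result(boolean decided, List<Cluster> remaining) {}

    interface RoutingPlugin {
        Result route(Request request, List<Cluster> candidates);
    }

    // Runs plugins in the configured order until one of them makes a decision.
    static Result runChain(List<RoutingPlugin> plugins, Request request, List<Cluster> clusters) {
        Result result = new Result(false, clusters);
        for (RoutingPlugin plugin : plugins) {
            result = plugin.route(request, result.remaining());
            if (result.decided()) {
                break;
            }
        }
        return result;
    }

    // Narrows the candidates but never decides on its own.
    static final RoutingPlugin FILTER_UNHEALTHY = (request, candidates) ->
            new Result(false, candidates.stream().filter(Cluster::healthy).toList());

    // Simplification: exact-case header lookup; real HTTP headers are case-insensitive.
    static final RoutingPlugin ROUTE_BY_GROUP_HEADER = (request, candidates) -> {
        String group = request.headers().get("X-Trino-Routing-Group");
        if (group == null) {
            return new Result(false, candidates); // header absent: pass through
        }
        return new Result(false, candidates.stream()
                .filter(cluster -> cluster.routingGroup().equals(group))
                .toList());
    };

    // Deterministic stand-in for a random pick so the outcome is reproducible.
    static final RoutingPlugin PICK_FIRST = (request, candidates) ->
            candidates.isEmpty()
                    ? new Result(false, candidates)
                    : new Result(true, List.of(candidates.get(0)));
}
```

Changing the execution order then amounts to reordering the list passed to `runChain`, which is one possible way to satisfy the "Configurable" goal.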

Lazy evaluation

Different plugins require different information, most of which can be derived from the request. Lazy evaluation will be used to minimize the computation cost. For example, the Route by query ID plugin needs a query ID to make a decision. When the plugin asks for the query ID, this triggers a computation that tries to extract it from the request. The computed result can be stored for other plugins to use later.
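One way to get this behaviour is a memoizing supplier held by a per-request context. The sketch below is only an illustration: `Lazy`, `RequestContext`, and `extractQueryId` are hypothetical names, and the request path layout used for extraction is an assumption.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch of lazy, cached request attributes; not Trino Gateway code.
class LazyRequestInfoSketch {
    // Computes the value on first access and returns the stored value afterwards.
    static final class Lazy<T> implements Supplier<T> {
        private final Supplier<T> delegate;
        private boolean computed;
        private T value;

        Lazy(Supplier<T> delegate) {
            this.delegate = delegate;
        }

        @Override
        public synchronized T get() {
            if (!computed) {
                value = delegate.get();
                computed = true;
            }
            return value;
        }
    }

    static final class RequestContext {
        // Counts how often the extraction actually runs (for demonstration only).
        final AtomicInteger extractions = new AtomicInteger();
        private final Lazy<Optional<String>> queryId;

        RequestContext(String path) {
            this.queryId = new Lazy<>(() -> {
                extractions.incrementAndGet();
                return extractQueryId(path);
            });
        }

        Optional<String> queryId() {
            return queryId.get();
        }
    }

    // Assumption: the query ID is the path segment after "queued" or "executing".
    static Optional<String> extractQueryId(String path) {
        String[] parts = path.split("/");
        for (int i = 0; i + 1 < parts.length; i++) {
            if (parts[i].equals("queued") || parts[i].equals("executing")) {
                return Optional.of(parts[i + 1]);
            }
        }
        return Optional.empty();
    }
}
```

A plugin that never calls `queryId()` pays nothing for the extraction, and a second plugin that calls it reuses the stored result.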

(Below are WIP & random ideas)

Example 1: X-Trino-Routing-Group

Input:
Request: SELECT 1, X-Trino-Routing-Group: adhoc
ClusterInfo:
Trino1: Routing group: "batch", healthy
Trino2: Routing group: "batch", healthy
Trino3: Routing group: "adhoc", healthy
Trino4: Routing group: "adhoc", unhealthy
Trino5: Routing group: "adhoc", healthy

Plugin 1: Filter unhealthy clusters
=> Decision: undecided
=> Remaining cluster: {Trino1, Trino2, Trino3, Trino5}

Plugin 2: Route by X-Trino-Routing-Group
=> Decision: undecided
=> Remaining cluster: {Trino3, Trino5}

Plugin 3: Route by User
=> Decision: undecided
=> Remaining cluster: {Trino3, Trino5}

Plugin 4: Random Pick one
=> Decision: decided
=> Remaining cluster: {Trino3}

Example 2: Route by User

Input:
Request: SELECT 1, X-Trino-User: batch_job_account
ClusterInfo:
Trino1: Routing group: "batch", healthy
Trino2: Routing group: "batch", healthy
Trino3: Routing group: "adhoc", healthy
Trino4: Routing group: "adhoc", unhealthy
Trino5: Routing group: "adhoc", healthy

Plugin 1: Filter unhealthy clusters
=> Decision: undecided
=> Remaining cluster: {Trino1, Trino2, Trino3, Trino5}

Plugin 2: Route by X-Trino-Routing-Group
=> Decision: undecided
=> Remaining cluster: {Trino1, Trino2, Trino3, Trino5}

Plugin 3: Route by User
[with config file: batch_job_account=>batch, adhoc_job_account=>adhoc]
=> Decision: undecided
=> Remaining cluster: {Trino1, Trino2}

Plugin 4: Random Pick one
=> Decision: decided
=> Remaining cluster: {Trino1}
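The Route by User step in this example could be sketched as a simple lookup followed by a filter. The method and type names are hypothetical, and the user-to-group map stands in for the config file mentioned above; a user without a mapping leaves the candidates untouched.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a user-based routing step; not Trino Gateway code.
class RouteByUserSketch {
    record Cluster(String name, String routingGroup) {}

    // Keeps only clusters in the group mapped to the user; passes through if unmapped.
    static List<Cluster> routeByUser(String user, Map<String, String> userToGroup, List<Cluster> candidates) {
        if (user == null) {
            return candidates; // no user header on the request
        }
        String group = userToGroup.get(user);
        if (group == null) {
            return candidates; // no mapping for this user
        }
        return candidates.stream()
                .filter(cluster -> cluster.routingGroup().equals(group))
                .toList();
    }
}
```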

Example 3: Load aware routing

Input:
Request: SELECT 1, X-Trino-User: batch_job_account
ClusterInfo:
Trino1: Routing group: "batch", healthy, 10 Running Query
Trino2: Routing group: "batch", healthy, 5 Running Query

(Skip some plugins)...

Plugin: Route by least load
=> Decision: undecided
=> Remaining cluster: {Trino2}
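A least-load step like this one might reduce to a minimum over the running query count. The sketch assumes that the count is already available per cluster; `ClusterLoad` and `leastLoaded` are hypothetical names.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of load-aware selection; not Trino Gateway code.
class LeastLoadSketch {
    record ClusterLoad(String name, int runningQueries) {}

    // Returns the candidate with the fewest running queries, or empty if there are none.
    static Optional<ClusterLoad> leastLoaded(List<ClusterLoad> candidates) {
        return candidates.stream()
                .min(Comparator.comparingInt(ClusterLoad::runningQueries));
    }
}
```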

Other possible plugins:

ML-based routing
Route by query Type/table used/catalog used (using result from query parsing)
Route by header (X-Trino-XXXXX)
Route by cookie
Route by query ID
Route by RPC result from some external system
A/B test (20% query to clusterA, )

Other random thoughts: ClusterStatsMonitor could gather more info from Trino, which would enable more routing plugins to be implemented, e.g.:

Catalog available in each Trino
Worker count

Label based routing: Higher flexibility than the current routing group logic.
Trino1: label={routing_group: batch, version: 440, env: prod}
Trino2: label={routing_group: batch, version: 450, env: staging}
Trino3: label={routing_group: adhoc, version: 440, env: test}
Trino4: label={routing_group: admin, version: 440, env: prod}
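As a rough illustration of label based routing, a selector could be a set of key-value pairs that a cluster's labels must all match. This is a sketch under that assumption; `LabelRoutingSketch`, `Cluster`, and `matching` are hypothetical names, and label values are treated as plain strings.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of label-selector matching; not Trino Gateway code.
class LabelRoutingSketch {
    record Cluster(String name, Map<String, String> labels) {}

    // Keeps clusters whose labels contain every key-value pair in the selector.
    static List<Cluster> matching(List<Cluster> clusters, Map<String, String> selector) {
        return clusters.stream()
                .filter(cluster -> selector.entrySet().stream()
                        .allMatch(entry -> entry.getValue().equals(cluster.labels().get(entry.getKey()))))
                .toList();
    }
}
```

With the labels listed above, a selector of `{routing_group: batch, env: prod}` would leave only Trino1, while `{version: 440}` would leave Trino1, Trino3, and Trino4.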

Weight based routing: Assign a weight to each cluster, then either distribute queries in proportion to the weights or always select the cluster with the highest weight.
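Both weight based variants can be sketched in a few lines. The names below are hypothetical, and integer weights are an assumption made to keep the arithmetic simple. `pickAt` is separated from the random draw so the proportional mapping can be checked deterministically.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Random;

// Hypothetical sketch of weight-based selection; not Trino Gateway code.
class WeightedRoutingSketch {
    record Weighted(String name, int weight) {}

    // Variant 1: distribute queries in proportion to the weights.
    static String pickWeighted(List<Weighted> clusters, Random random) {
        int total = clusters.stream().mapToInt(Weighted::weight).sum();
        if (total <= 0) {
            throw new IllegalArgumentException("total weight must be positive");
        }
        return pickAt(clusters, random.nextInt(total));
    }

    // Maps a roll in [0, total weight) to the cluster owning that slice of the range.
    static String pickAt(List<Weighted> clusters, int roll) {
        int remaining = roll;
        for (Weighted cluster : clusters) {
            if (remaining < cluster.weight()) {
                return cluster.name();
            }
            remaining -= cluster.weight();
        }
        throw new IllegalArgumentException("roll out of range: " + roll);
    }

    // Variant 2: always select the cluster with the highest weight.
    static Optional<String> highestWeight(List<Weighted> clusters) {
        return clusters.stream()
                .max(Comparator.comparingInt(Weighted::weight))
                .map(Weighted::name);
    }
}
```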

Other things that might need to be aware of:

Query retry in other cluster
Query result caching

willmostly commented 1 month ago

simplifies the extension of the routing function, facilitates testing and verification

This is needed, thank you!

The overall philosophy looks good to me. I like that the current architecture is a specialization of the more general architecture here, so that we can provide built in plugins to current users and avoid breaking changes.

The label concept is appealing, and should allow more sophisticated routing decisions. Have you considered taking a bit more inspiration from Kubernetes and using taints? For example, you could taint a cluster like taints: ['oneMinuteMaxExecution'], and a routing plugin could either decide that SELECT 1 can tolerate this, or the client could add a header like X-Trino-Gateway-Toleration: oneMinuteMaxExecution. On the other hand, maybe labels are sufficiently powerful.
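For illustration only, a taint check could look like the sketch below: a cluster stays eligible when every one of its taints is tolerated by the request. The type and method names are hypothetical, and treating the header value as a comma-separated list is an assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch of taints and tolerations; not Trino Gateway code.
class TaintRoutingSketch {
    record Cluster(String name, Set<String> taints) {}

    // Assumption: the toleration header carries a comma-separated list of taint names.
    static Set<String> parseTolerations(String headerValue) {
        if (headerValue == null || headerValue.isBlank()) {
            return Set.of();
        }
        return Arrays.stream(headerValue.split(","))
                .map(String::trim)
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toUnmodifiableSet());
    }

    // Keeps clusters whose taints are all covered by the request's tolerations.
    static List<Cluster> tolerated(List<Cluster> clusters, Set<String> tolerations) {
        return clusters.stream()
                .filter(cluster -> tolerations.containsAll(cluster.taints()))
                .toList();
    }
}
```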

Chaho12 commented 1 month ago

How about a feature/requirement section that includes the following info.

Features

Block routing to the backend for queries that are expected to fail.
- e.g. A query that repeatedly fails due to a syntax/authorization error could be blocked to prevent unnecessary connections from the Trino server to Hive Metastore/HDFS Ranger in our case.
- I know Trino relies on the connector's authentication, but we noticed that IDEs, Superset, etc. execute lots of queries at once and create lots of connections, which is not welcome to our Hive Metastore.
- Also, we are planning to run EXPLAIN and get statistics, and if a query fails due to a syntax error, there is no need to pass a query that is expected to fail on to the backend.

xkrogen commented 1 day ago

Generally big +1 on this. I love the idea of modularizing the router policies and allowing them to stack with different precedence ordering.

Adding a couple of things for us to consider with this design:

1. In this design there is a single ClusterStatsMonitor which fetches cluster-level info. What if someone is writing a new plugin and wants to use some cluster info that is not currently available in ClusterStats? Do the plugin authors have to get changes merged to ClusterStatsMonitor first, then consume it? Or do we provide some mechanism for a routing plugin to request arbitrary information as part of ClusterStatsMonitor? The latter may be very useful for extensibility. I can imagine a mechanism whereby a plugin can provide e.g. a Function<ProxyBackendConfiguration, Map<String, String>> that can populate arbitrary key-value pairs of information.
2. The state sharing between plugins (e.g. for query ID, cookie, query analysis results) is interesting. Similar to my comment in (1), would we just allow for arbitrary key-value pairs, and would the values be String or Object? Or, would we provide structured support for certain types of state?
A common theme in both of my questions is where we land on the spectrum of making the plugin framework as generic/flexible as possible, which gives plugin authors the greatest flexibility while also requiring them to potentially do more work on their own; vs providing more structured supporting infrastructure that lives outside of the plugins, making them easier to write but also less generic. Perhaps a compromise can be had where common things are supported by the shared infra, but genericism is possible when needed.
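To make the two options easier to compare, here is a rough sketch of both ideas: plugin-supplied stats providers that return key-value pairs, and shared state addressed by typed keys as a middle ground between String and Object values. All names are hypothetical, and `BackendConfig` is only a stand-in for ProxyBackendConfiguration so the example is self-contained.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch of extensible stats and typed plugin state; not Trino Gateway code.
class PluginStateSketch {
    // Stand-in for ProxyBackendConfiguration; the fields here are illustrative.
    record BackendConfig(String name, String proxyTo) {}

    // A key that carries the value type, so readers do not need unchecked casts.
    record StateKey<T>(String name, Class<T> type) {}

    // Per-request state shared between plugins in the chain.
    static final class RoutingState {
        private final Map<StateKey<?>, Object> values = new HashMap<>();

        <T> void put(StateKey<T> key, T value) {
            values.put(key, value);
        }

        <T> Optional<T> get(StateKey<T> key) {
            return Optional.ofNullable(key.type().cast(values.get(key)));
        }
    }

    // Merges the key-value pairs from every provider; later providers win on key clashes.
    static Map<String, String> collectStats(
            BackendConfig backend,
            List<Function<BackendConfig, Map<String, String>>> providers) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Function<BackendConfig, Map<String, String>> provider : providers) {
            merged.putAll(provider.apply(backend));
        }
        return merged;
    }
}
```

Common keys such as a query ID could be predefined by the shared infrastructure, while plugins remain free to declare their own keys when they need something more specific.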

trinodb / trino-gateway