Open alaturqua opened 3 months ago
What do you mean by "handle busy clusters gracefully"? There is no queue in Trino Gateway; it just routes traffic to clusters. If adhoc is busy and no other cluster is available for routing, what should Trino Gateway do?
Hmm, I don't think we need a queue in Trino Gateway for now. I shared this in Slack once, but I think we should improve how the gateway reports a routing failure, whatever the cause.
As of now, it returns a 500 error page, which does not tell the user what actually went wrong.
trino> select 'isblocked?';
Error running command: Error starting query at http://localhost:8080/v1/statement returned an invalid response: JsonResponse{statusCode=500, statusMessage=Server Error, headers={cache-control=[must-revalidate,no-cache,no-store], content-length=[372], content-type=[text/html;charset=iso-8859-1], date=[Tue, 06 Aug 2024 12:57:05 GMT]}, hasValue=false} [Error: <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 500 Request failed.</title>
</head>
<body>
<h2>HTTP ERROR 500 Request failed.</h2>
<table>
<tr><th>URI:</th><td>http://localhost:8080/v1/statement</td></tr>
<tr><th>STATUS:</th><td>500</td></tr>
<tr><th>MESSAGE:</th><td>Request failed.</td></tr>
</table>
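To make the suggestion above concrete, here is a minimal sketch of what a more descriptive routing failure could look like. It is not Trino Gateway code: `Backend`, `GatewayResponse`, `route`, the `NO_BACKEND_AVAILABLE` code, and the choice of a 503 status with a JSON body are all assumptions made for illustration.

```java
import java.util.List;
import java.util.Optional;

public class RoutingFailureSketch {
    // Hypothetical view of a backend cluster; not the Trino Gateway data model.
    record Backend(String name, boolean active, boolean healthy) {}

    // Hypothetical response the gateway would hand back to the client.
    record GatewayResponse(int status, String contentType, String body) {}

    // Pick the first backend that is both active and healthy, if there is one.
    static Optional<Backend> pickBackend(List<Backend> backends) {
        return backends.stream()
                .filter(b -> b.active() && b.healthy())
                .findFirst();
    }

    // Route the request, or explain why routing failed instead of
    // falling through to a generic HTML 500 page.
    static GatewayResponse route(List<Backend> backends) {
        return pickBackend(backends)
                .map(b -> new GatewayResponse(200, "text/plain", "routed to " + b.name()))
                .orElseGet(() -> new GatewayResponse(
                        503, // assumption: "service unavailable" fits better than 500
                        "application/json",
                        "{\"error\":\"NO_BACKEND_AVAILABLE\","
                                + "\"message\":\"No active, healthy cluster is available for routing\"}"));
    }

    public static void main(String[] args) {
        // adhoc is deactivated and no other cluster is registered.
        List<Backend> backends = List.of(new Backend("adhoc", false, true));
        GatewayResponse response = route(backends);
        System.out.println(response.status() + " " + response.body());
    }
}
```

The point is only that the failure path carries a reason the user can read. Which status code and body format the gateway should really use, and how the Trino CLI would display them, is left open here.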
Description:
We are encountering an issue with the Trino Gateway setup when querying multiple clusters. Below are the details of our current configuration and the problem:
Configuration:
adhoc
trino-etl
looker
Issue: When the adhoc cluster becomes busy, the JDBC connection queries for stats time out, and the Trino Gateway becomes unreachable. The same thing happens if we deactivate the adhoc cluster during a redeployment or restart of the Trino cluster. This results in the following error message:
Stack Trace:
Steps to Reproduce:
adhoc cluster.
adhoc cluster is busy.
Expected Behavior: The Trino Gateway should handle busy clusters gracefully without causing a 500 error.
Actual Behavior: The gateway becomes unreachable with a 500 error when the adhoc cluster is busy.
Environment: