lyft / presto-gateway

A load balancer / proxy / gateway for prestodb
Apache License 2.0
358 stars 156 forks source link

Queries are routed to unhealthy backends #199

Open ahackett opened 1 year ago

ahackett commented 1 year ago

Hi

Thank you for making the presto-gateway available to the community, it's a really useful piece of software.

I've recently been evaluating and testing the gateway, and have found that queries are being routed to backend clusters even when the coordinator is stopped, which of course results in query failures.

My config includes:

    modules:
      - com.lyft.data.gateway.ha.module.HaGatewayProviderModule
      - com.lyft.data.gateway.ha.module.ClusterStateListenerModule

    managedApps:
      - com.lyft.data.gateway.ha.GatewayManagedApp
      - com.lyft.data.gateway.ha.clustermonitor.ActiveClusterMonitor

I can see that https://github.com/lyft/presto-gateway/pull/183 "filters out unhealthy clusters from queue based routing logic".

However, since our clusters have authentication enabled I get the following errors in the logs:

WARN  [2023-05-04 08:55:42,052] com.lyft.data.gateway.ha.clustermonitor.ActiveClusterMonitor: Received non 200 response, response code: 401
ERROR [2023-05-04 08:55:42,053] com.lyft.data.gateway.ha.clustermonitor.ActiveClusterMonitor: Received null/empty response for http://trino.example.net:8080/ui/api/stats

So the ActiveClusterMonitor cannnot fetch the metrics which I assume are needed for queue based routing to work.

I was hoping that someone might be able to confirm that the behaviour I'm seeing (queries routed to unhealthy backends) is definitely caused by the failure to fetch the metrics. If confirmed, I could look at adding support for password authentication with the /ui/api/stats endpoint.

Thanks

Austin

connorlwilkes commented 1 year ago

This seems like something that should be supported, would be good for someone to validate this behavior if possible

willmostly commented 1 year ago

I've added an option to my fork to query system.runtime.nodes and system.runtime.queries via JDBC with JWT authn instead of using the /ui/api/stats endpoint. In my case we require OAuth for the WebUI, so the gateway can't use any /ui/api endpoints.

I'm happy to contribute back if it is of general interest.

Felicity-3786 commented 1 year ago

Trying to leverage ActiveClusterMonitor and met the same problem, I was thinking if only for health check purpose v1/status or v1/info/state should be enough, but if we want to get /ui/api/stats to work requires adding a certificate? Happy to discuss more.

ahackett commented 1 year ago

Hi @willmostly

Thanks for the info, and for offering to make you contributions available. This would definitely be of interest to me, and I suspect others in the community are using JWT auth too...

Austin

ccchenhe commented 1 year ago

I think seems after 389 version, api "ui/api/stats" of health check became an internal api (meaning login is required). so maybe necessary to modify code of http request in order to get login token. here is just login demo:

public static String getTrinoClusterToken(String targetUrl, MonitorConfiguration monitorConfiguration){
        HttpURLConnection loginConnection = null;
        try{
            URL loginUrl = new URL(String.format("%s/ui/login", targetUrl));
            loginConnection = (HttpURLConnection) loginUrl.openConnection();
            loginConnection.setRequestMethod("POST");
            loginConnection.setDoOutput(true);
            loginConnection.setInstanceFollowRedirects(false);
            loginConnection.addRequestProperty("Content-Type", "application/x-www-form-urlencoded");
            loginConnection.setRequestProperty("charset", "utf-8");
            String password = "";
            if(!Strings.isNullOrEmpty(monitorConfiguration.getMonitorPassword())){
                password = monitorConfiguration.getMonitorPassword();
            }
            String loginData = String.format("username=%s&password=%s", monitorConfiguration.getMonitorUser(), password);
            byte[] postData = loginData.getBytes( StandardCharsets.UTF_8);
            loginConnection.connect();
            DataOutputStream out = new DataOutputStream(loginConnection.getOutputStream());
            out.write(postData);
            String token = loginConnection.getHeaderField("Set-Cookie");
            out.close();
            // data simple: Trino-UI-Token=xx;Version=1;Path=/ui;HttpOnly
            // so we need split by ;
            return token.split(";")[0];

        }catch (Exception e){
            log.error("Error login trino cluster from {},reason is:  {}", targetUrl, e);
        }finally {
            if (loginConnection != null) {
                loginConnection.disconnect();
            }
        }
        return "";
    }

after obtain login token, change healthy check http request code:

public static String requestTrinoClusterStatsWithToken(String targetUrl, String token){
        HttpURLConnection apiConnection = null;
        try{
            URL apiURL = new URL(String.format("%s/ui/api/stats", targetUrl));
            apiConnection = (HttpURLConnection) apiURL.openConnection();
            apiConnection.setRequestMethod("GET");
            apiConnection.setRequestProperty("Cookie", token);
            int responseCode = apiConnection.getResponseCode();
            if (responseCode == HttpStatus.SC_OK) {
                BufferedReader reader = new BufferedReader(new InputStreamReader((InputStream) apiConnection.getContent()));
                StringBuilder sb = new StringBuilder();
                String line;
                while ((line = reader.readLine()) != null) {
                    sb.append(line).append("\n");
                }
                return sb.toString();
            }
        }catch (Exception e){
            log.error("Error fetching cluster stats from [{}]", targetUrl, e);
        }finally {
            if (apiConnection != null){
                apiConnection.disconnect();
            }
        }
        return null;
    }

and healthy check logic works.

ccchenhe commented 1 year ago

but if you just want to confirm that cluster is healthy. maybe only change url from "/ui/api/stats" to "/v1/status"

Chaho12 commented 1 year ago

Just a heads up @ccchenhe @ahackett, this issue has been fixed and uses /ui/api/stats for jdbc/http in trinodb/trino-gateway https://github.com/trinodb/trino-gateway/commit/11b2dd2fa490d4a626620a1d92266a485b86530e#diff-a3a976d789379cd160fbc2e9281073d96aa6e682e9d9e9ab67883992acaeeab2

For http and jdbc it uses username/pwd set in backendStateConfiguration

Bigbarry7549 commented 10 months ago

uhuh

Bigbarry7549 commented 10 months ago

boobs