trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0

Unable to retrieve cluster stats via API non-interactively #3661

Open tooptoop4 opened 4 years ago

tooptoop4 commented 4 years ago

Previously I could curl v1/cluster and get a response like:

{"runningQueries":0,"blockedQueries":0,"queuedQueries":0,"activeCoordinators":1,"activeWorkers":1,"runningDrivers":0,"totalAvailableProcessors":2,"reservedMemory":0.0,"totalInputRows":9563,"totalInputBytes":6249754,"totalCpuTimeSecs":0}

Now the endpoint has changed to ui/api/stats. It works fine for an interactive user in the browser (after going through the form login page), but non-interactive commands by a generic user (with valid credentials) are blocked :(

curl -s -k -L -u "user:pass" https://domain:4039/ui/api/stats -vvv

> GET /ui/api/stats HTTP/1.1
> Host: domain:4039
> Authorization: Basic redact
> User-Agent: curl/7.61.1
> Accept: */*
>
< HTTP/1.1 401 Unauthorized
< Date: Thu, 07 May 2020 10:34:55 GMT
< WWW-Authenticate: Presto-Form-Login
< Content-Length: 0

Can the code below be removed? https://github.com/prestosql/presto/blob/master/presto-main/src/main/java/io/prestosql/server/ui/FormWebUiAuthenticationManager.java#L141-L146

        // send 401 to REST api calls and redirect to others
        if (request.getPathInfo().startsWith("/ui/api/")) {
            response.setHeader(WWW_AUTHENTICATE, "Presto-Form-Login");
            response.setStatus(SC_UNAUTHORIZED);
            return;
        }

Update: even commenting out that block did not solve it :(

tooptoop4 commented 4 years ago

cc @dain

tchunwei commented 4 years ago

This config works for me on version 334:

web-ui.authentication.type=fixed
web-ui.user=user

tooptoop4 commented 4 years ago

@tchunwei I want multiple LDAP users to access the UI, not just a single user.

tooptoop4 commented 4 years ago

In v1/jmx/mbean I can see QueuedQueries and RunningQueries, but nothing about BlockedQueries.

dain commented 3 years ago

Cluster stats were added only for the UI, and are considered a private detail of the web UI. As part of the recent web UI changes, all of the internal details of the UI were moved under the /ui prefix. Nothing in there should be used by tooling, as it can change at any time without concern for backwards compatibility.

I would suggest that a new interface be designed for what is needed instead of starting with the stuff we added to make the UI look good :)

vijay-balakrishnan commented 3 years ago

We had some automation to bring down the cluster based on the information available in /v1/cluster. It seems the /v1/cluster endpoint has moved to /ui/api/stats, and we are unable to access /ui/api/stats from any automation jobs due to the 401.

tooptoop4 commented 3 years ago

right, would be nice to have a 'breaking changes' section in the release notes!

grantatspothero commented 3 years ago

+1 on a breaking changes section of the release notes, this is a problem for anyone using the open source Presto helm chart which uses the /v1/cluster/ endpoint as a healthcheck: https://github.com/helm/charts/blob/master/stable/presto/templates/deployment-coordinator.yaml#L44-L51

Do any presto devs have a suggested endpoint to use as a health check instead? @dain Preferably something that would not require auth.

edit: asked in slack, and /v1/status works for my use case of a k8s health check

findepi commented 3 years ago

> edit: asked in slack, and /v1/status works for my use case of a k8s health check

Exactly. And since https://github.com/prestosql/presto/pull/3428 this can be consumed with HEAD requests too, specifically for health check purposes.
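A health check along those lines could look like the sketch below. It assumes that /v1/status answers HEAD requests without authentication, as described in this thread. The function names, the retry defaults, and the base URL in the usage note are hypothetical.

```shell
# Return 0 if the coordinator answers a HEAD request on /v1/status with a
# success status, non-zero otherwise. $1 is the base URL of the coordinator.
coordinator_healthy() {
  curl --head --fail --silent --output /dev/null --max-time 5 "$1/v1/status"
}

# Poll until the coordinator is healthy or the attempts run out.
# $1: base URL, $2: number of attempts (default 30), $3: delay in seconds (default 2).
wait_until_healthy() {
  local url="$1" attempts="${2:-30}" delay="${3:-2}" i=0
  while [ "$i" -lt "$attempts" ]; do
    coordinator_healthy "$url" && return 0
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, `wait_until_healthy https://some.cluster.com 10 3` would give up after ten failed probes. `--fail` makes curl exit non-zero on HTTP error statuses, so the exit code alone is enough for a liveness or readiness probe.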

johnwhumphreys commented 2 years ago

The older version of prestosql (pre-trino) we use in some clusters still has crappy stats for queued queries/etc in Prometheus. But the web UI works. I wrote this in bash, which mines out the metrics, for anyone else who needs to do the same. Good enough to dump some info to a text file for a Prometheus exporter or whatever.

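# Log in through the web UI form and keep the session token from the cookie jar
# (curl writes the jar to stdout; field 7 of its last line is the cookie value).
COOKIE_VALUE=$(curl --location --request POST 'https://some.cluster.com/ui/login' \
  --data-urlencode 'username=john.humphreys' \
  --data-urlencode 'password=<password>' \
  --cookie-jar - --output /dev/null --silent | awk '{print $7}' | tail -n 1)

# Call the UI stats endpoint with the session cookie and pretty-print the JSON.
curl 'https://some.cluster.com/ui/api/stats' -H "Cookie: Presto-UI-Token=$COOKIE_VALUE" | jq --color-output .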
{
  "runningQueries": 8,
  "blockedQueries": 0,
  "queuedQueries": 0,
  "activeCoordinators": 1,
  "activeWorkers": 35,
  "runningDrivers": 3957,
  "totalAvailableProcessors": 2450,
  "reservedMemory": 2770000473,
  "totalInputRows": 1133212564136,
  "totalInputBytes": 10872687401451,
  "totalCpuTimeSecs": 777021
}
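
To dump these stats to a text file for a Prometheus exporter, the flat JSON above can be turned into `name value` lines. The following is a minimal sketch, assuming the response stays a single-level object with numeric values as in the sample. The `presto_ui_` prefix and the function name are made up for illustration.

```shell
# Convert a flat JSON object of numeric stats (read from stdin) into
# Prometheus text-format lines such as "presto_ui_runningQueries 8".
stats_to_prom() {
  # Drop braces, quotes, spaces and newlines, then split the pairs on commas.
  tr -d '{}" \n' | tr ',' '\n' |
    awk -F: 'NF == 2 && $2 ~ /^-?[0-9.]+([eE][+-]?[0-9]+)?$/ {
      printf "presto_ui_%s %s\n", $1, $2
    }'
}
```

Feed it the raw response rather than the output of `jq --color-output`, since the color escape codes would end up in the values. The result can be redirected to whatever file your exporter reads.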

njalan commented 1 year ago

@johnwhumphreys Thanks for your reply and I fixed it the same way.

hanson2021 commented 11 months ago

@johnwhumphreys Thanks for your good idea. Newer versions of Trino need Trino-UI-Token in place of Presto-UI-Token.
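
Since the cookie name differs between versions, the workaround can take it as a parameter and pick the cookie by name instead of relying on the last line of the jar. This is a sketch under the same assumptions as the original snippet (form login at /ui/login, stats at /ui/api/stats). The function names are hypothetical, and these UI endpoints may change without notice.

```shell
# Print the value of the cookie named $1 from a Netscape-format cookie jar
# on stdin (tab-separated; field 6 is the name, field 7 the value).
cookie_value() {
  awk -F '\t' -v name="$1" '$6 == name { value = $7 } END { if (value != "") print value }'
}

# Log in via the UI form and print the raw stats JSON.
# $1: base URL, $2: user, $3: password, $4: cookie name (default Trino-UI-Token).
fetch_ui_stats() {
  local base_url="$1" user="$2" password="$3" cookie_name="${4:-Trino-UI-Token}" token
  token=$(curl --silent --location --output /dev/null --cookie-jar - \
    --data-urlencode "username=$user" \
    --data-urlencode "password=$password" \
    "$base_url/ui/login" | cookie_value "$cookie_name")
  [ -n "$token" ] || return 1
  curl --silent --fail -H "Cookie: $cookie_name=$token" "$base_url/ui/api/stats"
}
```

On an older prestosql cluster the call would be `fetch_ui_stats https://some.cluster.com john.humphreys '<password>' Presto-UI-Token`. Note that a password passed on the command line is visible in the process list.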

lambrospetrou commented 10 months ago

I am also using the snippet from @johnwhumphreys but Trino should definitely expose a "stable" endpoint to get query state instead of querying the UI endpoints. We need this to properly implement graceful shutdown for coordinators using the https://github.com/lyft/presto-gateway.
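
For a graceful-shutdown check, one option is to treat the cluster as drained once the stats report no running and no queued queries. The sketch below assumes the flat JSON shape shown earlier in this thread, read from stdin. Both function names are hypothetical, and a missing field is deliberately treated as "not idle".

```shell
# Print the numeric value of top-level key $1 from flat JSON on stdin.
json_number() {
  tr -d ' \n' | grep -o "\"$1\":[0-9.]*" | cut -d: -f2
}

# Return 0 only if both runningQueries and queuedQueries are present and zero.
cluster_idle() {
  local stats running queued
  stats=$(cat)
  running=$(printf '%s' "$stats" | json_number runningQueries)
  queued=$(printf '%s' "$stats" | json_number queuedQueries)
  # Default to 1 when a field is missing, so a malformed response never counts as idle.
  [ "${running:-1}" -eq 0 ] && [ "${queued:-1}" -eq 0 ]
}
```

Piping the stats response into `cluster_idle` then gives an exit status that a shutdown script can poll before stopping the coordinator.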

lozbrown commented 9 months ago

Just in case it's useful to someone: Trino now has metrics exporting built in. At the time of writing some of the metrics appear to be missing, but @mattstep assures us that a fix for this is coming soon.

Details in comments here: https://github.com/trinodb/trino/issues/1581