Closed arcivanov closed 3 years ago
I presume that the failure is here:
but because the failure obscures the true cause, it's pretty hard to figure out what's going on.
The issue is localized to the query-frontend. The same query that returns a 500 when sent through the query-frontend succeeds when sent directly to the querier.
$ kubectl -n loki-test port-forward loki-distributed-querier-0 3100
Forwarding from 127.0.0.1:3100 -> 3100
Forwarding from [::1]:3100 -> 3100
Handling connection for 3100
$ curl -vvvv -H"X-Scope-OrgID: dev" -H 'Sec-WebSocket-Version: 13' -H 'Sec-WebSocket-Extensions: permessage-deflate' -H 'Sec-WebSocket-Key: v4vMUSLqpDDrrvhrCqfE+Q==' -H 'Connection: keep-alive, Upgrade' -H 'Upgrade: websocket' -H 'X-Hello: world' 'http://localhost:3100/loki/api/v1/tail?query=%7Bapp%3D%22loki-distributed%22%7D%20%7C%3D%22websocket%22'
* Trying ::1:3100...
* Connected to localhost (::1) port 3100 (#0)
> GET /loki/api/v1/tail?query=%7Bapp%3D%22loki-distributed%22%7D%20%7C%3D%22websocket%22 HTTP/1.1
> Host: localhost:3100
> User-Agent: curl/7.71.1
> Accept: */*
> X-Scope-OrgID: dev
> Sec-WebSocket-Version: 13
> Sec-WebSocket-Extensions: permessage-deflate
> Sec-WebSocket-Key: v4vMUSLqpDDrrvhrCqfE+Q==
> Connection: keep-alive, Upgrade
> Upgrade: websocket
> X-Hello: world
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 101 Switching Protocols
< Upgrade: websocket
< Connection: Upgrade
< Sec-WebSocket-Accept: ofNzHVvj/ErTqrfW0S1pvqEyhpw=
<
�~��{"streams":[{"stream":{"job":"loki-test/loki-distributed","namespace":"loki-test","node_name":"ip-10-0-150-250.ec2.internal","pod":"loki-distributed-query-frontend-7466f56c8f-sjnhj","app":"loki-distributed","component":"query-frontend","container":"loki","filename"
$ kubectl -n loki-test port-forward loki-distributed-query-frontend-7466f56c8f-qblhv 3100
Forwarding from 127.0.0.1:3100 -> 3100
Forwarding from [::1]:3100 -> 3100
Handling connection for 3100
$ curl -vvvv -H"X-Scope-OrgID: dev" -H 'Sec-WebSocket-Version: 13' -H 'Sec-WebSocket-Extensions: permessage-deflate' -H 'Sec-WebSocket-Key: v4vMUSLqpDDrrvhrCqfE+Q==' -H 'Connection: keep-alive, Upgrade' -H 'Upgrade: websocket' -H 'X-Hello: world' 'http://localhost:3100/loki/api/v1/tail?query=%7Bapp%3D%22loki-distributed%22%7D%20%7C%3D%22websocket%22'
* Trying ::1:3100...
* Connected to localhost (::1) port 3100 (#0)
> GET /loki/api/v1/tail?query=%7Bapp%3D%22loki-distributed%22%7D%20%7C%3D%22websocket%22 HTTP/1.1
> Host: localhost:3100
> User-Agent: curl/7.71.1
> Accept: */*
> X-Scope-OrgID: dev
> Sec-WebSocket-Version: 13
> Sec-WebSocket-Extensions: permessage-deflate
> Sec-WebSocket-Key: v4vMUSLqpDDrrvhrCqfE+Q==
> Connection: keep-alive, Upgrade
> Upgrade: websocket
> X-Hello: world
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 500 Internal Server Error
< Content-Type: text/plain; charset=utf-8
< Sec-Websocket-Version: 13
< Vary: Accept-Encoding
< X-Content-Type-Options: nosniff
< Date: Fri, 19 Mar 2021 03:45:25 GMT
< Content-Length: 22
<
Internal Server Error
* Connection #0 to host localhost left intact
@arcivanov are there any updates? Did you manage to solve it? I can confirm your comment above: querying the querier directly works, but querying the query-frontend gives an internal server error.
BTW I am using the latest version of Loki (2.2.1) and deploying the distributed mode using Helm (chart v0.31.2).
@arcivanov After days of debugging I finally found the problem. You have to explicitly configure the query frontend for tailing. The property shown below defaults to "", which causes the 500 error on the query frontend side.
## In the Loki Configs Query Frontend Part:
frontend:
  ..
  tail_proxy_url: http://YOUR_CLUSTER_INTERNAL_LOKI_QUERIER_SVC:3100
Of course, YOUR_CLUSTER_INTERNAL_LOKI_QUERIER_SVC should be the service name of the querier.
Source: https://grafana.com/docs/loki/latest/configuration/#query_frontend_config
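For anyone who wants to confirm the fix after setting tail_proxy_url, here is a minimal sketch of a check. It assumes a port-forward to the query-frontend on localhost:3100 like the one earlier in the thread and the "dev" tenant used there. The helper names tail_status and classify_tail_status are made up for illustration and are not part of Loki or curl.

```shell
#!/bin/sh
# Sketch: check whether a Loki endpoint accepts the WebSocket upgrade on /tail.
# tail_status and classify_tail_status are hypothetical helper names.

# Print the HTTP status code returned for a tail request against base URL $1.
# $2 is the tenant ID sent as X-Scope-OrgID (defaults to "dev", as in the thread).
tail_status() {
  base_url=$1
  org_id=${2:-dev}
  # --max-time bounds the call: after a successful upgrade the stream does not
  # end on its own, so curl exits non-zero on timeout; "|| true" ignores that.
  curl -s -o /dev/null --max-time 5 -w '%{http_code}' \
    -H "X-Scope-OrgID: ${org_id}" \
    -H 'Connection: keep-alive, Upgrade' \
    -H 'Upgrade: websocket' \
    -H 'Sec-WebSocket-Version: 13' \
    -H 'Sec-WebSocket-Key: v4vMUSLqpDDrrvhrCqfE+Q==' \
    "${base_url}/loki/api/v1/tail?query=%7Bapp%3D%22loki-distributed%22%7D%20%7C%3D%22websocket%22" || true
}

# Map a status code to a short verdict based on the two outcomes seen in the thread.
classify_tail_status() {
  case "$1" in
    101) echo "ok: WebSocket upgrade accepted" ;;
    500) echo "fail: 500 from server; check frontend.tail_proxy_url" ;;
    *)   echo "unexpected: HTTP $1" ;;
  esac
}

# Usage, with the port-forward running in another terminal:
#   classify_tail_status "$(tail_status http://localhost:3100)"
```

Before the config change, this should report the 500 case against the query-frontend and the 101 case against the querier, matching the two curl sessions above. A 500 can have other causes as well, so treat the hint as a starting point only.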
I will open a PR to the Helm Charts repo and will post the link here so that people using Helm don't have to suffer like I did 😄
Here's my PR to solve this on the Helm Chart level https://github.com/grafana/helm-charts/pull/456
@arcivanov are there any updates? Did you manage to solve it?
We downgraded to the previous working version.
Here's my PR to solve this on the Helm Chart level grafana/helm-charts#456
That's wonderful, thank you so much for this!!!
Hi, we are also experiencing this issue with the following versions: Grafana 8.1.5, Loki 2.3.0.
@avinashs433 It should be a config issue that you need to adjust. Check my comment: https://github.com/grafana/loki/issues/3499#issuecomment-849487447
Running microservice deployment. Grafana runs queries perfectly, promtail pushes logs. However, live tailing does not work due to WebSocket upgrade failure. We have a standalone Grafana in the same cluster and WebSocket works there.
A failure occurs immediately when trying to get the live stream of the logs: