grafana / tempo

Grafana Tempo is a high volume, minimal dependency distributed tracing backend.
https://grafana.com/oss/tempo/
GNU Affero General Public License v3.0

[tempo-distributed, grafana] How to enable traceQLStreaming getting error "rpc error: code = Unimplemented desc = unexpected HTTP status code received from server: 404 (Not Found); transport: received unexpected content-type "text/plain; charset=utf-8"" #3987

Open vaibhhavv opened 2 months ago

vaibhhavv commented 2 months ago

Hi, I am using Grafana Helm chart v7.3.11 and tempo-distributed Helm chart v1.16.0. I enabled traceQLStreaming through the grafana.ini feature toggle options. Since then, querying the Tempo data source returns no results and throws the error below. Do I need to configure anything on the Tempo side as well to enable traceQLStreaming?

image

I am not sure what I am missing here to enable traceQLStreaming. Help would be much appreciated.
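
For reference, a minimal sketch of how this toggle is usually set through the Grafana Helm chart values. It is an illustration only, not the reporter's actual values file; only the traceQLStreaming toggle name comes from this thread.

```yaml
# Grafana Helm chart values (sketch).
# The chart renders the "grafana.ini" block into the grafana.ini file,
# so this becomes:
#   [feature_toggles]
#   enable = traceQLStreaming
grafana.ini:
  feature_toggles:
    enable: traceQLStreaming
```

This only turns the feature on in Grafana. Tempo and anything proxying between the two still have to accept the streaming requests.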

javiermolinar commented 2 months ago

Hi, for streaming over http you need to enable this: https://grafana.com/docs/tempo/latest/api_docs/#tempo-grpc-api

vaibhhavv commented 2 months ago

Hi @javiermolinar, I tried enabling it but the error still persists. The relevant part of tempo.yaml looks like this:

...
stream_over_http_enabled: true
usage_report:
  reporting_enabled: true
...

Just wanted to mention, we utilise tempo-distributed with multi-tenancy enabled.

javiermolinar commented 2 months ago

What is the version of the Tempo and Grafana you are running?

vaibhhavv commented 2 months ago

@javiermolinar Grafana Helm chart v7.3.11, Grafana v10.4.1, tempo-distributed Helm chart v1.16.0, Tempo v2.5.0

javiermolinar commented 2 months ago

With the stream_over_http disabled on Tempo side, I get a different error in Grafana: rpc error: code = Unavailable desc = connection error: desc = "error reading server preface: http2: frame too large"

So the problem must be somewhere else. Could you check the nginx gateway config/logs to see if it is blocking the requests?

vaibhhavv commented 2 months ago

Hi @javiermolinar, I enabled the Tempo gateway (it was disabled before). Enabling it does not make the error go away.

Below are the gateway logs from the time I received the above-mentioned error in Grafana:

2024-08-28T12:26:52+05:30 10.67.196.153 - - [28/Aug/2024:06:56:52 +0000]  200 "GET / HTTP/1.1" 2 "-" "kube-probe/1.29" "-"
2024-08-28T12:27:02+05:30 10.67.196.153 - - [28/Aug/2024:06:57:02 +0000]  200 "GET / HTTP/1.1" 2 "-" "kube-probe/1.29" "-"
2024-08-28T12:27:12+05:30 10.67.196.153 - - [28/Aug/2024:06:57:12 +0000]  200 "GET / HTTP/1.1" 2 "-" "kube-probe/1.29" "-"
2024-08-28T12:27:22+05:30 10.67.196.153 - - [28/Aug/2024:06:57:22 +0000]  200 "GET / HTTP/1.1" 2 "-" "kube-probe/1.29" "-"
2024-08-28T12:27:32+05:30 10.67.196.153 - - [28/Aug/2024:06:57:32 +0000]  200 "GET / HTTP/1.1" 2 "-" "kube-probe/1.29" "-"
2024-08-28T12:27:42+05:30 10.67.196.153 - - [28/Aug/2024:06:57:42 +0000]  200 "GET / HTTP/1.1" 2 "-" "kube-probe/1.29" "-"
2024-08-28T12:27:52+05:30 10.67.196.153 - - [28/Aug/2024:06:57:52 +0000]  200 "GET / HTTP/1.1" 2 "-" "kube-probe/1.29" "-"

Below are the logs of grafana at the same time:

2024-08-28T12:27:27+05:30 logger=live t=2024-08-28T06:57:27.661127609Z level=info msg="Initialized channel handler" channel=ds/aaaaa/search/xxxxx address=ds/aaaaa/search/xxxxx
2024-08-28T12:27:27+05:30 logger=tsdb.tempo t=2024-08-28T06:57:27.669867555Z level=error msg="Error receiving message" err="rpc error: code = Unimplemented desc = unexpected HTTP status code received from server: 404 (Not Found); transport: received unexpected content-type \"text/plain; charset=utf-8\""
2024-08-28T12:27:28+05:30 logger=live t=2024-08-28T06:57:28.951964577Z level=info msg="Initialized channel handler" channel=ds/aaaaa/search/yyyyy address=ds/aaaaa/search/yyyyy
2024-08-28T12:27:28+05:30 logger=tsdb.tempo t=2024-08-28T06:57:28.957688418Z level=error msg="Error receiving message" err="rpc error: code = Unimplemented desc = unexpected HTTP status code received from server: 404 (Not Found); transport: received unexpected content-type \"text/plain; charset=utf-8\""
2024-08-28T12:27:37+05:30 logger=infra.usagestats t=2024-08-28T06:57:37.028840347Z level=info msg="Usage stats are ready to report"

Not sure what is going wrong?

  1. Are you sure that stream_over_http_enabled is only required to enable traceQLStreaming on tempo in this respective version?
  2. If we have multi-tenancy enabled on Tempo, does it require some extra config on Grafana/Tempo?
  3. I only saw the documentation for the stream_over_http_enabled parameter when you shared it. This configuration should also be described here, but I didn't find it.

vaibhhavv commented 2 months ago

Hi @joe-elliott @javiermolinar requesting your help here 🙏

joe-elliott commented 2 months ago

Are you sure that stream_over_http_enabled is only required to enable traceQLStreaming on tempo in this respective version?

In Tempo. Yes. In the helm chart. No idea. Perhaps try to connect Grafana directly to the query frontend. I believe the helm chart "gateway" is nginx. It's quite possible it is not configured correctly for streaming.
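
A minimal sketch of what pointing Grafana directly at the query frontend could look like as a provisioned data source, bypassing the nginx gateway. The service name, namespace, port, and tenant ID below are assumptions for illustration, not values confirmed in this thread; the X-Scope-OrgID header is included because multi-tenancy is enabled here.

```yaml
# Grafana data source provisioning file (sketch).
apiVersion: 1
datasources:
  - name: Tempo (direct)
    type: tempo
    access: proxy
    # Hypothetical in-cluster address of the query-frontend Service.
    # Check the real name and HTTP port with `kubectl get svc`.
    url: http://my-tempo-query-frontend.my-namespace.svc:3100
    jsonData:
      # Custom header carrying the tenant for multi-tenant Tempo.
      httpHeaderName1: X-Scope-OrgID
    secureJsonData:
      # Hypothetical tenant ID.
      httpHeaderValue1: my-tenant
```

If streaming works against this data source but not through the gateway, that would point at the proxy layer rather than Tempo itself. It is also worth verifying that the tenant header is applied to streaming requests in the Grafana version in use.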

knylander-grafana commented 2 months ago

If you find a resolution, please let us know. We can update the Helm chart documentation for tempo-distributed.

vaibhhavv commented 2 months ago

@knylander-grafana it's still an open issue on my end. Sure, if resolved I will let you know the solution/s.

vaibhhavv commented 1 week ago

Hi @knylander-grafana, @javiermolinar, @joe-elliott, I now use Grafana v11.3.0 and Tempo v2.5.0 and the issue still persists. One new piece of information: I use Istio in my environments, and Istio sidecars are attached to the tempo-distributed pods.

The traceQLStreaming feature toggle is enabled in Grafana, and stream_over_http_enabled: true is set for Tempo.

When I enable traceQLStreaming in the Tempo data source from the Grafana UI, I get the errors below.

Image

Upon more troubleshooting, I found something in logs.

Gateway Logs:

2024/11/01 05:32:02 [error] 9#9: *84845 open() "/etc/nginx/html/tempopb.StreamingQuerier/Search" failed (2: No such file or directory), client: 127.0.0.6, server: , request: "POST /tempopb.StreamingQuerier/Search HTTP/1.1", host: "my-tempo-gateway.my-namespace:80"
127.0.0.6 - - [01/Nov/2024:05:32:02 +0000]  404 "POST /tempopb.StreamingQuerier/Search HTTP/1.1" 154 "-" "grpc-go/1.66.0" "-"

The Query-Frontend, Querier, and Ingester logs all contain the following line:

warning envoy config external/envoy/source/extensions/config_subscription/grpc/grpc_stream.h:176    DeltaAggregatedResources gRPC config stream to xds-grpc closed: 2, send error for type url type.googleapis.com/envoy.config.route.v3.RouteConfiguration: EOF    thread=12

QUESTION:

  1. Is tempo-distributed/Tempo compatible with Istio?
  2. Also, I see that the gateway/Tempo components use HTTP/1.1. Is that compatible with gRPC streaming? (Look here)

As an experienced person in this field, what is your opinion? Could you please share.

joe-elliott commented 1 week ago

  1. I have not operated Istio so I can't say for sure. You will need to make sure that Envoy is proxying HTTP/2 generically, or perhaps it supports gRPC more specifically?

  2. Yes, it seems there is missing config in the nginx gateway in the Helm chart to allow for TraceQL streaming. I was looking at the config here and I don't see a pass for the streaming gRPC endpoint. It may also need special config for proxying HTTP/2.
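
On the Istio side, one thing to check is how the Service port is declared. Istio selects the protocol for a port from its name prefix (for example grpc- or http2-) or from the appProtocol field, and a port treated as plain HTTP may be proxied as HTTP/1.1, which would match the HTTP/1.1 request seen in the gateway log above. The sketch below shows the general shape only, for a setup where Grafana talks directly to the query frontend. The Service name, selector label, and port number are assumptions; a chart-managed Service would normally be changed through chart values rather than edited by hand.

```yaml
# Kubernetes Service (sketch) declaring the port protocol for Istio.
apiVersion: v1
kind: Service
metadata:
  name: my-tempo-query-frontend      # hypothetical name
  namespace: my-namespace
spec:
  selector:
    app.kubernetes.io/component: query-frontend   # assumed label
  ports:
    - name: http-metrics
      port: 3100                     # assumed Tempo HTTP port
      targetPort: 3100
      # Tells Istio to treat this port as HTTP/2, so gRPC streams served
      # on the HTTP port (stream_over_http_enabled) are not downgraded.
      appProtocol: http2
```

This does not address the nginx gateway itself, which would still need a route for the /tempopb.StreamingQuerier/ paths and HTTP/2 support before streaming could pass through it.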