metrico / qryn

⭐️ All-in-One Polyglot Observability with OLAP Storage for Logs, Metrics, Traces & Profiles. Drop-in Grafana Cloud replacement compatible with Loki, Prometheus, Tempo, Pyroscope, Opentelemetry, Datadog and beyond :rocket:
https://qryn.dev
GNU Affero General Public License v3.0

Can't query traces with Grafana 10.1.5+ #360

Closed Xantorero closed 12 months ago

Xantorero commented 1 year ago

Hello! It's me again! :D

After my last question here, I managed to get the stack below working with metrics and logs.

            [Grafana]
                |
              [Qryn]
                |
           [Clickhouse]
                |
      [OpenTelemetry Collector]
      |      |        |        |
      [Metric exporters, Promtail]

Happy with that success, I decided to also add traces for a Spring application. I am generating traces with the OpenTelemetry Java agent, with OTLP_ENDPOINT pointing at the OpenTelemetry Collector. The otlp receiver gets them and the qryn exporter sends them to Qryn. I have also checked that they are present in ClickHouse using a simple query:

SELECT * FROM cloki.tempo_traces ORDER BY timestamp_ns DESC LIMIT 1;

Result:

oid │ trace_id │ span_id  │ parent_id │ name │ timestamp_ns        │ duration_ns │ service_name │ payload_type │ payload
0   │ <binary> │ <binary> │ <binary>  │ POST │ 1697813874142232910 │ 17363493    │ XYZ          │ 2            │ (see below)

The payload looks like this (I have cut some of the keys because they contained private information):

{
  "traceId": "MwuKdxIsNyFlhSwUl6kqPA==",
  "spanId": "csGQEpkSPPo=",
  "parentSpanId": "Wbb4BtGJUc=",
  "name": "POST",
  "kind": 3,
  "startTimeUnixNano": "1697813874142232910",
  "endTimeUnixNano": "1697813874159596403",
  "attributes": [
    {"key":"thread.id","value":{"intValue":"349"}},
    {"key":"net.protocol.name","value":{"stringValue":"http"}},
    {"key":"http.status_code","value":{"intValue":"201"}},
    {"key":"net.protocol.version","value":{"stringValue":"1.1"}},
    {"key":"http.method","value":{"stringValue":"POST"}},
    {"key":"otel.scope.version","value":{"stringValue":"1.31.0-alpha"}},
    {"key":"otel.scope.name","value":{"stringValue":"io.opentelemetry.apache-httpclient-4.0"}},
    {"key":"service.name","value":{"stringValue":"XYZ"}},
    {"key":"opencensus.exporterversion","value":{"stringValue":"Jaeger-opentelemetry-java"}},
    {"key":"telemetry.auto.version","value":{"stringValue":"1.31.0"}},
    {"key":"telemetry.sdk.language","value":{"stringValue":"java"}},
    {"key":"telemetry.sdk.name","value":{"stringValue":"opentelemetry"}},
    {"key":"telemetry.sdk.version","value":{"stringValue":"1.31.0"}}
  ],
  "status": {}
} ] }
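To cross-check the unreadable binary columns against the payload, the IDs can be decoded by hand. This is a minimal sketch, assuming the `traceId` and `spanId` values in the payload are standard base64 (they appear to be); the helper name `b64_id_to_hex` is made up for illustration. The redacted `parentSpanId` is left out because it is not valid base64 as posted.

```python
import base64


def b64_id_to_hex(value: str) -> str:
    """Decode a base64-encoded trace/span ID into lowercase hex."""
    return base64.b64decode(value, validate=True).hex()


# Values copied from the payload above.
trace_hex = b64_id_to_hex("MwuKdxIsNyFlhSwUl6kqPA==")
span_hex = b64_id_to_hex("csGQEpkSPPo=")

# duration_ns in the table should equal end - start from the payload.
duration_ns = 1697813874159596403 - 1697813874142232910

print("trace_id:", trace_hex)
print("span_id:", span_hex)
print("duration_ns:", duration_ns)
```

The hex trace ID is the form that a Tempo-style trace lookup usually expects, and the computed duration can be compared with the `duration_ns` column of the row.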

So far so good. Now comes the problematic part. I went to Grafana and added a Tempo datasource, but in the Explore view no traces are shown when I try to query them. The query builder also doesn't show any hints or values. I checked the Qryn logs:

{"level":30,"time":1697812110614,"pid":19,"hostname":"XYZ","name":"qryn","reqId":"req-45","req":{"method":"GET","url":"/api/v2/search/tag/resource.service.name/values","hostname":"CENSORED","remoteAddress":"CENSORED","remotePort":39936},"msg":"incoming request"}
{"level":20,"time":1697812110616,"pid":19,"hostname":"XYZ","name":"qryn","reqId":"req-45","msg":"unsupported"}
{"level":30,"time":1697812110617,"pid":19,"hostname":"XYZ","name":"qryn","reqId":"req-45","res":{"statusCode":200},"responseTime":2.2390289306640625,"msg":"request completed"}
{"level":30,"time":1697812110786,"pid":19,"hostname":"XYZ","name":"qryn","reqId":"req-46","req":{"method":"GET","url":"/api/v2/search/tag/name/values","hostname":"CENSORED","remoteAddress":"CENSORED","remotePort":39936},"msg":"incoming request"}
{"level":20,"time":1697812110787,"pid":19,"hostname":"XYZ","name":"qryn","reqId":"req-46","msg":"unsupported"}
{"level":30,"time":1697812110802,"pid":19,"hostname":"XYZ","name":"qryn","reqId":"req-47","req":{"method":"GET","url":"/api/v2/search/tag/span.undefined/values","hostname":"CENSORED","remoteAddress":"CENSORED","remotePort":39936},"msg":"incoming request"}
{"level":30,"time":1697812110803,"pid":19,"hostname":"XYZ","name":"qryn","reqId":"req-47","res":{"statusCode":200},"responseTime":0.5580825805664062,"msg":"request completed"}

There are a few "unsupported" messages, but to be honest I have no idea why they appear, so here I am again, asking for help!
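To see which requests trigger the "unsupported" message, the log entries can be grouped by `reqId`. The following is a rough sketch, not part of Qryn: the sample lines are trimmed from the log excerpt above to the fields the script reads, and the function name `unsupported_urls` is invented for this example.

```python
import json

# Log lines trimmed from the excerpt above to the fields this sketch reads.
LOG_LINES = [
    '{"level":30,"reqId":"req-45","req":{"method":"GET","url":"/api/v2/search/tag/resource.service.name/values"},"msg":"incoming request"}',
    '{"level":20,"reqId":"req-45","msg":"unsupported"}',
    '{"level":30,"reqId":"req-45","res":{"statusCode":200},"msg":"request completed"}',
    '{"level":30,"reqId":"req-46","req":{"method":"GET","url":"/api/v2/search/tag/name/values"},"msg":"incoming request"}',
    '{"level":20,"reqId":"req-46","msg":"unsupported"}',
    '{"level":30,"reqId":"req-47","req":{"method":"GET","url":"/api/v2/search/tag/span.undefined/values"},"msg":"incoming request"}',
    '{"level":30,"reqId":"req-47","res":{"statusCode":200},"msg":"request completed"}',
]


def unsupported_urls(lines):
    """Return the request URLs whose reqId later logged an 'unsupported' message."""
    urls = {}      # reqId -> URL of the incoming request
    flagged = []   # URLs that were followed by "unsupported"
    for line in lines:
        entry = json.loads(line)
        req_id = entry.get("reqId")
        if "req" in entry:
            urls[req_id] = entry["req"]["url"]
        elif entry.get("msg") == "unsupported" and req_id in urls:
            flagged.append(urls[req_id])
    return flagged


for url in unsupported_urls(LOG_LINES):
    print(url)
```

In this excerpt, req-45 is flagged as unsupported and still completes with status 200, so an HTTP error code alone would not reveal the problem.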

akvlad commented 1 year ago

I tested the qryn collector. All the metrics, traces and logs get ingested successfully.

Environment: latest version of qryn-collector: https://github.com/metrico/otel-collector/pkgs/container/qryn-otel-collector

Configuration:

receivers:
  zipkin:
  loki:
    protocols:
      http:
  hostmetrics:
    collection_interval: 15s
    initial_delay: 15s
    root_path:
    scrapers:
      cpu:
      disk:
      load:
processors:
  batch:
    send_batch_size: 10000
    timeout: 5s
  memory_limiter:
    check_interval: 2s
    limit_mib: 1800
    spike_limit_mib: 500

exporters:
  qryn:
    dsn: tcp://localhost:9000/qryn2?username=default
    clustered_clickhouse: true
    timeout: 10s
    sending_queue:
      queue_size: 100
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
    logs:
      format: json

extensions:
  health_check:
  pprof:
  zpages:
  memory_ballast:
    size_mib: 1000

service:
  extensions: [pprof, zpages, health_check]
  pipelines:
    logs:
      receivers: [loki]
      processors: [memory_limiter, batch]
      exporters: [qryn]
    metrics:
      receivers: [hostmetrics]
      processors: [memory_limiter, batch]
      exporters: [qryn]
    traces:
      receivers: [zipkin]
      processors: [memory_limiter, batch]
      exporters: [qryn]

Please note that exporters.qryn.clustered_clickhouse should be true if you have more than one ClickHouse node behind a load balancer.

Xantorero commented 1 year ago

Sorry, my long post was a bit chaotic. My problem is not with data ingestion through qryn-otel-collector into ClickHouse: all the metrics, logs and traces are inserted properly. The problem is on this link:

Grafana [Tempo datasource] -> Qryn

I did some research and the issue is connected to the Grafana version. Below version 10, for example with the latest 9.5.13, everything works fine. But since Grafana version 10+ I can't query traces. The reason is that the underlying API calls from the Grafana Tempo datasource changed, which I can see in the Qryn logs:

Grafana 9.5.13 calls to Qryn look like this: "GET /api/search/tag/values"

But Grafana 10+ calls to Qryn look like this: "GET /api/v2/search/tag/name/values"

So basically, if I use newer Grafana releases I can't access the traces stored with Qryn in ClickHouse.
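The difference between the two API generations can be checked without Grafana by requesting both paths directly. This is only a sketch under assumptions: the helper names are invented, the base URL in the usage note is a placeholder for wherever Qryn listens, and the v1 path with a tag segment follows Tempo's HTTP API rather than the abbreviated call quoted above.

```python
import urllib.error
import urllib.parse
import urllib.request


def tag_values_paths(tag: str) -> dict:
    """Build the tag-values path for both Tempo search API generations."""
    quoted = urllib.parse.quote(tag, safe="")
    return {
        "v1": f"/api/search/tag/{quoted}/values",     # style used by Grafana 9.5.x
        "v2": f"/api/v2/search/tag/{quoted}/values",  # style seen in the Qryn logs above
    }


def probe(base_url: str, tag: str, timeout: float = 5.0) -> dict:
    """Request both paths and return {version: (status, first bytes of body)}."""
    results = {}
    for version, path in tag_values_paths(tag).items():
        url = base_url.rstrip("/") + path
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                results[version] = (resp.status, resp.read(200))
        except urllib.error.HTTPError as exc:
            # The server answered, but with an error status.
            results[version] = (exc.code, b"")
        except OSError as exc:
            # Connection refused, DNS failure, timeout, and similar.
            results[version] = (None, str(exc).encode())
    return results
```

For example, `probe("http://localhost:3100", "service.name")` (host and port are assumptions) returns the status and the start of the body for each path. Since the log excerpt shows an unsupported request still finishing with status 200, comparing the bodies is more telling than comparing status codes.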

akvlad commented 1 year ago

Currently the latest tested working version is Grafana v10.0.1 (5a30620b85). It works OK.

Xantorero commented 1 year ago

You are right, 10.0.1 works fine. By 10+ I meant 10.1.5 and 10.2.0, which I tested and which did not work. Do you have any plans regarding support for newer Grafana versions?

akvlad commented 1 year ago

@Xantorero I have just added your issue to the next release issue.

Xantorero commented 1 year ago

Thank You very much!

lmangani commented 12 months ago

Resolved in 3.x. Feel free to reopen if needed.