grafana / tempo

Grafana Tempo is a high volume, minimal dependency distributed tracing backend.
https://grafana.com/oss/tempo/
GNU Affero General Public License v3.0

[DOC] Add doc for filter_server_spans and update TraceQL metrics config doc #4139

knylander-grafana opened this issue 1 month ago

Ats Uiboupin reported an issue getting TraceQL metrics to work correctly. We should update the docs to clarify TraceQL metrics configuration.

Ats installed Tempo using the tempo-distributed Helm chart.

To fix:

knylander-grafana commented 1 month ago

Original context from the community Tempo Slack channel:

I can't get the TraceQL metrics queries feature (introduced in 2.4) working, even with the simplest possible query like {} | rate(), as mentioned by @Joe Elliott in the video at 0:22. For me it results in:

Query error
Error (error finding generators in Querier.queryRangeRecent: empty ring). Please check the server logs for more details.

There was no reference to additional configuration needed for this feature in either the 2.4 release notes or the upgrade instructions, but I found the Configure TraceQL metrics page, activated the local-blocks processor based on that documentation, and configured the processor exactly as documented there, but I still see the same error (as seen in the screenshot).

The first part of the error (error finding generators in Querier.queryRangeRecent:) comes from these lines, and the empty ring part of the error comes from these lines.

The server logs don't contain any additional information that could help me in this case (I found 2 lines from the querier and 2 lines from the query-frontend).

knylander-grafana commented 1 month ago

Update the local-blocks processor config

Update this page - https://grafana.com/docs/tempo/latest/operations/traceql-metrics/#activate-and-configure-the-local-blocks-processor

That doc uses the deprecated overrides configuration format (which would make Tempo fail at startup if the rest of the config uses the new format), but I changed it according to the 2.6 release notes (Operational change for TraceQL metrics).
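For reference, a sketch of the deprecated flat format (the value is illustrative; mixing this style with the new nested format elsewhere in the config is what makes startup fail):

overrides:
    metrics_generator_processors:
        - local-blocks

The new nested format is shown in the /status/config example below.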

You can check the /status/config endpoint on any of your Tempo services and then check the overrides section to see if the config has been applied. You should see something like this:

overrides:
    defaults:
        ingestion:
            rate_strategy: local
            rate_limit_bytes: 20000000
            burst_size_bytes: 20000000
            max_traces_per_user: 10000
        read:
            max_bytes_per_tag_values_query: 5000000
        metrics_generator:
            processors:
                - service-graphs
                - span-metrics
                - local-blocks
            generate_native_histograms: both
            ingestion_time_range_slack: 0s

You can also check /metrics-generator/ring.

(Marty) Most of the server-side functionality of the Tempo metrics-generator can be turned off. All you need for TraceQL metrics queries like {} | rate() is the local-blocks processor, so it won't duplicate work generating Mimir series. Or you can move it all to the server side as another possibility.
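A minimal sketch of that setup, assuming the new overrides format from the example above (only the local-blocks processor enabled, so the generator produces no service-graph or span-metrics series for Mimir):

overrides:
    defaults:
        metrics_generator:
            processors:
                - local-blocks # the only processor needed for TraceQL metrics queries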

knylander-grafana commented 1 month ago

Also from Ats:

Thanks for that link. I had seen it, and I had previously applied everything except the "For historical data" config part, which I somehow missed, since I followed quite similar instructions on the Configure TraceQL metrics page, which has this additional config block:

  storage:
    path: /var/tempo/generator/wal
  traces_storage:
    path: /var/tempo/generator/traces

but that page didn't mention

flush_to_storage: true

from the page you linked to. I added it now as well, but I have to say that it seems to me that flush_to_storage isn't documented; I'd expect it to be documented in configuration/#metrics-generator. I guess it was accidentally left undocumented?
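Putting those pieces together, a minimal sketch of the resulting metrics-generator block (the paths are the documented examples, and flush_to_storage comes from the page linked above):

metrics_generator:
  processor:
    local_blocks:
      flush_to_storage: true # flush completed blocks to storage so historical data stays queryable
  storage:
    path: /var/tempo/generator/wal
  traces_storage:
    path: /var/tempo/generator/traces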

atsu85 commented 1 month ago

Thanks @knylander-grafana for converting this into an issue!

Perhaps you can update this comment to avoid confusion, based on the following points:

1) Oops, I meant flush_to_storage, not filter_server_spans as I initially wrote:

it seems to me that filter_server_spans isn't documented - i'd expect it to be documented in configuration/#metrics-generator

2) This:

but that page didn't mention

should be between the two code blocks, but they were combined into one when pasting from Slack to the GH issue.

gpcmol commented 1 month ago

This is how it works for us:

tempo-distributed:
  gateway:
    enabled: true
  global:
    image:
      registry: docker.example.com
  minio:
    enabled: true
  traces:
    otlp:
      http:
        enabled: true
      grpc:
        enabled: true
  distributor:
    config:
      log_received_spans:
        enabled: true
  global_overrides:
    metrics_generator_processors:
      - service-graphs
      - span-metrics
      - local-blocks
  metricsGenerator:
    enabled: true
    config:
      processor:
        local_blocks:
          flush_to_storage: true
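For reference, a sketch of roughly what those values should render to in the Tempo config. The key mapping is my assumption (global_overrides into the overrides block, metricsGenerator.config into the metrics_generator block), not taken from the chart docs:

# Assumed rendering of the Helm values above (sketch, not actual chart output):
overrides:
  metrics_generator_processors:
    - service-graphs
    - span-metrics
    - local-blocks
metrics_generator:
  processor:
    local_blocks:
      flush_to_storage: true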

knylander-grafana commented 1 month ago

Perhaps you can update this comment to avoid confusion, based on the following points:

Updated the comment. Does that look better?