grafana / tempo

Grafana Tempo is a high volume, minimal dependency distributed tracing backend.
https://grafana.com/oss/tempo/
GNU Affero General Public License v3.0
4.05k stars 524 forks source link

[DOC] dimension_mappings does not work as expected for metrics-generator #3376

Open suddyv519 opened 9 months ago

suddyv519 commented 9 months ago

Describe the bug I am trying to use dimension_mappings to relabel dimensions that I have added in the Tempo metrics-generator config.

I have tried this config:

      span_metrics:
        dimension_mappings:
          - name: job
            source_labels:
              - service
          - name: env
            source_labels:
              - deployment_environment
          - name: partition
            source_labels:
              - deployment_partition
          - name: datacenter
            source_labels:
              - deployment_datacenter
          - name: site
            source_labels:
              - deployment_site
        dimensions:
          - deployment.environment
          - deployment.partition
          - deployment.datacenter
          - deployment.site

Which included deployment_{environment,partition,datacenter,site} labels but they were not renamed to env, partition, etc as expected.

I also tried removing the dimensions block so it would only include the dimension_mappings thinking it would overwrite the mapping but that simply removed the additional dimensions entirely.

I believe there is a bug with the mapping logic causing this issue.

To Reproduce Steps to reproduce the behavior:

  1. Start Tempo with metrics-generator
  2. Add additional dimensions and dimension_mappings
  3. Observe metrics via Mimir in Grafana and see what labels are coming through.

Expected behavior I expected my span attributes to be present as additional dimensions in the metrics with the new label I specifified. Instead I did get the additional dimensions but they were not labeled as I wanted, instead it was just the span attribute with the . replaced with a _.

E.g. deployment.datacenter becomes deployment_datacenter instead of simply datacenter as I had mapped.

Environment:

Additional Context

suddyv519 commented 9 months ago

For anyone that finds this, I found a workaround by using the write_relabel_configs under remote_write

metricsGenerator:
  enabled: true
  config:
    processor:
      service_graphs:
      span_metrics:
        dimensions:
          - deployment.environment
          - deployment.partition
          - deployment.datacenter
          - deployment.site
    storage:
      path: <<path>>
      wal:
      remote_write_flush_deadline: 1m
      remote_write:
        - url: <<URL>>
          write_relabel_configs:
          - source_labels: [ deployment_environment ]
            regex: (.*)
            target_label: env
          - source_labels: [ deployment_partition ]
            regex: (.*)
            target_label: partition
          - source_labels: [ deployment_datacenter ]
            regex: (.*)
            target_label: datacenter
          - source_labels: [ deployment_site ]
            regex: (.*)
            target_label: site
          - source_labels: [ service ]
            regex: (.*)
            target_label: job
          - regex: "deployment_environment|deployment_partition|deployment_datacenter|deployment_site|service"
            action: labeldrop
snuggie12 commented 8 months ago

I still have this issue and a lot of things are unclear how this feature is supposed to work:

joe-elliott commented 8 months ago

Based on my reading of the code I believe you need to provide the raw name of the span attribute and not the "prometheus" one. So:

      span_metrics:
        dimension_mappings:
          - name: env
            source_labels:
              - deployment.environment

instead of deployment_environment.

In slack it's suggested join is not necessary but it doesn't say anything in the docs. Nor is there a default in the struct. I suppose that means it's a string and default value is "" but not sure everyone should be expected to go poking around in the code.

Agree "join" is not necessary.

Can you have two mappings with the same name? For instance, I have both grpc.authority and http.host which I'd like to merge as dst.host (or dst_host as I'm unsure which one to use.) This is so I can use a single panel and a single variable on a dashboard.

Yes, I believe this would work:

      span_metrics:
        dimension_mappings:
          - name: dst.host
            source_labels:
              - grpc.authority
              - http.host

The code is a little tough to read and there certainly may be a bug here, but nothing stands out atm.

edwintye commented 7 months ago

We have this implemented using the otel naming convention as Joe suggested. I think it would be nice to have this in the docs with examples on dimensions that is commonly used. For example, we add the cluster and environment labels to the span metrics by extracting it from the dimensions setting then rename it via dimension_mappings. This gives us something more uniform compare to all the other existing prometheus metric/labels.

overrides:
  defaults:
    metrics_generator:
      processor:
        span_metrics:     
          dimensions: # adds dimension for the mutation
            - k8s.cluster.name
            - deployment.environment 
          dimension_mappings:
            - name: cluster # creates a duplicate label cluster to be in line with prometheus format
              source_labels:
                - k8s.cluster.name
            - name: environment # creates a duplicate label environment to be in line with prometheus format
              source_labels:
                - deployment.environment

Also worth noting that it is particularly hard to locate the correct info (at the time of writing) since the links are broken both ways due to missing brackets () after the square brackets in this page https://grafana.com/docs/tempo/latest/operations/user-configurable-overrides/#supported-fields from

https://github.com/grafana/tempo/blob/9caf167fb77228993b252a274b75a5cd2b648fe9/docs/sources/tempo/operations/user-configurable-overrides.md?plain=1#L42

and https://grafana.com/docs/tempo/latest/configuration/#user-configurable-overrides from

https://github.com/grafana/tempo/blob/9caf167fb77228993b252a274b75a5cd2b648fe9/docs/sources/tempo/configuration/_index.md?plain=1#L1509

msvechla commented 7 months ago

I have issues with using the dimension_mappings as well. This is my config (copied from our Helm chart values):

  metricsGenerator:
    config:
      processor:
        span_metrics:
          dimensions:
            # the following dimensions should be relabeled below
            - "http.method"
            - "http.status_code"
          dimension_mappings:
            - name: http_request_method
              source_labels:
                - "http.method"
            - name: http_response_status_code
              source_labels:
                - "http.status_code"

This is not working for me. Metrics still appear with the labels http_method and http_status_code

fede843 commented 5 months ago

This is weird. If I define it under metrics_generator.processor.span_metrics.dimension_mappings it does not work. But if I use overrides.defaults.processor.span_metrics.dimension_mappings it does.

I have also found some missing bits in documentation, for instance all the filter_policies stuff in only mentioned in the override section and some other minor discrepancies between the override span_metrics section vs the metrics_generator block.

fede843 commented 5 months ago

related to this probably, I realised we were missing traces_target_info. Tried to put the setting in the override section and worked.

fede843 commented 5 months ago

found a related issue https://github.com/grafana/tempo/issues/2834

github-actions[bot] commented 3 months ago

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed after 15 days if there is no new activity. Please apply keepalive label to exempt this Issue.