grafana / helm-charts


[tempo-distributed] Metrics Generator remote_write is not in the generated configMap #2369

Closed: dreaminghk closed this issue 1 year ago

dreaminghk commented 1 year ago

Hi guys, I have a problem with the metrics generator.

Situation:

The metrics generator is enabled and configured to send metrics to Prometheus, but no metrics are being sent, and the metrics-generator log does not indicate that any metrics were sent. From the generated tempo-config configMap, we can see that the configMap is not picking up the metricsGenerator.config.storage.remote_write values from values.yaml.

Version of helm chart:

NAME    NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                           APP VERSION
tempo   tempo           4               2023-04-26 08:12:04.554494469 +0000 UTC deployed        tempo-distributed-1.2.10        2.0.1

Here is the values.yaml

---
serviceAccount:
  name: tempo
  create: true
  annotations:
    eks.amazonaws.com/role-arn: ...
storage:
  trace:
    backend: s3
    s3:
      ...
metricsGenerator:
  enabled: true
  config:
    storage:
      path: /var/tempo/wal
#trial 1
#      remote_write:
#        - url: "http://my-service.sre.svc.cluster.local:9090/api/v1/write"
#trial 2
    storage_remote_write:
     - url: http://cortex/api/v1/push
       send_exemplars: true
minio:
  enabled: false
compactor:
  config:
    compaction:
      block_retention: "336h"  # Retain trace data in the backend for 14 days. A lifecycle policy should be created to remove objects that are 1-2 days older than this value; otherwise orphan objects can be left in S3 when ingesters or compactors crash.
ingester:
  persistence:
    # -- Enable creating PVCs which is required when using boltdb-shipper
    enabled: true
    # -- use emptyDir with ramdisk instead of PVC. **Please note that all data in ingester will be lost on pod restart**
    inMemory: false
    # -- Size of persistent or memory disk
    size: 10Gi
    # -- Storage class to be used.
    # If defined, storageClassName: <storageClass>.
    # If set to "-", storageClassName: "", which disables dynamic provisioning.
    # If empty or set to null, no storageClassName spec is
    # set, choosing the default provisioner (gp2 on AWS, standard on GKE, AWS, and OpenStack).
    storageClass: ebs-sc
    # -- Annotations for ingester's persist volume claim
    annotations:
      app: sreopfintlaba-tempo-ingester
traces:
  otlp:
    grpc:
      enabled: true
    http:
      enabled: true
  zipkin:
    enabled: false
  jaeger:
    thriftHttp:
      enabled: false
  opencensus:
    enabled: false
# Global overrides
global_overrides:
  metrics_generator_processors:
    - service-graphs
    - span-metrics

This is the generated tempo-config configMap. The remote_write is empty here.

{
    "overrides.yaml": "
        overrides: {}
        ",
    "tempo-query.yaml": "backend: 127.0.0.1:3100
        ",
    "tempo.yaml": "
        compactor:
          compaction:
            block_retention: 336h
            compacted_block_retention: 1h
            compaction_cycle: 30s
            compaction_window: 1h
            max_block_bytes: 107374182400
            max_compaction_objects: 6000000
            max_time_per_tenant: 5m
            retention_concurrency: 10
            v2_in_buffer_bytes: 5242880
            v2_out_buffer_bytes: 20971520
            v2_prefetch_traces_count: 1000
          ring:
            kvstore:
              store: memberlist
        distributor:
          receivers:
            otlp:
              protocols:
                grpc:
                  endpoint: 0.0.0.0:4317
                http:
                  endpoint: 0.0.0.0:4318
          ring:
            kvstore:
              store: memberlist
        ingester:
          lifecycler:
            ring:
              kvstore:
                store: memberlist
              replication_factor: 3
            tokens_file_path: /var/tempo/tokens.json
        memberlist:
          abort_if_cluster_join_fails: false
          join_members:
          - tempo-gossip-ring
        metrics_generator:
          processor:
            service_graphs:
              dimensions: []
              histogram_buckets:
              - 0.1
              - 0.2
              - 0.4
              - 0.8
              - 1.6
              - 3.2
              - 6.4
              - 12.8
              max_items: 10000
              wait: 10s
              workers: 10
            span_metrics:
              dimensions: []
              histogram_buckets:
              - 0.002
              - 0.004
              - 0.008
              - 0.016
              - 0.032
              - 0.064
              - 0.128
              - 0.256
              - 0.512
              - 1.02
              - 2.05
              - 4.1
          registry:
            collection_interval: 15s
            external_labels: {}
            stale_duration: 15m
          ring:
            kvstore:
              store: memberlist
          storage:
            path: /var/tempo/wal
            remote_write: []                 <--------------------expected this line to take the values from values.yaml
            remote_write_flush_deadline: 1m
            wal: null
        multitenancy_enabled: false
        overrides:
          metrics_generator_processors:
          - service-graphs
          - span-metrics
          per_tenant_override_config: /conf/overrides.yaml
        querier:
          frontend_worker:
            frontend_address: tempo-query-frontend-discovery:9095
          max_concurrent_queries: 20
          search:
            external_endpoints: []
            external_hedge_requests_at: 8s
            external_hedge_requests_up_to: 2
            prefer_self: 10
            query_timeout: 30s
          trace_by_id:
            query_timeout: 10s
        query_frontend:
          max_retries: 2
          search:
            concurrent_jobs: 1000
            target_bytes_per_job: 104857600
          tolerate_failed_blocks: 0
          trace_by_id:
            hedge_requests_at: 2s
            hedge_requests_up_to: 2
            query_shards: 50
        server:
          grpc_server_max_recv_msg_size: 4194304
          grpc_server_max_send_msg_size: 4194304
          http_listen_port: 3100
          http_server_read_timeout: 30s
          http_server_write_timeout: 30s
          log_format: logfmt
          log_level: info
        storage:
          trace:
            backend: s3
            block:
              version: vParquet
            blocklist_poll: 5m
            cache: memcached
            local:
              path: /var/tempo/traces
            memcached:
              consistent_hash: true
              host: tempo-memcached
              service: memcached-client
              timeout: 500ms
            s3:
              ....
            wal:
              path: /var/tempo/wal
        usage_report:
          reporting_enabled: true
        "
}

By the way, the README.md says to use metricsGenerator.config.storage_remote_write rather than metricsGenerator.config.storage.remote_write, which is the key that actually appears in values.yaml. Also, the endpoint in the README.md is /api/v1/push, while the Prometheus docs mention /api/v1/write. Which one is the correct endpoint if we need to write the metrics to Prometheus?

https://github.com/grafana/helm-charts/tree/main/charts/tempo-distributed#activate-metrics-generator

Thank you!

dreaminghk commented 1 year ago

Hi, I found a workaround for this problem: after the helm chart is deployed, we can manually modify the configMap and restart all the pods in the tempo namespace.
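
For anyone following the workaround: the part of the tempo.yaml key in the configMap that needs to be edited by hand is the metrics_generator.storage stanza. A minimal sketch of the end result, using the URL from trial 1 above (adjust the URL for your backend):

metrics_generator:
  storage:
    path: /var/tempo/wal
    remote_write:
      - url: http://my-service.sre.svc.cluster.local:9090/api/v1/write
        send_exemplars: true
    remote_write_flush_deadline: 1m

After saving the configMap, restart the tempo pods so the metrics-generator picks up the new configuration.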

EStork09 commented 1 year ago

This should work just fine; what version of the chart are you deploying?

metricsGenerator:
  enabled: true
  config:
    storage:
      remote_write:
        - url: http://mimir-distributed-gateway.mimir/api/v1/push

Your "trial 2" indentation is wrong, and the key is also wrong, trial 1 should have worked but did you indent correctly?

dreaminghk commented 1 year ago

Thanks for your reply. I finally figured it out while writing my response: there are invisible characters on the remote_write line, and that is the cause.

[screenshot: values.yaml showing invisible characters on the remote_write line]
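
For anyone hitting the same symptom: an invisible character (for example a zero-width space, or a non-breaking space pasted from a web page) looks identical in most editors but makes YAML parse the key differently, so the chart's template lookup misses it and the rendered config silently falls back to remote_write: []. A hypothetical illustration:

metricsGenerator:
  config:
    storage:
      # Typed with ordinary characters: the chart picks this up as expected.
      remote_write:
        - url: http://cortex/api/v1/push
      # If the key or its indentation contained an invisible character,
      # YAML would parse it as a different key (or different nesting), the
      # template lookup would miss it, and the configMap would render
      # remote_write: [] exactly as seen above.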

I will close the thread. Thank you very much!