open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

why does the target_info still have values after a pod is evicted ? #34378

Open wangjinxiang0522 opened 3 months ago

wangjinxiang0522 commented 3 months ago

Component(s)

receiver/prometheus, exporter/prometheusremotewrite

Describe the issue you're reporting

    receivers:
      prometheus:
        config:
          scrape_configs:
            - job_name: 'otel-collector'
              scrape_interval: 10s
              static_configs:
                - targets: ['0.0.0.0:8888']
        target_allocator:
          endpoint: http://mc-collector-ta-targetallocator
          interval: 30s
          collector_id: "${POD_NAME}"
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    processors:
      memory_limiter:
        check_interval: 1s
        limit_percentage: 75
        spike_limit_percentage: 15
      batch:
        send_batch_size: 10000
        timeout: 10s
    exporters:
      logging:
        #loglevel: debug
      prometheusremotewrite:
        endpoint: http://mimir-nginx.monitoring.svc:80/api/v1/push
        tls:
          insecure: true
        resource_to_telemetry_conversion:
          enabled: false

Actual Result

(screenshot omitted: screenshot-20240801-191511, showing target_info still returning values after the pod was evicted)

github-actions[bot] commented 3 months ago

Pinging code owners:

dashpole commented 3 months ago

Does it stick around for about 5 minutes? If so, this sounds like we are missing staleness markers when a pod is evicted.
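
For context on the "about 5 minutes": PromQL's lookback delta defaults to 5m, so an instant query keeps returning a series for up to 5 minutes after its last sample unless a staleness marker was written for it. Below is a rough sketch of where that window lives, assuming a plain Prometheus server (Mimir's equivalent querier setting may be named differently); it only explains the symptom and is not a fix.

    # Illustrative Prometheus container spec fragment (image tag and names are
    # placeholders). The lookback delta below is the Prometheus default; series
    # that never receive a staleness marker keep appearing in instant queries
    # for this long after their last sample.
    containers:
      - name: prometheus
        image: prom/prometheus:v2.53.0
        args:
          - --config.file=/etc/prometheus/prometheus.yml
          - --query.lookback-delta=5m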

wangjinxiang0522 commented 3 months ago

> Does it stick around for about 5 minutes? If so, this sounds like we are missing staleness markers when a pod is evicted.

@dashpole Yes, thanks for your reply. How should I modify the parameters to solve this issue?

dashpole commented 3 months ago

Does this happen only when a pod is evicted? Or also when a pod is deleted?

wangjinxiang0522 commented 2 months ago

> Does this happen only when a pod is evicted? Or also when a pod is deleted?

Yes, it happens when a pod is evicted or deleted.

dashpole commented 2 months ago

My best guess is that when we apply the new config to the scrape manager and discovery manager, the targets are removed without generating staleness markers. But fixing it will probably require a change in the Prometheus server (prometheus/prometheus). We need to reproduce this with the Prometheus server itself (update the config file to remove a static target) and see whether the series is marked stale (i.e. whether the line correctly stops) in the graph.
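
A minimal sketch of that reproduction, assuming a standalone Prometheus server with a placeholder job name and targets: start Prometheus with the config below, delete one entry from static_configs, reload the config (SIGHUP, or POST /-/reload if --web.enable-lifecycle is set), and check in the graph whether the removed target's series stop at the reload or keep returning for ~5 minutes.

    # prometheus.yml (illustrative)
    scrape_configs:
      - job_name: 'staleness-repro'
        scrape_interval: 10s
        static_configs:
          - targets:
              - 'localhost:9100'
              - 'localhost:9101'   # remove this target and reload; if staleness
                                   # markers are written, its series stop immediately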

github-actions[bot] commented 1 week ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.