open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

Attributes Processor not deleting all attributes #21923

Closed: martinrw closed this issue 1 year ago

martinrw commented 1 year ago

Component(s)

processor/attributes

What happened?

Description

The metrics we see in Prometheus have a lot of attributes (labels) attached to them. Most of these add nothing useful, so I am trying to exclude them so that they don't show up in Prometheus or Grafana.

We are attempting to use the attributes processor (https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/attributesprocessor/README.md) to remove some of these attributes, which the OpenTelemetry Collector appears to be adding to our metrics by default.

It seems able to delete attributes whose names consist of several words separated by "." (such as telemetry.sdk.name or telemetry.sdk.version), but not single-word attributes such as "instance" or "container".
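The dotted keys in the config and the underscored label names in Prometheus (for example telemetry.sdk.version and telemetry_sdk_version) are related by the exporter's label-name sanitization. Below is a simplified Python model of that mapping. It assumes only that characters outside `[a-zA-Z0-9_]` become underscores; the real exporter applies further rules that are not modelled, and `to_prometheus_label` is a hypothetical name.

```python
import re

# Prometheus label names must match [a-zA-Z_][a-zA-Z0-9_]*.
_INVALID = re.compile(r"[^a-zA-Z0-9_]")


def to_prometheus_label(key: str) -> str:
    """Simplified model: replace every character that is not valid in a
    Prometheus label name (for example ".") with "_".

    Further exporter rules (e.g. for a leading digit) are not modelled.
    """
    return _INVALID.sub("_", key)


for key in ["telemetry.sdk.version", "container.id", "service.name", "instance"]:
    print(f"{key} -> {to_prometheus_label(key)}")
# telemetry.sdk.version -> telemetry_sdk_version
# container.id -> container_id
# service.name -> service_name
# instance -> instance
```

Under this model a single-word key maps to itself. If `instance` had been a resource attribute, deleting the key `instance` should therefore have removed the label. Its survival suggests the label comes from somewhere other than the collector.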

Steps to Reproduce

We are using the OTel agents for both Java (Spring Boot) and Python. They send metrics via automatic instrumentation to our OTel Collector (config in the section below), and the metrics are scraped by Prometheus.

This is how a metric looked before we removed any attributes (with some identifying info redacted):

prometheus_http_server_duration_sum{container="opentelemetry-collector", container_id="acbdefghijklmnop", endpoint="app-metrics", exported_job="our-service", host_arch="amd64", host_name="our-service-abc-123", http_method="GET", http_route="/**", http_scheme="https", http_status_code="404", instance="1.2.3.4:9464", job="opentelemetry-collector", label_team="vhs", namespace="opentelemetry-collector", net_host_name="our-service.something.com", net_protocol_name="http", net_protocol_version="1.1", os_description="Linux 5.15.90", os_type="linux", pod="opentelemetry-collector-954d8dfdd-gm5k2", process_command_args="["/opt/java/openjdk/bin/java","-XX:MaxRAMPercentage=75","-XX:ActiveProcessorCount=2","-javaagent:./opentelemetry-javaagent/opentelemetry-javaagent.jar","-Dotel.service.name=our-service","-Dotel.exporter.otlp.metrics.endpoint=http://opentelemetry-collector.opentelemetry-collector.svc.cluster.local:4317","-Dotel.exporter.otlp.metrics.protocol=grpc","-Dotel.traces.exporter=none","-Dotel.resource.attributes=label_team=our-team","-Dnewrelic.environment=quality","-jar","./our-service.jar"]", process_executable_path="/opt/java/openjdk/bin/java", process_pid="1", service="opentelemetry-collector", service_name="our-service", telemetry_auto_version="1.25.1", telemetry_sdk_language="java", telemetry_sdk_version="1.25.0"}

Expected Result

We should see all of the fields included in our config removed:

prometheus_http_server_duration_sum{exported_job="our-service", host_arch="amd64", host_name="our-service-abc-123", http_method="GET", http_route="/**", http_scheme="https", http_status_code="404", label_team="our-team", net_host_name="our-service.something.com", net_protocol_name="http", net_protocol_version="1.1", os_description="Linux 5.15.90", os_type="linux", process_command_args="["/opt/java/openjdk/bin/java","-XX:MaxRAMPercentage=75","-XX:ActiveProcessorCount=2","-javaagent:./opentelemetry-javaagent/opentelemetry-javaagent.jar","-Dotel.service.name=our-service","-Dotel.exporter.otlp.metrics.endpoint=http://opentelemetry-collector.opentelemetry-collector.svc.cluster.local:4317","-Dotel.exporter.otlp.metrics.protocol=grpc","-Dotel.traces.exporter=none","-Dotel.resource.attributes=label_team=our-team","-jar","./our-service.jar"]", process_executable_path="/opt/java/openjdk/bin/java", process_pid="1", service_name="our-service"}

Actual Result

Some of the attributes like:

telemetry_auto_version="1.25.1", telemetry_sdk_language="java", telemetry_sdk_version="1.25.0"

are no longer there, but some of the unwanted attributes, such as container, instance and pod, are still present:

prometheus_http_server_duration_sum{container="opentelemetry-collector", endpoint="app-metrics", exported_job="our-service", host_arch="amd64", host_name="our-service-abc-123", http_method="GET", http_route="/**", http_scheme="https", http_status_code="404", instance="1.2.3.4:9464", job="opentelemetry-collector", label_team="our-team", namespace="opentelemetry-collector", net_host_name="our-service.something.com", net_protocol_name="http", net_protocol_version="1.1", os_description="Linux 5.15.90", os_type="linux", pod="opentelemetry-collector-954d8dfdd-gm5k2", process_command_args="["/opt/java/openjdk/bin/java","-XX:MaxRAMPercentage=75","-XX:ActiveProcessorCount=2","-javaagent:./opentelemetry-javaagent/opentelemetry-javaagent.jar","-Dotel.service.name=our-service","-Dotel.exporter.otlp.metrics.endpoint=http://opentelemetry-collector.opentelemetry-collector.svc.cluster.local:4317","-Dotel.exporter.otlp.metrics.protocol=grpc","-Dotel.traces.exporter=none","-Dotel.resource.attributes=label_team=our-team","-jar","./our-service.jar"]", process_executable_path="/opt/java/openjdk/bin/java", process_pid="1", service="opentelemetry-collector", service_name="our-service"}

Collector version

0.72.0

Environment information

Environment

EKS, K8s version 1.24

OpenTelemetry Collector configuration

config:
  exporters:
    prometheus:
      endpoint: "0.0.0.0:9464"
      resource_to_telemetry_conversion:
        enabled: true
      enable_open_metrics: true
      namespace: prometheus
  extensions:
    # The health_check extension is mandatory for this chart.
    # Without the health_check extension the collector will fail the readiness and liveness probes.
    # The health_check extension can be modified, but should never be removed.
    health_check: {}
    zpages: {}
    pprof: {}
  processors:
    memory_limiter:
      limit_mib: 4000
      spike_limit_mib: 800
      check_interval: 1s
    batch:
      send_batch_size: 10000
      send_batch_max_size: 11000
      timeout: 10s
    #Delete unnecessary attributes from our metrics
    resource:
      attributes:
        - key: telemetry.sdk.name
          action: delete
        - key: telemetry.sdk.version
          action: delete
        - key: telemetry.sdk.language
          action: delete
        - key: telemetry.auto.version
          action: delete
        - key: job
          action: delete
        - key: service
          action: delete
        - key: container
          action: delete
        - key: container.id
          action: delete
        - key: endpoint
          action: delete
        - key: namespace
          action: delete
        - key: prometheus
          action: delete
        - key: instance
          action: delete
    k8sattributes/default:
  receivers:
    jaeger: null
    prometheus: null
    zipkin: null
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
        http:
          endpoint: 0.0.0.0:4318
  service:
    pipelines:
      traces: null
      logs: null
      metrics:
        exporters:
          - prometheus
        processors:
          - memory_limiter
          - batch
          - resource
          - k8sattributes/default
        receivers:
          - otlp

Log output

No relevant logs.

Additional context

No response

github-actions[bot] commented 1 year ago

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

atoulme commented 1 year ago

Your configuration doesn't mention the attributesprocessor; it uses the resourceprocessor. Please check and provide a reproduction scenario.

martinrw commented 1 year ago

Thanks for looking. What's the difference between the two? I tried using the attributes processor instead of the resource processor, but I couldn't get it to delete any attributes at all that way. I'll have another attempt when I'm back at my computer tomorrow.

atoulme commented 1 year ago

The attributesprocessor removes attributes of individual log/metric/span records, while the resourceprocessor removes attributes of the resource that wraps those individual records. Please see the documentation for each component for more information.
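To make the distinction concrete, here is a minimal Python model of the data layout. The class and function names are hypothetical and are not the collector's API. A resource carries attributes shared by all of its data points, and each data point carries its own attributes, so a resource-level delete and a record-level delete act on different dictionaries. With `resource_to_telemetry_conversion` enabled, the Prometheus exporter writes both kinds out as labels, which is why they look alike once they reach Prometheus.

```python
from dataclasses import dataclass, field


@dataclass
class DataPoint:
    """One metric record; its attributes are per-point (e.g. http.method)."""
    value: float
    attributes: dict[str, str] = field(default_factory=dict)


@dataclass
class ResourceMetrics:
    """A resource wrapping many points; its attributes are shared by all of them."""
    resource_attributes: dict[str, str]
    points: list[DataPoint]


def resource_processor_delete(rm: ResourceMetrics, keys: set[str]) -> None:
    """Model of a resource processor 'delete': only resource attributes change."""
    for key in keys:
        rm.resource_attributes.pop(key, None)


def attributes_processor_delete(rm: ResourceMetrics, keys: set[str]) -> None:
    """Model of an attributes processor 'delete': only per-point attributes change."""
    for point in rm.points:
        for key in keys:
            point.attributes.pop(key, None)


rm = ResourceMetrics(
    resource_attributes={"telemetry.sdk.name": "opentelemetry", "service.name": "our-service"},
    points=[DataPoint(1.0, {"http.method": "GET"})],
)

# The key lives on the resource, so a record-level delete leaves it in place...
attributes_processor_delete(rm, {"telemetry.sdk.name"})
assert "telemetry.sdk.name" in rm.resource_attributes

# ...while a resource-level delete removes it.
resource_processor_delete(rm, {"telemetry.sdk.name"})
print(rm.resource_attributes)  # {'service.name': 'our-service'}
```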

martinrw commented 1 year ago

Hi @atoulme, thanks again for taking the time to reply. I have read the docs here: https://opentelemetry.io/docs/collector/transforming-telemetry/#adding-or-deleting-attributes and https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/attributesprocessor, but to be honest I still don't really understand the difference.

I have updated my config to add an attributes processor section, like this:

    resource:
      attributes:
        - key: telemetry.sdk.name
          action: delete
        - key: telemetry.sdk.version
          action: delete
        - key: telemetry.sdk.language
          action: delete
        - key: telemetry.auto.version
          action: delete
    attributes:
      actions:
        - key: job
          action: delete
        - key: service
          action: delete
        - key: container
          action: delete
        - key: endpoint
          action: delete
        - key: namespace
          action: delete
        - key: prometheus
          action: delete
        - key: instance
          action: delete
    k8sattributes/default:
  receivers:
    jaeger: null
    prometheus: null
    zipkin: null
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
        http:
          endpoint: 0.0.0.0:4318
  service:
    pipelines:
      traces: null
      logs: null
      metrics:
        exporters:
          - prometheus
        processors:
          - memory_limiter
          - batch
          - resource
          - attributes
          - k8sattributes/default
        receivers:
          - otlp

Unfortunately, it hasn't helped. I still see only these attributes being removed: telemetry.sdk.name, telemetry.sdk.version

but not these: pod, container, instance, etc.

martinrw commented 1 year ago

I have more or less figured this out now. The attributes I was trying to delete aren't actually created by the OpenTelemetry Collector (or the agent); they are added by Prometheus when it scrapes the collector, as part of the scrape config.
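This explains why no collector processor could remove these labels. When Prometheus scrapes a target, it attaches that target's labels to every scraped series: job and instance, plus, for a Prometheus Operator ServiceMonitor, labels such as namespace, service, pod, container and endpoint. The Python model below simplifies that merge step. The function name is hypothetical, and repeated-conflict handling such as `exported_exported_` is omitted. It reproduces the job/exported_job pair seen in the issue's metrics and shows that `honor_labels` alone does not remove the extra target labels.

```python
def attach_target_labels(
    scraped: dict[str, str], target: dict[str, str], honor_labels: bool
) -> dict[str, str]:
    """Simplified model of how Prometheus merges target labels into a scraped series.

    Labels only the target has are always added. On a name clash, honor_labels=True
    keeps the scraped value; honor_labels=False lets the target win and keeps the
    scraped value under an "exported_" prefix.
    """
    series = dict(scraped)
    for name, value in target.items():
        if name not in series:
            series[name] = value
        elif not honor_labels:
            series["exported_" + name] = series[name]
            series[name] = value
    return series


# Labels exposed by the collector's exporter (job inferred from exported_job in the issue).
scraped = {"job": "our-service", "http_method": "GET"}
# Labels Prometheus attaches for the scrape target (abridged from the issue).
target = {
    "job": "opentelemetry-collector",
    "instance": "1.2.3.4:9464",
    "namespace": "opentelemetry-collector",
    "pod": "opentelemetry-collector-954d8dfdd-gm5k2",
}

default = attach_target_labels(scraped, target, honor_labels=False)
print(default["job"], default["exported_job"])  # opentelemetry-collector our-service

honored = attach_target_labels(scraped, target, honor_labels=True)
print(honored["job"], honored["pod"])  # our-service opentelemetry-collector-954d8dfdd-gm5k2
```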

I was able to get rid of them by setting this config in my OpenTelemetry Collector deployment:

serviceMonitor:
  enabled: true
  metricsEndpoints:
    - port: metrics
      interval: 15s
  prometheusMetricsEndpoints:
    - port: app-metrics
      interval: 15s
      honorLabels: true
      relabelings:
        - action: labeldrop
          regex: (container|endpoint|job|namespace|pod|service)
        - action: replace
          regex: (.*)
          replacement: otel-collector
          targetLabel: instance

For whatever reason, the instance label wouldn't drop, so I just set it to a consistent value instead (previously it was the IP of the otel-collector pod).
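A likely reason instance "wouldn't drop" is that Prometheus fills in instance from the target's `__address__` whenever relabeling leaves it unset. A labeldrop on instance therefore has no lasting effect, while an explicit replace does. The sketch below is a simplified Python model, not Prometheus code. It covers only the labeldrop and replace actions, treats `replacement` as a literal (no `$1` references), and uses the Prometheus `relabel_configs` field names (`target_label`, `source_labels`) rather than the ServiceMonitor's camelCase ones. The function name `relabel` is hypothetical.

```python
import re


def relabel(target: dict[str, str], rules: list[dict]) -> dict[str, str]:
    """Simplified model of Prometheus target relabeling.

    Only 'labeldrop' and 'replace' are modelled; regexes are fully anchored,
    as in Prometheus, and 'replacement' is used as a literal string.
    """
    labels = dict(target)
    for rule in rules:
        pattern = re.compile(rule.get("regex", "(.*)"))
        if rule["action"] == "labeldrop":
            # Drop every label whose *name* matches the regex.
            labels = {k: v for k, v in labels.items() if not pattern.fullmatch(k)}
        elif rule["action"] == "replace":
            # Join the source label values with ";" (empty string if none are given).
            value = ";".join(labels.get(s, "") for s in rule.get("source_labels", []))
            if pattern.fullmatch(value):
                labels[rule["target_label"]] = rule["replacement"]
    # If relabeling left no instance label, Prometheus defaults it to __address__.
    if "instance" not in labels and "__address__" in labels:
        labels["instance"] = labels["__address__"]
    # Labels with a "__" prefix are discarded once relabeling is complete.
    return {k: v for k, v in labels.items() if not k.startswith("__")}


target = {
    "__address__": "1.2.3.4:9464",
    "job": "opentelemetry-collector",
    "namespace": "opentelemetry-collector",
    "pod": "opentelemetry-collector-954d8dfdd-gm5k2",
    "service": "opentelemetry-collector",
    "container": "opentelemetry-collector",
    "endpoint": "app-metrics",
}

# Adding "instance" to the labeldrop regex changes nothing: it is filled in afterwards.
drop_all = [{"action": "labeldrop", "regex": "(container|endpoint|job|namespace|pod|service|instance)"}]
print(relabel(target, drop_all))  # {'instance': '1.2.3.4:9464'}

# The two relabelings from the ServiceMonitor config in this comment.
workaround = [
    {"action": "labeldrop", "regex": "(container|endpoint|job|namespace|pod|service)"},
    {"action": "replace", "regex": "(.*)", "replacement": "otel-collector", "target_label": "instance"},
]
print(relabel(target, workaround))  # {'instance': 'otel-collector'}
```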

Apologies for raising this as a bug incorrectly, but hopefully this helps someone else in the future.