open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

spanmetrics connector generating extreme grpc traffic #20306

Closed: devrimdemiroz closed this issue 1 year ago

devrimdemiroz commented 1 year ago

Component(s)

connector/spanmetrics

What happened?

Description

I replaced the spanmetrics processor config in the OpenTelemetry demo app with the new spanmetrics connector. The traffic observed by the OTLP gRPC receiver increased almost 10,000-fold. Accordingly, calls (previously calls_total) and the related span metrics also explode linearly. See the screenshots at the bottom.

Steps to Reproduce

The following configuration is used as a replacement for the spanmetrics processor:

connectors:
  spanmetrics:
      histogram:
        explicit:
          buckets: [ 100us, 1ms, 2ms, 6ms, 10ms, 100ms, 250ms ]
      dimensions:
        - name: http.method
          default: GET
        - name: http.status_code
      dimensions_cache_size: 1000
      aggregation_temporality: "AGGREGATION_TEMPORALITY_CUMULATIVE"
...

service:
  pipelines:
    traces/spanmetrics:
      receivers: [otlp]
      exporters: [spanmetrics]
    metrics/spanmetrics:
      receivers: [spanmetrics]
      exporters: [prometheus]
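
For reference, the processor-based setup this replaces looked roughly like the sketch below. It is reconstructed from the spanmetrics processor's documented options (metrics_exporter, latency_histogram_buckets, etc.) rather than copied from the demo app:

processors:
  spanmetrics:
    # the processor pushes generated metrics directly to a named metrics exporter
    metrics_exporter: prometheus
    latency_histogram_buckets: [ 100us, 1ms, 2ms, 6ms, 10ms, 100ms, 250ms ]
    dimensions:
      - name: http.method
        default: GET
      - name: http.status_code
    dimensions_cache_size: 1000
    aggregation_temporality: "AGGREGATION_TEMPORALITY_CUMULATIVE"

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [spanmetrics, batch]
      exporters: [otlp]

The prometheus exporter still has to be attached to a metrics pipeline so that metrics_exporter can resolve it.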

Expected Result

The expected result is to be in line with the spanmetrics processor runs.

When the processor runs:

[screenshot: SpanmetricsProcessor]

Actual Result

When the connector runs:

[screenshot: SpanmetricsConnector]

Collector version

0.74.0

Environment information

Environment

Images

IMAGE_VERSION=1.3.1 IMAGE_NAME=ghcr.io/open-telemetry/demo

OpenTelemetry Collector configuration

receivers:
  otlp:
    protocols:
      grpc:
      http:
        cors:
          allowed_origins:
            - "http://*"
            - "https://*"
exporters:
  otlp:
    endpoint: "localhost:4317"
    tls:
      insecure: true
  logging:
  prometheus:
    endpoint: "otelcol:9464"
    resource_to_telemetry_conversion:
      enabled: true
    enable_open_metrics: true
connectors:
  spanmetrics:
      histogram:
        explicit:
          buckets: [ 100us, 1ms, 2ms, 6ms, 10ms, 100ms, 250ms ]
      dimensions:
        - name: http.method
          default: GET
        - name: http.status_code
      dimensions_cache_size: 1000
      aggregation_temporality: "AGGREGATION_TEMPORALITY_CUMULATIVE"

processors:
  batch:
  transform:
    metric_statements:
      - context: metric
        statements:
          - set(description, "Measures the duration of inbound HTTP requests") where name == "http.server.duration"

service:
  pipelines:
    traces/spanmetrics:
      receivers: [otlp]
      exporters: [spanmetrics]
    metrics/spanmetrics:
      receivers: [spanmetrics]
      exporters: [prometheus]
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      processors: [transform, batch]
      exporters: [prometheus]

Log output

No response

Additional context

No response

github-actions[bot] commented 1 year ago

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

kovrus commented 1 year ago

@devrimdemiroz The reason is https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/19216: spanmetrics now generates metrics from spans per resource scope, so the number of generated metric copies grows with the number of resource scopes. I opened a PR to toggle this behavior on/off or to filter resource attributes (https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/19467), but we decided to close it because the same result can be achieved with the transform processor's keep_keys function.

devrimdemiroz commented 1 year ago

@kovrus, I truly appreciate your quick response! If you could provide me with a little bit more on the transform processor configuration I need to add, you'll be an absolute time-saver for me. Thanks in advance!

kovrus commented 1 year ago

@devrimdemiroz Something like the following will reduce the number of resource scopes to the number of services that produce telemetry. If we want to allow the old behavior, one resource scope for everything, we should wrap up https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/19467.

...

processors:
  transform:
    trace_statements:
    - context: resource
      statements:
      - keep_keys(attributes, ["service.name"])

...
service:
  pipelines:
    traces/spanmetrics:
      receivers: [otlp]
      processors: [transform]
      exporters: [spanmetrics]
    metrics/spanmetrics:
      receivers: [spanmetrics]
      exporters: [prometheus]
...

devrimdemiroz commented 1 year ago

@kovrus, thank you for sharing the precise configuration; it works perfectly. However, I'm unsure whether it's absolutely necessary. My goal is to create a more straightforward and comprehensible configuration using the new connector config. To achieve this, I've had to add a layer that I haven't used or been familiar with before, and which wasn't required by the previous processor. I'm not questioning the importance or potential benefits it may offer; I'm merely curious about the rationale behind some extra lines that aren't immediately clear. Nevertheless, I would recommend including it as part of the default spanmetrics connector config in the documentation. Since the transform config works, I'll consider this matter resolved. Thanks for your time.

kovrus commented 1 year ago

@devrimdemiroz yes, we should add a more comprehensive readme for the span metrics connector and its differences from the processor. I've tried to call out that more metrics will be generated when using the connector here, but we probably should provide a better explanation.

The transform processor with keep_keys controls the number of resource scopes in the generated metrics. There will definitely be cases where resource attributes have high cardinality, and that will result in more metrics being generated. I agree that this is not evident from the documentation.

@djaglowski I think we should probably revisit https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/19467 and allow users to control which attributes are added to the generated metrics' resource scopes. Maybe, by default, we could keep the service.name, service.namespace, and service.instance.id resource attributes to define the generated metrics' resource scopes (wdyt @gouthamve)? We could use keep_keys for that, but then the dimensions configuration parameter of spanmetrics won't work, since resource attributes will be affected by keep_keys.
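
To make that suggestion concrete, a minimal sketch of keeping those three service-identifying resource attributes with today's transform processor (an illustration of the idea, not connector-level support):

processors:
  transform:
    trace_statements:
    - context: resource
      statements:
      # keep only the attributes that should define a generated metrics resource scope;
      # every other resource attribute is dropped before spanmetrics sees the spans
      - keep_keys(attributes, ["service.name", "service.namespace", "service.instance.id"])

The trade-off mentioned above applies: spanmetrics dimensions that rely on resource attributes outside this list would no longer be populated.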

djaglowski commented 1 year ago

My only concern is that we may find ourselves needing to add more and more "transform" capabilities to this connector as well as others. However, if emitting consolidated metrics based on resource attributes appears to be a particularly common case, then I support it.