open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

[exporter/datadog] Datadog exporter panics while exporting metrics pushed from Kuma dataplane proxy #32103

Closed · Automaat closed this 1 month ago

Automaat commented 6 months ago

Component(s)

exporter/datadog

What happened?

Description

We've introduced support for pushing metrics to the OpenTelemetry Collector in the Kuma service mesh, and we discovered an issue with the Datadog exporter. A couple of minutes after we start pushing metrics to the collector, it panics. More information is in the Kuma issue, including logs from the debug exporter: https://github.com/kumahq/kuma/issues/9336

Steps to Reproduce

Install Kuma (see the installation guide):

helm repo add kuma https://kumahq.github.io/charts
helm repo update
helm install --create-namespace --namespace kuma-system kuma kuma/kuma

Install the demo app:

kumactl install demo | kubectl apply -f -

Install the OpenTelemetry Collector with the following config:

kubectl --context $CTX_CLUSTER3 create namespace observability

helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update

# otel collector config via helm
cat > otel-config-datadog.yaml <<EOF
mode: deployment
config:
  exporters:
    datadog:
      api:
        site: datadoghq.eu
        key: <key>
  service:
    pipelines:
      logs:
        exporters:
          - datadog
      traces:
        exporters:
          - datadog
      metrics:
        exporters:
          - datadog
EOF

helm upgrade --install \
  --kube-context ${CTX_CLUSTER3} \
  -n observability \
  --set mode=deployment \
  -f otel-config-datadog.yaml \
  opentelemetry-collector open-telemetry/opentelemetry-collector

# enable Metrics
kumactl apply -f - <<EOF
type: MeshMetric
name: metrics-default
mesh: default
spec:
  targetRef:
    kind: Mesh
  default:
    backends:
    - type: OpenTelemetry
      openTelemetry: 
        endpoint: "opentelemetry-collector.observability.svc:4317"
EOF

Expected Result

The exporter does not panic.

Actual Result

The exporter panics:

panic: runtime error: index out of range [0] with length 0

goroutine 450 [running]:
github.com/DataDog/opentelemetry-mapping-go/pkg/quantile.(*Agent).InsertInterpolate(0xc001deaf58, 0x414b774000000000, 0x3fe0000000000000, 0x0)
    github.com/DataDog/opentelemetry-mapping-go/pkg/quantile@v0.13.2/agent.go:94 +0x4b4
github.com/DataDog/opentelemetry-mapping-go/pkg/otlp/metrics.(*Translator).getSketchBuckets(0xc002aefb90, {0x911ee78, 0xc002e9d7a0}, {0x7dc81df15470, 0xc001d2e540}, 0xc0020af5c0, {0xc003420c60?, 0xc00206a240?}, {0x0, 0x0, ...}, ...)
    github.com/DataDog/opentelemetry-mapping-go/pkg/otlp/metrics@v0.13.2/metrics_translator.go:351 +0xaf5
github.com/DataDog/opentelemetry-mapping-go/pkg/otlp/metrics.(*Translator).mapHistogramMetrics(0xc002aefb90, {0x911ee78, 0xc002e9d7a0}, {0x90fc310, 0xc001d2e540}, 0x5b3a2273746e696f?, {0xc002149580?, 0xc00206a240?}, 0x0)
    github.com/DataDog/opentelemetry-mapping-go/pkg/otlp/metrics@v0.13.2/metrics_translator.go:515 +0x7c7
github.com/DataDog/opentelemetry-mapping-go/pkg/otlp/metrics.(*Translator).mapToDDFormat(0xc002aefb90, {0x911ee78, 0xc002e9d7a0}, {0xc0024b2640?, 0xc00206a240?}, {0x90fc310?, 0xc001d2e540?}, {0xc001bc6580, 0x1, 0x4}, ...)
    github.com/DataDog/opentelemetry-mapping-go/pkg/otlp/metrics@v0.13.2/metrics_translator.go:847 +0xabe
github.com/DataDog/opentelemetry-mapping-go/pkg/otlp/metrics.(*Translator).MapMetrics(0xc002aefb90, {0x911ee78, 0xc002e9d7a0}, {0xc0031ae000?, 0xc00206a240?}, {0x90fc310?, 0xc001d2e540?})
    github.com/DataDog/opentelemetry-mapping-go/pkg/otlp/metrics@v0.13.2/metrics_translator.go:797 +0xd27
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/datadogexporter.(*metricsExporter).PushMetricsData(0xc002afea20, {0x911ee78, 0xc002e9d7a0}, {0xc0031ae000?, 0xc00206a240?})
    github.com/open-telemetry/opentelemetry-collector-contrib/exporter/datadogexporter@v0.94.0/metrics_exporter.go:212 +0x21d
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/datadogexporter.(*metricsExporter).PushMetricsDataScrubbed(0xc002afea20, {0x911ee78?, 0xc002e9d7a0?}, {0xc0031ae000?, 0xc00206a240?})
    github.com/open-telemetry/opentelemetry-collector-contrib/exporter/datadogexporter@v0.94.0/metrics_exporter.go:185 +0x2c
go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsRequest).Export(0x0?, {0x911ee78?, 0xc002e9d7a0?})
    go.opentelemetry.io/collector/exporter@v0.94.1/exporterhelper/metrics.go:59 +0x31
go.opentelemetry.io/collector/exporter/exporterhelper.(*timeoutSender).send(0xc001bdd980?, {0x911ee78?, 0xc002e9d7a0?}, {0x90d5d50?, 0xc0034429f0?})
    go.opentelemetry.io/collector/exporter@v0.94.1/exporterhelper/timeout_sender.go:43 +0x48
go.opentelemetry.io/collector/exporter/exporterhelper.(*baseRequestSender).send(0xc00280e8c0?, {0x911ee78?, 0xc002e9d7a0?}, {0x90d5d50?, 0xc0034429f0?})
    go.opentelemetry.io/collector/exporter@v0.94.1/exporterhelper/common.go:35 +0x30
go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).send(0xc002d8c690, {0x911f350?, 0xc002879af0?}, {0x90d5d50?, 0xc0034429f0?})
    go.opentelemetry.io/collector/exporter@v0.94.1/exporterhelper/metrics.go:171 +0x7e
go.opentelemetry.io/collector/exporter/exporterhelper.newQueueSender.func1({0x911f350?, 0xc002879af0?}, {0x90d5d50?, 0xc0034429f0?})
    go.opentelemetry.io/collector/exporter@v0.94.1/exporterhelper/queue_sender.go:95 +0x84
go.opentelemetry.io/collector/exporter/internal/queue.(*boundedMemoryQueue[...]).Consume(0x912a020, 0xc002d8c6f0)
    go.opentelemetry.io/collector/exporter@v0.94.1/internal/queue/bounded_memory_queue.go:57 +0xc7
go.opentelemetry.io/collector/exporter/internal/queue.(*Consumers[...]).Start.func1()
    go.opentelemetry.io/collector/exporter@v0.94.1/internal/queue/consumers.go:43 +0x79
created by go.opentelemetry.io/collector/exporter/internal/queue.(*Consumers[...]).Start in goroutine 1
    go.opentelemetry.io/collector/exporter@v0.94.1/internal/queue/consumers.go:39 +0x7d
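
The top frame, `quantile.(*Agent).InsertInterpolate`, indexes into a bucket slice that arrives empty (note the `{0x0, 0x0, ...}` argument passed to `getSketchBuckets`). A minimal standalone Go sketch of this failure mode (`bucketValue` is a hypothetical illustration, not the actual opentelemetry-mapping-go code):

```go
package main

import "fmt"

// bucketValue is a hypothetical stand-in for the indexing done while
// translating a histogram into a sketch: it reads the first element of a
// bucket-count slice without checking its length. When a data point
// arrives with no bucket counts, counts[0] panics with exactly the
// message seen in the report.
func bucketValue(counts []uint64) (v uint64, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("%v", r)
		}
	}()
	return counts[0], nil // unguarded access on a possibly empty slice
}

func main() {
	_, err := bucketValue(nil)
	fmt.Println(err) // prints: runtime error: index out of range [0] with length 0
}
```

A length check (or skipping data points with empty buckets) would avoid the panic; whether that is the right fix depends on why Kuma's histograms arrive without bucket counts in the first place.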

Collector version

0.92.0

Environment information

Environment

OpenTelemetry Collector configuration

Same as the Helm values shown under Steps to Reproduce above.

Log output

Same panic and stack trace as shown under Actual Result above.

Additional context

No response

github-actions[bot] commented 6 months ago

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

mx-psi commented 6 months ago

@Automaat If it is easy to reproduce, would you be able to use the file exporter or the debug exporter to get some sample metrics?

It sounds like the problematic metric is an OTLP Histogram, but I don't have enough data to reproduce just yet.

Automaat commented 6 months ago

@mx-psi we have logs from the debug exporter here: https://github.com/kumahq/kuma/issues/9336#issuecomment-1977017881 If that is not enough, I can collect more.

mx-psi commented 6 months ago

@Automaat These logs don't contain any sample data; using one of the exporters mentioned in https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/32103#issuecomment-2032083548 should let us see the actual payload. If you haven't used them before, see https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/troubleshooting.md#local-exporters for a brief explanation.
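
For anyone reproducing this, a sketch of what that could look like in the Helm values used above (the debug exporter's `verbosity: detailed` and the file exporter's `path` are standard options; the specific file path here is an assumption):

```yaml
config:
  exporters:
    debug:
      verbosity: detailed        # prints full data points, including histogram buckets
    file:
      path: /tmp/otel-metrics.json   # assumed path; any writable location works
  service:
    pipelines:
      metrics:
        exporters:
          - debug
          - file
```

Attaching the detailed debug output (or the written file) for the minutes leading up to the panic should show the exact histogram data point that triggers it.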

github-actions[bot] commented 3 months ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] commented 1 month ago

This issue has been closed as inactive because it has been stale for 120 days with no activity.