open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

[exporter/datadog] Trace stats are severely underreporting hit counts #31713

Open sirianni opened 8 months ago

sirianni commented 8 months ago

Component(s)

exporter/datadog

What happened?

Description

The trace.* stats calculated within the datadogexporter are severely underreporting hit counts in many cases.

We confirmed this by comparing the span counts

[Screenshot]

Strangely, this discrepancy is not observed for a few HTTP routes (e.g. /metrics/logical_clusters in the above screenshot).

We do not have sampling enabled anywhere in our OTel Collector configuration.

Collector version

v0.93.0

Environment information

Environment

Kubernetes

OpenTelemetry Collector configuration

exporters:
  datadog:
    metrics:
      resource_attributes_as_tags: true
      instrumentation_scope_metadata_as_tags: true
      summaries:
        mode: noquantiles
    traces:
      compute_stats_by_span_kind: false
      peer_tags_aggregation: false
      trace_buffer: 1000

    host_metadata:
      enabled: false
    sending_queue:
      queue_size: 200

Log output

No response

Additional context

No response

sirianni commented 8 months ago

@mx-psi @dineshg13 @mackjmr FYI

github-actions[bot] commented 8 months ago

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

backjo commented 7 months ago

I think this is probably related to #31219 (and is a bug). If you're not using the connector at all and just the exporter, the DD exporter still sends Datadog-Client-Computed-Stats: true as a header. That header tells DD that the APM stats have already been computed, which is only true if you are using the connector.
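
For readers unfamiliar with the connector setup mentioned above, here is a minimal sketch of a pipeline in which the Datadog connector computes the APM stats before the exporter sends them. It is not the reporter's configuration and is not confirmed as a fix for this issue. The otlp receiver, the batch processor, and the DD_API_KEY environment variable are assumptions for illustration only.

```yaml
# Sketch only: receivers, processors, and the API key variable are assumed.
receivers:
  otlp:
    protocols:
      grpc:

processors:
  batch:

connectors:
  # The Datadog connector computes trace.* stats from the spans it receives.
  datadog/connector:

exporters:
  datadog:
    api:
      key: ${env:DD_API_KEY}

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      # Spans go to the connector (for stats) and to the exporter (for traces).
      exporters: [datadog/connector, datadog]
    metrics:
      # The stats computed by the connector enter the metrics pipeline here.
      receivers: [datadog/connector]
      processors: [batch]
      exporters: [datadog]
```

With a layout like this, the stats reaching Datadog are the ones produced by the connector, which is the situation the Datadog-Client-Computed-Stats header is meant to describe.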

github-actions[bot] commented 5 months ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

dineshg13 commented 5 months ago

@sirianni are you still facing the issue? Can you please let us know whether you were able to try the latest version of the connector?

lucassantoss1701 commented 2 months ago

I’m facing a similar issue in an application, where there’s a discrepancy between the counters of metrics generated from span tags and the metrics automatically generated by the otelhttp library.

For example, the API receives 1,000 hits within a certain period, and during execution, it consistently makes calls to another API.

When making these calls, we generate a client-type span, which leads to Datadog creating a client APM metric. This allows us to filter and monitor the application's dependencies. However, the value of this metric significantly differs from the metrics generated by otelhttp, causing inconsistencies in the data we're monitoring.

I’m not sure if this is related to the warning in the Datadog documentation that mentions metrics from traces can indeed have inconsistencies.
