open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

GCP Trace exporter failing due to "invalid utf-8" #35723

Open ethanmdavidson opened 2 weeks ago

ethanmdavidson commented 2 weeks ago

Component(s)

exporter/googlecloud

What happened?

Description

While using the googlecloud exporter to forward traces to Google Cloud Trace, I occasionally get the following error:

{
  "dropped_items": 728,
  "error": "rpc error: code = Internal desc = grpc: error while marshaling: string field contains invalid UTF-8",
  "caller": "exporterhelper/queue_sender.go:92",
  "ts": 1728481957.2171862,
  "kind": "exporter",
  "data_type": "traces",
  "level": "error",
  "stacktrace": "go.opentelemetry.io/collector/exporter/exporterhelper.newQueueSender.func1\n\tgo.opentelemetry.io/collector/exporter@v0.108.1/exporterhelper/queue_sender.go:92\ngo.opentelemetry.io/collector/exporter/internal/queue.(*boundedMemoryQueue[...]).Consume\n\tgo.opentelemetry.io/collector/exporter@v0.108.1/internal/queue/bounded_memory_queue.go:52\ngo.opentelemetry.io/collector/exporter/internal/queue.(*Consumers[...]).Start.func1\n\tgo.opentelemetry.io/collector/exporter@v0.108.1/internal/queue/consumers.go:43",
  "msg": "Exporting failed. Dropping data.",
  "name": "googlecloud"
}

Steps to Reproduce

Unclear: I haven't been able to identify which trace data triggers this condition. The only related issue I found is https://github.com/open-telemetry/opentelemetry-go/issues/3021, which was fixed almost two years ago (though it's unclear to me whether the googlecloud exporter depends on opentelemetry-go).
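To narrow down which spans trigger the error, one option is to check every string with `utf8.ValidString` before export. This is a minimal sketch that assumes span attributes flattened into a `map[string]string`, a simplification rather than the collector's `pdata` API; `invalidUTF8Keys` is a hypothetical helper. It also shows one way invalid UTF-8 can arise: slicing a Go string by bytes can cut a multi-byte character in half.

```go
package main

import (
	"fmt"
	"sort"
	"unicode/utf8"
)

// invalidUTF8Keys returns, sorted, the keys of attributes whose key or value
// is not valid UTF-8. Hypothetical helper over a simplified attribute model.
func invalidUTF8Keys(attrs map[string]string) []string {
	var bad []string
	for k, v := range attrs {
		if !utf8.ValidString(k) || !utf8.ValidString(v) {
			bad = append(bad, k)
		}
	}
	sort.Strings(bad)
	return bad
}

func main() {
	name := "héllo" // 'é' is encoded as two bytes: 0xC3 0xA9
	attrs := map[string]string{
		"http.route": "/users",
		"truncated":  name[:2], // byte slice cuts 'é' in half: "h\xc3"
		"raw.bytes":  "\xff\xfe",
	}
	fmt.Println(invalidUTF8Keys(attrs))
}
```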

Expected Result

Export all spans regardless of whether they are well-formed. Alternatively, drop only the malformed spans so that the rest of the batch is still exported.
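The second expected behavior (drop only the malformed spans) could look roughly like the minimal sketch below. `Span`, `wellFormed`, and `partition` are hypothetical names over a simplified span model, not the exporter's types; a real implementation would also have to check events, links, and resource attributes.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Span is a hypothetical, simplified span model used only for illustration.
type Span struct {
	Name  string
	Attrs map[string]string
}

// wellFormed reports whether the span name and every attribute key/value
// are valid UTF-8.
func wellFormed(s Span) bool {
	if !utf8.ValidString(s.Name) {
		return false
	}
	for k, v := range s.Attrs {
		if !utf8.ValidString(k) || !utf8.ValidString(v) {
			return false
		}
	}
	return true
}

// partition splits a batch into spans that can be exported and spans that
// would otherwise cause the whole request to fail marshaling.
func partition(batch []Span) (ok, dropped []Span) {
	for _, s := range batch {
		if wellFormed(s) {
			ok = append(ok, s)
		} else {
			dropped = append(dropped, s)
		}
	}
	return ok, dropped
}

func main() {
	batch := []Span{
		{Name: "GET /users", Attrs: map[string]string{"http.method": "GET"}},
		{Name: "bad", Attrs: map[string]string{"db.statement": "SELECT \xff"}},
		{Name: "POST /orders"},
	}
	ok, dropped := partition(batch)
	fmt.Printf("exporting %d spans, dropping %d\n", len(ok), len(dropped))
}
```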

Actual Result

A whole batch of spans (728 items in the log above) is dropped, presumably because of a few malformed spans.

Collector version

v0.108.0

Environment information

Environment

OpenTelemetry Operator v0.108.0, GKE 1.30.4-gke.1348000

OpenTelemetry Collector configuration

connectors:
  datadog/connector: {}
exporters:
  datadog:
    api:
      key: ${env:DD_API_KEY}
      site: <snip>
  debug: {}
  googlecloud:
    timeout: 60s
processors:
  batch: {}
  batch/datadog:
    send_batch_max_size: 100
    send_batch_size: 10
    timeout: 5s
  filter/datadog:
    error_mode: ignore
    metrics:
      metric:
      - resource.attributes["service.name"] != "monolith"
    traces:
      span:
      - resource.attributes["service.name"] != "monolith"
  filter/monitoring:
    error_mode: ignore
    traces:
      span: <snip>
  memory_limiter:
    check_interval: 1s
    limit_percentage: 60
    spike_limit_percentage: 30
  resourcedetection:
    detectors:
    - env
    - gcp
    override: false
    timeout: 5s
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
service:
  pipelines:
    metrics/datadog:
      exporters:
      - datadog
      processors:
      - filter/datadog
      - batch/datadog
      receivers:
      - datadog/connector
    traces:
      exporters:
      - googlecloud
      - datadog/connector
      processors:
      - memory_limiter
      - filter/monitoring
      - batch
      - resourcedetection
      receivers:
      - otlp
    traces/datadog:
      exporters:
      - datadog
      processors:
      - filter/datadog
      - batch/datadog
      receivers:
      - datadog/connector
  telemetry:
    logs:
      encoding: json
      level: INFO

Log output

{
  "textPayload": "2024/10/09 13:52:37 failed to export to Google Cloud Trace: rpc error: code = Internal desc = grpc: error while marshaling: string field contains invalid UTF-8"
},
{
  "jsonPayload": {
    "dropped_items": 728,
    "error": "rpc error: code = Internal desc = grpc: error while marshaling: string field contains invalid UTF-8",
    "caller": "exporterhelper/queue_sender.go:92",
    "ts": 1728481957.2171862,
    "kind": "exporter",
    "data_type": "traces",
    "level": "error",
    "stacktrace": "go.opentelemetry.io/collector/exporter/exporterhelper.newQueueSender.func1\n\tgo.opentelemetry.io/collector/exporter@v0.108.1/exporterhelper/queue_sender.go:92\ngo.opentelemetry.io/collector/exporter/internal/queue.(*boundedMemoryQueue[...]).Consume\n\tgo.opentelemetry.io/collector/exporter@v0.108.1/internal/queue/bounded_memory_queue.go:52\ngo.opentelemetry.io/collector/exporter/internal/queue.(*Consumers[...]).Start.func1\n\tgo.opentelemetry.io/collector/exporter@v0.108.1/internal/queue/consumers.go:43",
    "msg": "Exporting failed. Dropping data.",
    "name": "googlecloud"
  }
}

Additional context

No response

github-actions[bot] commented 2 weeks ago

Pinging code owners:

dashpole commented 2 weeks ago

Our collector exporter uses the SDK exporter under the hood, but that shouldn't actually matter here. We probably want to make the trace client strip invalid UTF-8, similar to what we do for metrics: https://github.com/GoogleCloudPlatform/opentelemetry-operations-go/blob/16bba4f4e879814de7d3354ef83bcfd597e44b15/exporter/metric/metric.go#L583. It probably needs to happen somewhere around here: https://github.com/GoogleCloudPlatform/opentelemetry-operations-go/blob/16bba4f4e879814de7d3354ef83bcfd597e44b15/exporter/trace/trace_proto.go#L168
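A minimal sketch of the sanitizing approach described above, assuming it would replace invalid byte sequences with U+FFFD using the standard library's `strings.ToValidUTF8`. The helper names `sanitizeUTF8` and `sanitizeAttrs` are hypothetical, the attribute model is a simplified `map[string]string`, and the linked metrics code may differ in detail.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// sanitizeUTF8 replaces each run of invalid UTF-8 bytes with U+FFFD so the
// string can be stored in a protobuf string field. Hypothetical helper.
func sanitizeUTF8(s string) string {
	if utf8.ValidString(s) {
		return s // fast path: no allocation when the string is already valid
	}
	return strings.ToValidUTF8(s, "\uFFFD")
}

// sanitizeAttrs returns a copy of attrs with every key and value sanitized.
func sanitizeAttrs(attrs map[string]string) map[string]string {
	out := make(map[string]string, len(attrs))
	for k, v := range attrs {
		out[sanitizeUTF8(k)] = sanitizeUTF8(v)
	}
	return out
}

func main() {
	attrs := sanitizeAttrs(map[string]string{"db.statement": "SELECT \xff\xfe"})
	fmt.Printf("%q\n", attrs["db.statement"])
}
```

Replacing bytes rather than dropping the span keeps the rest of the span data. One caveat of this sketch: sanitizing keys can make two distinct invalid keys collide, and because Go map iteration order is random, which value survives is not deterministic.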

dashpole commented 2 weeks ago

Opened https://github.com/GoogleCloudPlatform/opentelemetry-operations-go/issues/901 to track this in that repo