open-telemetry / opentelemetry-collector

OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

Compression_error and crashing #9050

Open jparrattwork opened 1 year ago

jparrattwork commented 1 year ago

Component(s)

No response

What happened?

Description

We have an application sending OTLP data to the collector via gRPC; the collector then forwards it to our Splunk backend. The collector seems to stop sending any logs/traces/metrics, and both the collector and our application eventually crash.

Steps to Reproduce

Send OTLP via gRPC to the collector
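(A hypothetical minimal client setup for reproducing this, using the standard OTLP exporter environment variables from the OpenTelemetry specification; the endpoint value is an assumption for a locally running collector. Disabling exporter compression can help isolate whether the COMPRESSION_ERROR is compression-related:)

```shell
# Assumed local collector gRPC endpoint (default OTLP gRPC port 4317)
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"
export OTEL_EXPORTER_OTLP_PROTOCOL="grpc"
# Diagnostic step: turn off exporter payload compression entirely
export OTEL_EXPORTER_OTLP_COMPRESSION="none"
```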

Expected Result

The collector produces no warnings and doesn't crash

Actual Result

The collector repeatedly prints the following warning:

warn zapgrpc/zapgrpc.go:195 [transport] transport: http2Server.HandleStreams failed to read frame: connection error: COMPRESSION_ERROR {"grpc_log": true}

and then eventually crashes.

Collector version

v0.68.0

Environment information

Environment

OS: Windows Server 2019
Compiler (if manually compiled): go version go1.19.5 windows/amd64

OpenTelemetry Collector configuration

extensions:
  # Enables health check endpoint for otel collector - https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/extension/healthcheckextension
  health_check:
  # Opens up zpages for dev/debugging - https://github.com/open-telemetry/opentelemetry-collector/tree/main/extension/zpagesextension
  zpages:
    endpoint: localhost:55679

receivers:
  # For dotnet apps
  otlp:
    protocols:
      grpc:
      http:

  # FluentD
  fluentforward:
    endpoint: 0.0.0.0:8006

  # Otel Internal Metrics
  prometheus:
    config:
      scrape_configs:
      - job_name: 'otelcol' # Gets mapped to service.name
        scrape_interval: 10s
        static_configs:
        - targets: ['0.0.0.0:8888']

  # System Metrics
  hostmetrics:
    collection_interval: 10s
    scrapers:
      cpu:
      disk:
      filesystem:
      memory:
      network:
      # System load average metrics https://en.wikipedia.org/wiki/Load_(computing)
      load:
      # Paging/Swap space utilization and I/O metrics
      paging:
      # Aggregated system process count metrics
      processes:
      # System processes metrics, disabled by default
      # process:  

processors:
  batch: # Batches data when sending
  resourcedetection:
    detectors: [gce, ecs, ec2, azure, system]
    timeout: 2s
    override: false
  transform/body-empty:
    log_statements:
      - context: log
        statements:
          - set(body, "body-empty") where body == nil
  groupbyattrs:
    keys:
    - service.name
    - service.version
    - host.name
  # Enabling the memory_limiter is strongly recommended for every pipeline.
  # Configuration is based on the amount of memory allocated to the collector.
  # For more information about memory limiter, see
  # https://github.com/open-telemetry/opentelemetry-collector/blob/main/processor/memorylimiter/README.md
  memory_limiter:
    check_interval: 2s
    limit_mib: 256              

exporters:
  splunk_hec/logs:
    token: hidden
    endpoint: hidden
    index: hidden
    max_connections: 20
    disable_compression: false
    timeout: 10s
    tls:
      insecure_skip_verify: true
      ca_file: ""
      cert_file: ""
      key_file: ""

  splunk_hec/traces:
    token: hidden
    endpoint: hidden
    index: hidden
    max_connections: 20
    disable_compression: false
    timeout: 10s
    tls:
      insecure_skip_verify: true
      ca_file: ""
      cert_file: ""
      key_file: ""

  splunk_hec/metrics:
    token: hidden
    endpoint: hidden
    index: hidden
    max_connections: 20
    disable_compression: false
    timeout: 10s
    tls:
      insecure_skip_verify: true
      ca_file: ""
      cert_file: ""
      key_file: ""      

service:
  # zpages port : 55679

  pipelines:
    logs:
      receivers: [otlp, fluentforward]
      processors: [resourcedetection, transform/body-empty, groupbyattrs, memory_limiter, batch]
      exporters: [splunk_hec/logs]
    metrics:
      receivers: [otlp, hostmetrics]
      processors: [resourcedetection, groupbyattrs, memory_limiter, batch]
      exporters: [splunk_hec/metrics]
    traces:
      receivers: [otlp]
      processors: [resourcedetection, groupbyattrs, memory_limiter, batch]
      exporters: [splunk_hec/traces]
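(Side note on the configuration above, unrelated to the compression error: the memory_limiter README linked in the comments recommends placing it first in each pipeline so memory limits apply before any other processing. A sketch of the logs pipeline with that ordering, using the same components as above:)

```yaml
# Sketch only: memory_limiter moved to the front, per its README recommendation.
pipelines:
  logs:
    receivers: [otlp, fluentforward]
    processors: [memory_limiter, resourcedetection, transform/body-empty, groupbyattrs, batch]
    exporters: [splunk_hec/logs]
```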

Log output

1.6879002944284434e+09  warn    zapgrpc/zapgrpc.go:195  [transport] transport: http2Server.HandleStreams failed to read frame: connection error: COMPRESSION_ERROR  {"grpc_log": true}

Additional context

No response

atoulme commented 1 year ago

Please try the latest release. Which distribution are you using? Which OS are you running on?

Please see the troubleshooting guide to raise the log level and troubleshoot further: https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/troubleshooting.md
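(For reference, raising the Collector's own log level is done under `service.telemetry` — a minimal sketch; see the troubleshooting doc for the exact fields:)

```yaml
service:
  telemetry:
    logs:
      level: debug
```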

github-actions[bot] commented 7 months ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

atoulme commented 7 months ago

This might relate to https://github.com/open-telemetry/opentelemetry-collector/pull/9022. I will transfer the issue over.