open-telemetry / opentelemetry-collector

OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

Collector cannot export metrics telemetry in an ipv6-only environment #10011

Open · lpetrazickisupgrade opened 4 weeks ago

lpetrazickisupgrade commented 4 weeks ago

Describe the bug

  1. Expose telemetry metrics for the OpenTelemetry Collector with a correctly escaped IPv6 address
  2. The Collector unescapes the IP address and naively concatenates it with the port number
  3. Listening fails with a "too many colons in address" error

Steps to reproduce

  1. Delimit the ipv6 address with square brackets:
     service:
       telemetry:
         logs:
           encoding: json
         metrics:
           address: '[${env:MY_POD_IP}]:8888'
  2. Deploy the config to an IPv6-only environment
  3. The Collector fails with: listen tcp: address dead:beef:dead:beef:dead::beef:8888: too many colons in address (a minimal reproduction sketch follows this list)
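
For context, the error comes from Go's address parsing: a host:port string may contain only one unbracketed colon, so an IPv6 literal has to be wrapped in square brackets before the port is appended. A minimal sketch of the failure mode, using a placeholder address from this report rather than collector internals:

    package main

    import (
        "fmt"
        "net"
    )

    func main() {
        host := "dead:beef:dead:beef:dead::beef" // placeholder pod IP from this report
        port := "8888"

        // Naive concatenation of an unbracketed IPv6 host and a port cannot be
        // parsed as host:port, so the listener fails before it ever binds.
        naive := host + ":" + port
        if _, err := net.Listen("tcp", naive); err != nil {
            fmt.Println(err) // listen tcp: address ...: too many colons in address
        }

        // net.JoinHostPort adds the brackets when the host contains colons.
        fmt.Println(net.JoinHostPort(host, port)) // [dead:beef:dead:beef:dead::beef]:8888
    }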

What did you expect to see? Metrics on port 8888

What did you see instead?

{
  "level": "error",
  "ts": 1713554862.2179377,
  "caller": "otelcol@v0.98.0/collector.go:275",
  "msg": "Asynchronous error received, terminating process",
  "error": "listen tcp: address dead:beef:dead:beef:dead::beef:8888: too many colons in address",
  "stacktrace": "
go.opentelemetry.io/collector/otelcol.(*Collector).Run
    go.opentelemetry.io/collector/otelcol@v0.98.0/collector.go:275
go.opentelemetry.io/collector/otelcol.NewCommand.func1
    go.opentelemetry.io/collector/otelcol@v0.98.0/command.go:35
github.com/spf13/cobra.(*Command).execute
    github.com/spf13/cobra@v1.8.0/command.go:983
github.com/spf13/cobra.(*Command).ExecuteC
    github.com/spf13/cobra@v1.8.0/command.go:1115
github.com/spf13/cobra.(*Command).Execute
    github.com/spf13/cobra@v1.8.0/command.go:1039
main.runInteractive
    github.com/open-telemetry/opentelemetry-collector-releases/contrib/main.go:27
main.run
    github.com/open-telemetry/opentelemetry-collector-releases/contrib/main_others.go:10
main.main
    github.com/open-telemetry/opentelemetry-collector-releases/contrib/main.go:20
runtime.main
    runtime/proc.go:271"
}

What version did you use? v0.98.0

What config did you use?

service:
  telemetry:
    logs:
      encoding: json
    metrics:
      address: '[${env:MY_POD_IP}]:8888'

Environment
helm.sh/chart: opentelemetry-collector-0.87.2
Image: opentelemetry-collector-contrib:0.98.0
Kubernetes: v1.29.1-eks-b9c9ed7

Additional context This is a regression; v0.79.0 did not have this issue.

TylerHelmuth commented 4 weeks ago

@lpetrazickisupgrade I am curious whether the issue is with the Collector serving the metrics or with the prometheus receiver scraping them. Can you reproduce the issue without a prometheus receiver trying to scrape?

TylerHelmuth commented 4 weeks ago

Most likely, though, this is a bug from switching to the OTel Go SDK instead of OpenCensus.

/cc @codeboten

lpetrazickisupgrade commented 4 weeks ago

@TylerHelmuth Thanks for taking a look! I think the OpenTelemetry Collector process is crashing at startup while parsing the config. The pod is in a CrashLoopBackOff and doesn't get far enough in the startup sequence to respond to network requests. I've included the only log message above.

I think the regression may have been introduced by this PR: https://github.com/open-telemetry/opentelemetry-collector/pull/9632/files

This may be because the otlp exporter reuses the grpc client config: https://github.com/open-telemetry/opentelemetry-collector/blame/v0.98.0/exporter/otlpexporter/config.go#L25
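
If the regression is in how that shared config builds the endpoint string, one possible shape of a fix (a hypothetical helper, not actual collector code) is to round-trip the configured address through the net package so bracketed IPv6 literals are preserved:

    package main

    import (
        "fmt"
        "net"
    )

    // normalizeEndpoint is a hypothetical helper, not collector code: it splits a
    // possibly-bracketed host:port and rejoins it so IPv6 literals keep their
    // brackets instead of being naively concatenated with the port.
    func normalizeEndpoint(endpoint string) (string, error) {
        host, port, err := net.SplitHostPort(endpoint)
        if err != nil {
            return "", err
        }
        // SplitHostPort strips the brackets from an IPv6 literal;
        // JoinHostPort restores them whenever the host contains a colon.
        return net.JoinHostPort(host, port), nil
    }

    func main() {
        for _, ep := range []string{
            "[dead:beef:dead:beef:dead::beef]:8888", // bracketed IPv6, as in the config above
            "0.0.0.0:8888",                          // IPv4 passes through unchanged
        } {
            out, err := normalizeEndpoint(ep)
            fmt.Println(ep, "->", out, err)
        }
    }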