open-telemetry / opentelemetry-collector-contrib

Error: cannot start pipelines: listen tcp 0.0.0.0:4317: bind: address already in use #27822

Closed felixgao closed 11 months ago

felixgao commented 1 year ago

Component(s)

No response

What happened?

Description

I downloaded the latest Docker image and am trying to bring up a collector to send my traces/metrics to. When I run compose up for my services, the collector fails with:

2023-10-17T23:27:05.891Z    info    service@v0.87.0/service.go:178  Starting shutdown...
2023-10-17T23:27:05.891Z    info    healthcheck/handler.go:132  Health Check state change   {"kind": "extension", "name": "health_check", "status": "unavailable"}
2023-10-17T23:27:05.891Z    info    extensions/extensions.go:50 Stopping extensions...
2023-10-17T23:27:05.891Z    info    zpagesextension@v0.87.0/zpagesextension.go:98   Unregistered zPages span processor on tracer provider   {"kind": "extension", "name": "zpages"}
2023-10-17T23:27:05.891Z    info    service@v0.87.0/service.go:192  Shutdown complete.
Error: cannot start pipelines: listen tcp 0.0.0.0:4317: bind: address already in use
2023/10/17 23:27:05 collector server run finished with error: cannot start pipelines: listen tcp 0.0.0.0:4317: bind: address already in use

Steps to Reproduce

docker-compose up -d collector

Expected Result

services up and running

Actual Result

container exited with error.

Collector version

208830cb18d9

Environment information

Environment

OS: macOS Ventura 13.5.2
Compiler (if manually compiled): (e.g., "go 14.2")
Docker version: 4.23.0

OpenTelemetry Collector configuration

No response

Log output

The complete log message

❯ docker logs pdf_to_png-collector-1
2023-10-17T23:27:05.886Z    info    service@v0.87.0/telemetry.go:84 Setting up own telemetry...
2023-10-17T23:27:05.886Z    info    service@v0.87.0/telemetry.go:201    Serving Prometheus metrics  {"address": ":8888", "level": "Basic"}
2023-10-17T23:27:05.886Z    debug   extension@v0.87.0/extension.go:154  Beta component. May change in the future.   {"kind": "extension", "name": "pprof"}
2023-10-17T23:27:05.886Z    debug   extension@v0.87.0/extension.go:154  Beta component. May change in the future.   {"kind": "extension", "name": "zpages"}
2023-10-17T23:27:05.886Z    debug   extension@v0.87.0/extension.go:154  Beta component. May change in the future.   {"kind": "extension", "name": "health_check"}
2023-10-17T23:27:05.886Z    info    exporter@v0.87.0/exporter.go:275    Deprecated component. Will be removed in future releases.   {"kind": "exporter", "data_type": "logs", "name": "logging"}
2023-10-17T23:27:05.889Z    debug   processor@v0.87.0/processor.go:287  Stable component.   {"kind": "processor", "name": "batch", "pipeline": "logs"}
2023-10-17T23:27:05.889Z    info    exporter@v0.87.0/exporter.go:275    Deprecated component. Will be removed in future releases.   {"kind": "exporter", "data_type": "traces", "name": "logging"}
2023-10-17T23:27:05.889Z    debug   processor@v0.87.0/processor.go:287  Stable component.   {"kind": "processor", "name": "batch", "pipeline": "traces"}
2023-10-17T23:27:05.889Z    info    exporter@v0.87.0/exporter.go:275    Deprecated component. Will be removed in future releases.   {"kind": "exporter", "data_type": "metrics", "name": "logging"}
2023-10-17T23:27:05.889Z    debug   receiver@v0.87.0/receiver.go:294    Stable component.   {"kind": "receiver", "name": "otlp", "data_type": "traces"}
2023-10-17T23:27:05.889Z    debug   receiver@v0.87.0/receiver.go:294    Stable component.   {"kind": "receiver", "name": "otlp", "data_type": "metrics"}
2023-10-17T23:27:05.889Z    debug   receiver@v0.87.0/receiver.go:294    Beta component. May change in the future.   {"kind": "receiver", "name": "otlp", "data_type": "logs"}
2023-10-17T23:27:05.889Z    debug   receiver@v0.87.0/receiver.go:294    Beta component. May change in the future.   {"kind": "receiver", "name": "zipkin", "data_type": "traces"}
2023-10-17T23:27:05.890Z    info    service@v0.87.0/service.go:143  Starting otelcol... {"Version": "0.87.0", "NumCPU": 5}
2023-10-17T23:27:05.890Z    info    extensions/extensions.go:33 Starting extensions...
2023-10-17T23:27:05.890Z    info    extensions/extensions.go:36 Extension is starting...    {"kind": "extension", "name": "pprof"}
2023-10-17T23:27:05.890Z    info    pprofextension@v0.87.0/pprofextension.go:60 Starting net/http/pprof server  {"kind": "extension", "name": "pprof", "config": {"TCPAddr":{"Endpoint":"localhost:1777"},"BlockProfileFraction":0,"MutexProfileFraction":0,"SaveToFile":""}}
2023-10-17T23:27:05.890Z    info    extensions/extensions.go:43 Extension started.  {"kind": "extension", "name": "pprof"}
2023-10-17T23:27:05.890Z    info    extensions/extensions.go:36 Extension is starting...    {"kind": "extension", "name": "zpages"}
2023-10-17T23:27:05.890Z    info    zpagesextension@v0.87.0/zpagesextension.go:53   Registered zPages span processor on tracer provider {"kind": "extension", "name": "zpages"}
2023-10-17T23:27:05.890Z    info    zpagesextension@v0.87.0/zpagesextension.go:63   Registered Host's zPages    {"kind": "extension", "name": "zpages"}
2023-10-17T23:27:05.891Z    info    zpagesextension@v0.87.0/zpagesextension.go:75   Starting zPages extension   {"kind": "extension", "name": "zpages", "config": {"TCPAddr":{"Endpoint":"localhost:55679"}}}
2023-10-17T23:27:05.891Z    info    extensions/extensions.go:43 Extension started.  {"kind": "extension", "name": "zpages"}
2023-10-17T23:27:05.891Z    info    extensions/extensions.go:36 Extension is starting...    {"kind": "extension", "name": "health_check"}
2023-10-17T23:27:05.891Z    info    healthcheckextension@v0.87.0/healthcheckextension.go:35 Starting health_check extension {"kind": "extension", "name": "health_check", "config": {"Endpoint":"0.0.0.0:13133","TLSSetting":null,"CORS":null,"Auth":null,"MaxRequestBodySize":0,"IncludeMetadata":false,"ResponseHeaders":null,"Path":"/","ResponseBody":null,"CheckCollectorPipeline":{"Enabled":false,"Interval":"5m","ExporterFailureThreshold":5}}}
2023-10-17T23:27:05.891Z    warn    internal@v0.87.0/warning.go:40  Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks    {"kind": "extension", "name": "health_check", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
2023-10-17T23:27:05.891Z    info    extensions/extensions.go:43 Extension started.  {"kind": "extension", "name": "health_check"}
2023-10-17T23:27:05.891Z    warn    internal@v0.87.0/warning.go:40  Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks    {"kind": "receiver", "name": "otlp", "data_type": "traces", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
2023-10-17T23:27:05.891Z    info    zapgrpc/zapgrpc.go:178  [core] [Server #1] Server created   {"grpc_log": true}
2023-10-17T23:27:05.891Z    info    otlpreceiver@v0.87.0/otlp.go:83 Starting GRPC server    {"kind": "receiver", "name": "otlp", "data_type": "traces", "endpoint": "0.0.0.0:4317"}
2023-10-17T23:27:05.891Z    info    service@v0.87.0/service.go:178  Starting shutdown...
2023-10-17T23:27:05.891Z    info    healthcheck/handler.go:132  Health Check state change   {"kind": "extension", "name": "health_check", "status": "unavailable"}
2023-10-17T23:27:05.891Z    info    extensions/extensions.go:50 Stopping extensions...
2023-10-17T23:27:05.891Z    info    zpagesextension@v0.87.0/zpagesextension.go:98   Unregistered zPages span processor on tracer provider   {"kind": "extension", "name": "zpages"}
2023-10-17T23:27:05.891Z    info    service@v0.87.0/service.go:192  Shutdown complete.
Error: cannot start pipelines: listen tcp 0.0.0.0:4317: bind: address already in use
2023/10/17 23:27:05 collector server run finished with error: cannot start pipelines: listen tcp 0.0.0.0:4317: bind: address already in use

lsof of listening ports

❯ sudo lsof -iTCP -sTCP:LISTEN -P -n

COMMAND     PID USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
ciscod      665 root    3u  IPv4 0xfb6f91da4555b0c3      0t0  TCP 127.0.0.1:29834 (LISTEN)
vpnagentd   668 root    5u  IPv4 0xfb6f91da45559aa3      0t0  TCP 127.0.0.1:29754 (LISTEN)
OnGuardSe   670 root    6u  IPv4 0xfb6f91da4541e5b3      0t0  TCP 127.0.0.1:25428 (LISTEN)
ServiceDa   675 root    7u  IPv4 0xfb6f91da454fc483      0t0  TCP 127.0.0.1:25427 (LISTEN)
enmacos     679 root    4u  IPv4 0xfb6f91da454dbbd3      0t0  TCP *:6624 (LISTEN)
enmacos     679 root    5u  IPv6 0xfb6f91d57723065b      0t0  TCP *:6624 (LISTEN)
com.netsk   879 root   14u  IPv4 0xfb6f91da4555bbd3      0t0  TCP 127.0.0.1:57130 (LISTEN)
rapportd    880 ggao    3u  IPv4 0xfb6f91da4715a353      0t0  TCP *:56066 (LISTEN)
rapportd    880 ggao    4u  IPv6 0xfb6f91da45b5ae5b      0t0  TCP *:56066 (LISTEN)
sharingd    938 ggao   15u  IPv6 0xfb6f91da46f17e5b      0t0  TCP *:8770 (LISTEN)
dgwipd     1471 root   14u  IPv4 0xfb6f91da454fe5b3      0t0  TCP 127.0.0.1:3128 (LISTEN)
dgwipd     1471 root   15u  IPv6 0xfb6f91d57721665b      0t0  TCP [::1]:3128 (LISTEN)
Code\x20H  1826 ggao   42u  IPv4 0xfb6f91da454e86e3      0t0  TCP 127.0.0.1:49267 (LISTEN)
Code\x20H  1828 ggao   58u  IPv4 0xfb6f91da4504e5b3      0t0  TCP 127.0.0.1:49324 (LISTEN)
Code\x20H  1829 ggao   46u  IPv4 0xfb6f91da454d9aa3      0t0  TCP 127.0.0.1:49258 (LISTEN)
Code\x20H  1830 ggao   41u  IPv4 0xfb6f91da454ce353      0t0  TCP 127.0.0.1:49251 (LISTEN)
Notion     3337 ggao   28u  IPv4 0xfb6f91da45425aa3      0t0  TCP 127.0.0.1:49685 (LISTEN)
acumbrell  3408 root   52u  IPv4 0xfb6f91da455106e3      0t0  TCP 127.0.0.1:62722 (LISTEN)
acumbrell  3408 root   65u  IPv4 0xfb6f91da45fa1843      0t0  TCP 127.0.0.1:63102 (LISTEN)
acumbrell  3408 root   70u  IPv6 0xfb6f91da46f16e5b      0t0  TCP [::1]:63103 (LISTEN)
PanGPS     3486 root    7u  IPv4 0xfb6f91da454c4f93      0t0  TCP 127.0.0.1:4767 (LISTEN)
Beyond     3523 ggao   38u  IPv4 0xfb6f91da454d0483      0t0  TCP 127.0.0.1:8198 (LISTEN)
hubd       3661 root   10u  IPv4 0xfb6f91da45f925b3      0t0  TCP 127.0.0.1:7443 (LISTEN)
hubd       3661 root   16u  IPv6 0xfb6f91da45b5a65b      0t0  TCP [::1]:7443 (LISTEN)
kdc       17035 root    5u  IPv6 0xfb6f91da45b5c65b      0t0  TCP *:88 (LISTEN)
kdc       17035 root    7u  IPv4 0xfb6f91da4552ee63      0t0  TCP *:88 (LISTEN)
cupsd     17050 root    7u  IPv6 0xfb6f91da45b5b65b      0t0  TCP *:631 (LISTEN)
cupsd     17050 root    8u  IPv4 0xfb6f91da4551cf93      0t0  TCP *:631 (LISTEN)
dnscryptp 67846 root   53u  IPv4 0xfb6f91da46cb4f93      0t0  TCP 127.0.0.1:53 (LISTEN)

Additional context

docker-compose.yaml

version: '3.9'
services:
  zipkin:
    image: openzipkin/zipkin-slim
    ports:
      - "9410:9410"
      - "9411:9411"   
    network_mode: host 
  jaeger:
    image: jaegertracing/all-in-one
    environment:
      COLLECTOR_ZIPKIN_HOST_PORT: 9412
    ports:
      - "16686:16686"   # HTTP UI
      - "14268"
      - "14250"
    network_mode: host
  collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: [ "--config=/etc/otel-collector-config.yml" ]
    volumes:
      - ./otel-collector-config.yml:/etc/otel-collector-config.yml
    ports:
      - "1888:1888"   # pprof extension
      - "8888:8888"   # Prometheus metrics exposed by the collector
      - "8889:8889"   # Prometheus exporter metrics
      - "13133:13133" # health_check extension
      # - "9411"        # Zipkin receiver
      # - "4318:4318"   # OTLP/HTTP receiver
      - "55679:55679" # zpages extension
      - "4317:4317"   # OTLP gRPC receiver
      # - "55680:55680"   # OTLP over gRPC (legacy)
      - "55681:55681"   # OTLP over HTTP (legacy)
    depends_on:
      - jaeger
      - zipkin
    network_mode: host

otel-collector-config.yaml

receivers:
  otlp:
    protocols:
      grpc:
      http:
  zipkin:

exporters:
  zipkin:
    endpoint: "http://zipkin:9411/api/v2/spans"
  logging:
    verbosity: detailed

processors:
  batch:

extensions:
  health_check:
  pprof:
  zpages:

service:
  extensions: [pprof, zpages, health_check]
  telemetry:
    logs:
      level: "debug"
  pipelines:
    traces:
      receivers: [otlp, zipkin]
      exporters: [logging]
      processors: [batch]
    metrics:
      receivers: [otlp]
      exporters: [logging]
    logs:
      receivers: [ otlp ]
      processors: [ batch ]
      exporters: [ logging ]

JaredTan95 commented 1 year ago

@felixgao Have you tried using a non-host network in your docker-compose YAML?

felixgao commented 1 year ago

I have not. Since I am not yet building a Docker image for my application that uses OTEL, I thought it would be good to set up the OTEL infrastructure locally with Docker first. That is why I want to use the host network.

codeboten commented 1 year ago

You can try binding the OTLP receivers to different ports. I believe Jaeger now also binds to those ports, which would cause the problem you're seeing.

felixgao commented 1 year ago

> you can try binding the otlp ports to a different port. i believe jaeger is now also binding to those ports, which would be causing the problem you're seeing.

I think the problem is not my host OTLP port; it is the port inside the Docker container.
I have changed my host port to

Update: I have removed the network_mode: host and service is up and running without any errors.

codeboten commented 1 year ago

Right, the configuration I was talking about is the collector configuration. The change would be to go from

receivers:
  otlp:
    protocols:
      grpc:
      http:
  zipkin:

to something like this (I picked arbitrary ports here):

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: localhost:9317
      http:
        endpoint: localhost:9318
  zipkin:

The docker compose port configuration doesn't change what port the service is binding to.
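
To illustrate the error itself: with network_mode: host, the containers share one network namespace, so two processes cannot both listen on the same TCP port there. The following is a minimal Python sketch of that conflict, not collector code. The helper names try_listen and second_bind_fails are hypothetical, and it uses an OS-assigned port rather than 4317 so it does not disturb anything already running.

```python
import errno
import socket


def try_listen(host: str, port: int) -> socket.socket:
    """Bind and listen on host:port, returning the listening socket."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen()
    except OSError:
        sock.close()
        raise
    return sock


def second_bind_fails() -> bool:
    """Return True if a second listener on the same port gets EADDRINUSE."""
    # Port 0 asks the OS for any free port (stand-in for 4317).
    first = try_listen("127.0.0.1", 0)
    port = first.getsockname()[1]
    try:
        second = try_listen("127.0.0.1", port)
    except OSError as exc:
        # Same failure class as "bind: address already in use".
        return exc.errno == errno.EADDRINUSE
    else:
        second.close()
        return False
    finally:
        first.close()


if __name__ == "__main__":
    print("second bind failed with EADDRINUSE:", second_bind_fails())
```

Whichever process starts second loses, which would explain why the collector, which depends_on jaeger in the compose file above, is the one that exits.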

codeboten commented 11 months ago

Marking this as closed based on the comment

Update: I have removed the network_mode: host and service is up and running without any errors.

Please re-open if that's not the case.

StMakhov commented 1 month ago

Unfortunately, I have the same problem and removing network_mode: host from docker-compose.yaml didn't solve the problem for me. Could you please clarify what else might help?

I am trying to receive data in the OpenTelemetry Collector and export it to jaeger-all-in-one. As a source of telemetry I used telemetrygen from the official documentation example. I tried starting the components both with Docker Compose and from binaries; the error is the same (the commands below are for the second case only).

Config.yaml for the collector:

user@host:~$ cat /etc/otelcol/config.yaml 

extensions:
  health_check: {}
  pprof: {}
  zpages: {}

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: localhost:4317
      http: 
        endpoint: localhost:4318
  zipkin: {}

processors:
  attributes:
    actions:
      - key: some.ip
        value: "some_IP_in_future"
        action: insert
  batch: {}

exporters:
  debug:
    verbosity: detailed
  otlp:
    endpoint: jaeger-all-in-one:4317
    tls:
      insecure: true
  zipkin:
    endpoint: "http://zipkin:9411/api/v2/spans"

service:
  pipelines:
    traces:
      receivers: [otlp, zipkin]
      processors: [attributes, batch]
      exporters: [debug, otlp]
    metrics:
      receivers: [otlp]
      exporters: [debug, otlp]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
  extensions: [pprof, zpages, health_check]
  telemetry:
    logs:
      level: "debug"

Starting the collector:

user@host:~$ /usr/bin/otelcol --config=file:/etc/otelcol/config.yaml --feature-gates=-component.UseLocalHostAsDefaultHost

Starting the load:

$GOBIN/telemetrygen traces --otlp-insecure --traces 3

I see log messages from telemetrygen in the OpenTelemetry Collector's terminal. However, jaeger-all-in-one fails with an error when I try to start it:

user@host:~/pkg/jaeger-1.61.0-linux-amd64$ ./jaeger-all-in-one --collector.otlp.enabled=true
...
could not start OTLP receiver: could not start the OTLP receiver: listen tcp :4317: bind: address already in use

The OpenTelemetry Collector also logs an error in its terminal:

Exporting failed. Will retry the request after interval.        {"kind": "exporter", "data_type": "traces", "name": "otlp", "error": "rpc error: code = Unavailable desc = name resolver error: produced zero addresses", "interval": "13.56862226s"}
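
The "produced zero addresses" message suggests that the host name in the exporter endpoint could not be resolved. A name such as jaeger-all-in-one would typically resolve only inside a Docker Compose network, not when the binaries run directly on the host. A minimal Python sketch for checking this from the same machine follows; the function name resolves is hypothetical and the check is independent of the collector.

```python
import socket


def resolves(host: str, port: int) -> bool:
    """Return True if host can be resolved to at least one TCP address."""
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # Name resolution failed: no address for this host name.
        return False
    return True


if __name__ == "__main__":
    # Host names taken from the exporter config above, plus localhost.
    for host in ("jaeger-all-in-one", "localhost"):
        print(host, "resolves:", resolves(host, 4317))
```

If the first name does not resolve, pointing the exporter at an address that is reachable from the host is one option to try.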

If I don't set any endpoints under receivers::otlp::protocols::grpc and receivers::otlp::protocols::http, my OpenTelemetry Collector instance doesn't receive data from the load. The data goes directly to the Jaeger Collector and from there to the Jaeger UI, so the OpenTelemetry Collector is bypassed in this case.
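
A rough way to spot this kind of clash is to list every endpoint each process tries to listen on and group them by port. The Python sketch below does that for the listeners named in this comment: the collector's OTLP endpoints from the config, and ":4317" from the Jaeger error message. The names port_of, find_port_conflicts, and LISTENERS are hypothetical. Treating any shared port as a conflict regardless of host is a simplification that assumes everything runs in the same network namespace.

```python
from collections import defaultdict

# Listener endpoints as they appear in this comment.
LISTENERS = {
    "otelcol otlp/grpc": "localhost:4317",
    "otelcol otlp/http": "localhost:4318",
    "jaeger otlp receiver": ":4317",
}


def port_of(endpoint: str) -> int:
    """Extract the port from 'host:port' or ':port'."""
    return int(endpoint.rsplit(":", 1)[1])


def find_port_conflicts(listeners: dict[str, str]) -> dict[int, list[str]]:
    """Group listener names by port and keep ports claimed more than once."""
    by_port = defaultdict(list)
    for name, endpoint in listeners.items():
        by_port[port_of(endpoint)].append(name)
    return {port: names for port, names in by_port.items() if len(names) > 1}


if __name__ == "__main__":
    for port, names in find_port_conflicts(LISTENERS).items():
        print(f"port {port} is claimed by: {', '.join(names)}")
```

With these inputs the sketch reports port 4317 as claimed by both the collector's gRPC receiver and Jaeger's OTLP receiver, so one of the two would need to move to a different port.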