open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

Healthcheck path overridden by `UseLocalHostAsDefaultHost` feature #35193

Closed TRAD-Anthony-CKO closed 1 month ago

TRAD-Anthony-CKO commented 1 month ago

Component(s)

extension/healthcheck

What happened?

Description

The user-defined endpoint for the health check extension is being overridden from 0.0.0.0 to localhost. This is probably linked to the recent change of the default host from 0.0.0.0 to localhost. Linked to this issue.

Steps to Reproduce

Start the latest version of the collector (0.109.0) with this simplified config:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

exporters:
  otlp:
    endpoint: "${COLLECTOR_GATEWAY_ENDPOINT}"
    tls:
      insecure: true

processors:

extensions:
  health_check:
    endpoint: "0.0.0.0:13133"

service:
  extensions: [health_check]
  telemetry:
    logs:
      level: "debug"
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [otlp]
    logs:
      receivers: [otlp]
      exporters: [otlp]
    traces:
      receivers: [otlp]
      exporters: [otlp]

Expected Result

Healthcheck endpoint started at 0.0.0.0

Actual Result

2024-09-14T08:11:23.913Z info healthcheckextension@v0.106.1/healthcheckextension.go:32 Starting health_check extension {"kind": "extension", "name": "health_check", "config": {"Endpoint":"localhost:13133","TLSSetting":null,"CORS":null,"Auth":null,"MaxRequestBodySize":0,"IncludeMetadata":false,"ResponseHeaders":null,"CompressionAlgorithms":null,"ReadTimeout":0,"ReadHeaderTimeout":0,"WriteTimeout":0,"IdleTimeout":0,"Path":"/","ResponseBody":null,"CheckCollectorPipeline":{"Enabled":false,"Interval":"5m","ExporterFailureThreshold":5}}}
2024-09-14T08:11:23.914Z info extensions/extensions.go:56 Extension started. {"kind": "extension", "name": "health_check"}
2024-09-14T08:11:23.914Z info zapgrpc/zapgrpc.go:176 [core] [Server #1]Server created {"grpc_log": true}
2024-09-14T08:11:23.914Z info otlpreceiver@v0.106.1/otlp.go:102 Starting GRPC server {"kind": "receiver", "name": "otlp", "data_type": "logs", "endpoint": "0.0.0.0:55680"}
2024-09-14T08:11:23.914Z info otlpreceiver@v0.106.1/otlp.go:152 Starting HTTP server {"kind": "receiver", "name": "otlp", "data_type": "logs", "endpoint": "0.0.0.0:55681"}
2024-09-14T08:11:23.914Z info healthcheck/handler.go:132 Health Check state change {"kind": "extension", "name": "health_check", "status": "ready"}
2024-09-14T08:11:23.914Z info service@v0.106.1/service.go:225 Everything is ready. Begin running and processing data.
2024-09-14T08:11:23.914Z info localhostgate/featuregate.go:63 The default endpoints for all servers in components have changed to use localhost instead of 0.0.0.0. Disable the feature gate to temporarily revert to the previous default. {"feature gate ID": "component.UseLocalHostAsDefaultHost"}
2024-09-14T08:11:23.914Z info zapgrpc/zapgrpc.go:176 [core] [Server #1 ListenSocket #2]ListenSocket created {"grpc_log": true}

Collector version

0.109.0

Environment information

Environment

Docker on Mac M1

OpenTelemetry Collector configuration

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

exporters:
  otlp:
    endpoint: "${COLLECTOR_GATEWAY_ENDPOINT}"
    tls:
      insecure: true

processors:

extensions:
  health_check:
    endpoint: "0.0.0.0:13133"

service:
  extensions: [health_check]
  telemetry:
    logs:
      level: "debug"
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [otlp]
    logs:
      receivers: [otlp]
      exporters: [otlp]
    traces:
      receivers: [otlp]
      exporters: [otlp]

Log output

2024-09-14T08:11:23.913Z        info    healthcheckextension@v0.106.1/healthcheckextension.go:32        Starting health_check extension {"kind": "extension", "name": "health_check", "config": {"Endpoint":"localhost:13133","TLSSetting":null,"CORS":null,"Auth":null,"MaxRequestBodySize":0,"IncludeMetadata":false,"ResponseHeaders":null,"CompressionAlgorithms":null,"ReadTimeout":0,"ReadHeaderTimeout":0,"WriteTimeout":0,"IdleTimeout":0,"Path":"/","ResponseBody":null,"CheckCollectorPipeline":{"Enabled":false,"Interval":"5m","ExporterFailureThreshold":5}}}
2024-09-14T08:11:23.914Z        info    extensions/extensions.go:56     Extension started.      {"kind": "extension", "name": "health_check"}
2024-09-14T08:11:23.914Z        info    zapgrpc/zapgrpc.go:176  [core] [Server #1]Server created        {"grpc_log": true}
2024-09-14T08:11:23.914Z        info    otlpreceiver@v0.106.1/otlp.go:102       Starting GRPC server    {"kind": "receiver", "name": "otlp", "data_type": "logs", "endpoint": "0.0.0.0:55680"}
2024-09-14T08:11:23.914Z        info    otlpreceiver@v0.106.1/otlp.go:152       Starting HTTP server    {"kind": "receiver", "name": "otlp", "data_type": "logs", "endpoint": "0.0.0.0:55681"}
2024-09-14T08:11:23.914Z        info    healthcheck/handler.go:132      Health Check state change       {"kind": "extension", "name": "health_check", "status": "ready"}
2024-09-14T08:11:23.914Z        info    service@v0.106.1/service.go:225 Everything is ready. Begin running and processing data.
2024-09-14T08:11:23.914Z        info    localhostgate/featuregate.go:63 The default endpoints for all servers in components have changed to use localhost instead of 0.0.0.0. Disable the feature gate to temporarily revert to the previous default.     {"feature gate ID": "component.UseLocalHostAsDefaultHost"}
2024-09-14T08:11:23.914Z        info    zapgrpc/zapgrpc.go:176  [core] [Server #1 ListenSocket #2]ListenSocket created  {"grpc_log": true}

Additional context

No response

jpkrohling commented 1 month ago

There's something odd with your Collector version:

2024-09-14T08:11:23.913Z        info    healthcheckextension@v0.106.1/healthcheckextension.go:32        Starting health_check extension {"kind": "extension", "name": "health_check", "config": {"Endpoint":"localhost:13133","TLSSetting":null,"CORS":null,"Auth":null,"MaxRequestBodySize":0,"IncludeMetadata":false,"ResponseHeaders":null,"CompressionAlgorithms":null,"ReadTimeout":0,"ReadHeaderTimeout":0,"WriteTimeout":0,"IdleTimeout":0,"Path":"/","ResponseBody":null,"CheckCollectorPipeline":{"Enabled":false,"Interval":"5m","ExporterFailureThreshold":5}}}

It says that the health check extension is at 0.106.1? In any case, I tried this with both the latest main and with 0.108.1, and both got me the correct results:

2024-09-18T13:11:09.482+0200    info    service@v0.108.1/service.go:178 Setting up own telemetry...
2024-09-18T13:11:09.482+0200    info    service@v0.108.1/telemetry.go:98        Serving metrics {"address": ":8888", "metrics level": "Normal"}
2024-09-18T13:11:09.482+0200    info    builders/builders.go:26 Development component. May change in the future.        {"kind": "exporter", "data_type": "traces", "name": "debug"}
2024-09-18T13:11:09.483+0200    info    service@v0.108.1/service.go:263 Starting otelcol-contrib...     {"Version": "0.108.1", "NumCPU": 16}
2024-09-18T13:11:09.483+0200    info    extensions/extensions.go:38     Starting extensions...
2024-09-18T13:11:09.483+0200    info    extensions/extensions.go:41     Extension is starting...        {"kind": "extension", "name": "health_check"}
2024-09-18T13:11:09.483+0200    info    healthcheckextension@v0.108.0/healthcheckextension.go:33        Starting health_check extension {"kind": "extension", "name": "health_check", "config": {"Endpoint":"0.0.0.0:13133","TLSSetting":null,"CORS":null,"Auth":null,"MaxRequestBodySize":0,"IncludeMetadata":false,"ResponseHeaders":null,"CompressionAlgorithms":null,"ReadTimeout":0,"ReadHeaderTimeout":0,"WriteTimeout":0,"IdleTimeout":0,"Path":"/","ResponseBody":null,"CheckCollectorPipeline":{"Enabled":false,"Interval":"5m","ExporterFailureThreshold":5}}}
2024-09-18T13:11:09.483+0200    info    extensions/extensions.go:58     Extension started.      {"kind": "extension", "name": "health_check"}
2024-09-18T13:11:09.483+0200    info    otlpreceiver@v0.108.1/otlp.go:103       Starting GRPC server    {"kind": "receiver", "name": "otlp", "data_type": "traces", "endpoint": "0.0.0.0:4317"}
2024-09-18T13:11:09.483+0200    info    healthcheck/handler.go:132      Health Check state change       {"kind": "extension", "name": "health_check", "status": "ready"}
2024-09-18T13:11:09.483+0200    info    service@v0.108.1/service.go:289 Everything is ready. Begin running and processing data.
2024-09-18T13:11:09.483+0200    info    localhostgate/featuregate.go:63 The default endpoints for all servers in components have changed to use localhost instead of 0.0.0.0. Disable the feature gate to temporarily revert to the previous default.   {"feature gate ID": "component.UseLocalHostAsDefaultHost"}

And this is the positive confirmation:

> curl 192.168.2.179:13133
{"status":"Server available","upSince":"2024-09-18T13:11:09.48349904+02:00","uptime":"1m52.29823269s"}

The config I used was very similar to yours:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  debug:

extensions:
  health_check:
    endpoint: "0.0.0.0:13133"

service:
  extensions: [health_check]
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [debug]

TRAD-Anthony-CKO commented 1 month ago

I saw the same log, but I thought the health check extension might be versioned differently. I'm explicitly pulling 0.109.0 in my Docker image (a tiny wrapper around the base image):

FROM    otel/opentelemetry-collector-contrib:0.109.0
COPY    collector/otel-collector-config.yaml /etc/otel-collector-config.yaml
EXPOSE  55680 55681 13133 4317 4318

I inspected the running container as well. I see these labels, which suggest the container itself is on 0.109.0:

"org.opencontainers.image.name": "opentelemetry-collector-releases",
"org.opencontainers.image.revision": "b07bcb3f966e134245b9879f8e8b5948a44bfc9f",
"org.opencontainers.image.source": "https://github.com/open-telemetry/opentelemetry-collector-releases",
"org.opencontainers.image.version": "0.109.0"

Can you think of anything that might make the healthcheck extension stay on a specific version?
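
Since the image labels and the startup log disagree here, one way to see what the running binary reports is to pull the module versions out of the log itself. The following is a minimal sketch, assuming the log lines carry Go module paths of the form `name@vX.Y.Z/file.go`, as in the output above. The function name `module_versions` is made up for illustration. Patch versions can differ between modules of the same release (the 0.108.x log above shows both 0.108.0 and 0.108.1), so the minor version is the part worth comparing against the image tag.

```shell
#!/usr/bin/env bash
# Print the distinct module versions found in collector log lines read from stdin.
# Assumption: versions appear as "<module>@v<major>.<minor>.<patch>" in each line.
module_versions() {
  grep -oE '[A-Za-z]+@v[0-9]+\.[0-9]+\.[0-9]+' \
    | sed -E 's/^[A-Za-z]+@v//' \
    | sort -u
}

# Example with two lines shortened from the log in this issue.
printf '%s\n' \
  'info healthcheckextension@v0.106.1/healthcheckextension.go:32 Starting health_check extension' \
  'info service@v0.106.1/service.go:225 Everything is ready.' \
  | module_versions
```

Against a live container, the same function could be fed from `docker logs <container> 2>&1`, where the container name is whatever the local setup uses.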

TRAD-Anthony-CKO commented 1 month ago

Closing the issue, since it seems to be a Docker caching issue on my machine. I tested the same setup on a fresh machine and it works as you mentioned. Apologies!
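
For anyone landing here with the same symptom, a rebuild that bypasses the local cache is a cheap first check. This is only a sketch, assuming the stale binary came from a locally cached base image or cached build layers, which the thread does not confirm in detail. The function name `rebuild_fresh` and the tag `my-collector` are hypothetical. The `DOCKER` variable is an assumption added so that a different CLI can be substituted.

```shell
#!/usr/bin/env bash
# Rebuild the wrapper image without reusing the locally cached base image or layers.
# Assumptions: run from the directory containing the Dockerfile;
# "my-collector" is a hypothetical tag; DOCKER may point at another compatible CLI.
rebuild_fresh() {
  local tag="${1:-my-collector}"
  local docker="${DOCKER:-docker}"
  # --pull re-fetches the image named in FROM; --no-cache ignores cached build layers.
  "$docker" build --pull --no-cache -t "$tag" .
}
```

After running `rebuild_fresh` and restarting the container, the versions in the startup log should line up with the tag in the Dockerfile.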