signalfx / splunk-otel-collector-chart

Splunk OpenTelemetry Collector for Kubernetes
Apache License 2.0

Pods entering CrashLoopBackoff due to port conflict #841

Closed by mgherman 1 year ago

mgherman commented 1 year ago

Describe the issue you're reporting

Hi Folks,

When trying to install this chart (version 0.80.0), I'm seeing the splunk-otel-collector-agent pods fail to start because of a port conflict on 8889/TCP: that port is already in use on the cluster nodes.

DynEnv mgh$ kubectl -n splunk-logging logs splunk-connect-splunk-otel-collector-agent-7c572
2023/07/04 03:51:46 settings.go:373: Set config to [/conf/relay.yaml]
2023/07/04 03:51:46 settings.go:426: Set ballast to 165 MiB
2023/07/04 03:51:46 settings.go:442: Set memory limit to 450 MiB
2023-07-04T03:51:46.814Z    info    service/telemetry.go:81 Setting up own telemetry...
2023-07-04T03:51:46.814Z    info    service/telemetry.go:104    Serving Prometheus metrics  {"address": "0.0.0.0:8889", "level": "Basic"}
2023-07-04T03:51:46.815Z    info    kube/client.go:98   k8s filtering   {"kind": "processor", "name": "k8sattributes", "pipeline": "logs", "labelSelector": "", "fieldSelector": "spec.nodeName=us-nonprod-kubewrk-01q.atl01.stelladotops.com"}
2023-07-04T03:51:46.815Z    info    memorylimiterprocessor@v0.80.0/memorylimiter.go:102 Memory limiter configured   {"kind": "processor", "name": "memory_limiter", "pipeline": "logs", "limit_mib": 450, "spike_limit_mib": 90, "check_interval": 2}
2023-07-04T03:51:46.820Z    info    service/service.go:131  Starting otelcol... {"Version": "v0.80.0", "NumCPU": 4}
2023-07-04T03:51:46.820Z    info    extensions/extensions.go:30 Starting extensions...
2023-07-04T03:51:46.820Z    info    extensions/extensions.go:33 Extension is starting...    {"kind": "extension", "name": "k8s_observer"}
2023-07-04T03:51:46.820Z    info    extensions/extensions.go:37 Extension started.  {"kind": "extension", "name": "k8s_observer"}
2023-07-04T03:51:46.820Z    info    extensions/extensions.go:33 Extension is starting...    {"kind": "extension", "name": "memory_ballast"}
2023-07-04T03:51:46.828Z    info    ballastextension@v0.80.0/memory_ballast.go:41   Setting memory ballast  {"kind": "extension", "name": "memory_ballast", "MiBs": 165}
2023-07-04T03:51:46.913Z    info    extensions/extensions.go:37 Extension started.  {"kind": "extension", "name": "memory_ballast"}
2023-07-04T03:51:46.913Z    info    extensions/extensions.go:33 Extension is starting...    {"kind": "extension", "name": "zpages"}
2023-07-04T03:51:46.913Z    info    zpagesextension@v0.80.0/zpagesextension.go:53   Registered zPages span processor on tracer provider {"kind": "extension", "name": "zpages"}
2023-07-04T03:51:46.913Z    info    zpagesextension@v0.80.0/zpagesextension.go:63   Registered Host's zPages    {"kind": "extension", "name": "zpages"}
2023-07-04T03:51:46.914Z    info    zpagesextension@v0.80.0/zpagesextension.go:75   Starting zPages extension   {"kind": "extension", "name": "zpages", "config": {"TCPAddr":{"Endpoint":"localhost:55679"}}}
2023-07-04T03:51:46.914Z    info    extensions/extensions.go:37 Extension started.  {"kind": "extension", "name": "zpages"}
2023-07-04T03:51:46.914Z    info    extensions/extensions.go:33 Extension is starting...    {"kind": "extension", "name": "file_storage"}
2023-07-04T03:51:46.914Z    info    extensions/extensions.go:37 Extension started.  {"kind": "extension", "name": "file_storage"}
2023-07-04T03:51:46.914Z    info    extensions/extensions.go:33 Extension is starting...    {"kind": "extension", "name": "health_check"}
2023-07-04T03:51:46.914Z    info    healthcheckextension@v0.80.0/healthcheckextension.go:34 Starting health_check extension {"kind": "extension", "name": "health_check", "config": {"Endpoint":"0.0.0.0:13133","TLSSetting":null,"CORS":null,"Auth":null,"MaxRequestBodySize":0,"IncludeMetadata":false,"Path":"/","ResponseBody":null,"CheckCollectorPipeline":{"Enabled":false,"Interval":"5m","ExporterFailureThreshold":5}}}
2023-07-04T03:51:46.914Z    warn    internal@v0.80.0/warning.go:40  Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks    {"kind": "extension", "name": "health_check", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
2023-07-04T03:51:46.914Z    info    extensions/extensions.go:37 Extension started.  {"kind": "extension", "name": "health_check"}
2023-07-04T03:51:46.914Z    info    internal/resourcedetection.go:125   began detecting resource information    {"kind": "processor", "name": "resourcedetection", "pipeline": "logs"}
2023-07-04T03:51:46.915Z    info    internal/resourcedetection.go:139   detected resource information   {"kind": "processor", "name": "resourcedetection", "pipeline": "logs", "resource": {"host.id":"","host.name":"us-nonprod-kubewrk-01q.atl01.stelladotops.com","os.type":"linux"}}
2023-07-04T03:51:46.915Z    warn    internal@v0.80.0/warning.go:40  Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks    {"kind": "receiver", "name": "otlp", "data_type": "logs", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
2023-07-04T03:51:46.915Z    info    otlpreceiver@v0.80.0/otlp.go:83 Starting GRPC server    {"kind": "receiver", "name": "otlp", "data_type": "logs", "endpoint": "0.0.0.0:4317"}
2023-07-04T03:51:46.915Z    warn    internal@v0.80.0/warning.go:40  Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks    {"kind": "receiver", "name": "otlp", "data_type": "logs", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
2023-07-04T03:51:46.915Z    info    otlpreceiver@v0.80.0/otlp.go:101    Starting HTTP server    {"kind": "receiver", "name": "otlp", "data_type": "logs", "endpoint": "0.0.0.0:4318"}
2023-07-04T03:51:46.915Z    info    adapter/receiver.go:45  Starting stanza receiver    {"kind": "receiver", "name": "filelog", "data_type": "logs"}
2023-07-04T03:51:47.010Z    info    healthcheck/handler.go:129  Health Check state change   {"kind": "extension", "name": "health_check", "status": "ready"}
2023-07-04T03:51:47.010Z    info    service/service.go:148  Everything is ready. Begin running and processing data.
2023-07-04T03:51:47.012Z    error   otelcol/collector.go:233    Asynchronous error received, terminating process    {"error": "listen tcp 0.0.0.0:8889: bind: address already in use"}
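
The final log line is an ordinary bind failure, so one way to confirm which ports are taken before picking a replacement is to try binding them. The following is a minimal sketch, not part of the chart: `port_in_use` is a hypothetical helper, and it assumes the script runs in the node's own network namespace (for example directly on the node), since that is where the conflict is reported.

```python
import errno
import socket


def port_in_use(port, host="0.0.0.0"):
    """Return True if binding a TCP socket to host:port fails with EADDRINUSE.

    Hypothetical helper for illustration; it only reports on the network
    namespace it is run in.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # any other failure (e.g. permissions) is not a port conflict
        return False


if __name__ == "__main__":
    # 8889 is the port from the log above; 8890 is the candidate replacement.
    for candidate in (8889, 8890):
        state = "in use" if port_in_use(candidate) else "free"
        print(f"{candidate}/TCP is {state}")
```

A port reported as free here is only free at the moment of the check; something else may still claim it later.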

Looking at other issues, I noticed that https://github.com/signalfx/splunk-otel-collector-chart/issues/572#issuecomment-1317429619 mentions updating the values.yaml file to set an alternate port. However, when I attempt this with

config:
  telemetry:
    metrics:
      address: 0.0.0.0:8890

helm throws an error:

Error: INSTALLATION FAILED: values don't meet the specifications of the schema(s) in the following chart(s):
splunk-otel-collector:
- (root): Additional property config is not allowed

Short of downloading and modifying the chart locally, is there a "correct" way to update the port to resolve the conflict?

omrozowicz-splunk commented 1 year ago

Hey, you should put this config under agent, like:

agent:
  config:
    telemetry:
      metrics:
        address: 0.0.0.0:8890

mgherman commented 1 year ago

Thanks @omrozowicz-splunk, this led me to the following solution:

agent:
  config:
    receivers:
      prometheus/agent:
        config:
          scrape_configs:
          - job_name: otel-agent
            scrape_interval: 10s
            static_configs:
            - targets:
              - ${K8S_POD_IP}:8890
    service:
      telemetry:
        metrics:
          address: 0.0.0.0:8890

I appreciate your help.
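
The solution above changes the port in two places: the collector's own metrics listener (`service.telemetry.metrics.address`) and the `otel-agent` scrape target. Presumably the two must agree, otherwise the agent would keep scraping a port where its own metrics are no longer served. A minimal sketch of that consistency check follows; `stale_scrape_targets` and `port_of` are hypothetical helpers, and the dictionary simply mirrors the `agent.config` values from the thread rather than anything read from the chart.

```python
def port_of(endpoint):
    """Extract the port from 'host:port'; the host may be a ${VAR} placeholder."""
    return int(endpoint.rsplit(":", 1)[1])


def stale_scrape_targets(agent_config, receiver="prometheus/agent", job="otel-agent"):
    """Return self-scrape targets whose port differs from the telemetry port.

    Hypothetical helper: the receiver and job names are taken from the
    values shown in the thread.
    """
    telemetry_port = port_of(agent_config["service"]["telemetry"]["metrics"]["address"])
    scrape_configs = agent_config["receivers"][receiver]["config"]["scrape_configs"]
    stale = []
    for scrape_config in scrape_configs:
        if scrape_config.get("job_name") != job:
            continue
        for static_config in scrape_config.get("static_configs", []):
            for target in static_config.get("targets", []):
                if port_of(target) != telemetry_port:
                    stale.append(target)
    return stale


# The agent.config section from the solution, written out as a dict.
agent_config = {
    "receivers": {
        "prometheus/agent": {
            "config": {
                "scrape_configs": [
                    {
                        "job_name": "otel-agent",
                        "scrape_interval": "10s",
                        "static_configs": [{"targets": ["${K8S_POD_IP}:8890"]}],
                    }
                ]
            }
        }
    },
    "service": {"telemetry": {"metrics": {"address": "0.0.0.0:8890"}}},
}

print(stale_scrape_targets(agent_config))  # prints []
```

If only the telemetry address had been changed and the target left at 8889, the function would return that target instead of an empty list.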