signalfx / splunk-otel-collector-chart

Splunk OpenTelemetry Collector for Kubernetes
Apache License 2.0

otel-collector container in splunk-otel-collector pod crashes in GKE #152

Closed maulikp-splunk closed 3 years ago

maulikp-splunk commented 3 years ago

Hello team,

I am using Helm chart version splunk-otel-collector-0.26.1 on GKE (1.19.9-gke.1400) and noticed that the splunk-otel-collector pod is in the CrashLoopBackOff state.

Here are the logs from the otel-collector container in the splunk-otel-collector pod:

2021/05/26 18:34:28 main.go:223: Set config to /conf/relay.yaml
2021/05/26 18:34:28 main.go:250: Set ballast to 168 MiB
2021/05/26 18:34:28 main.go:279: Set memory limit to 460 MiB
2021-05-26T18:34:28.939Z        info    service/application.go:281      Starting otelcol...     {"Version": "v0.26.0", "NumCPU": 2}
2021-05-26T18:34:29.041Z        info    service/application.go:341      Using memory ballast    {"MiBs": 168}
2021-05-26T18:34:29.041Z        info    service/application.go:189      Setting up own telemetry...
2021-05-26T18:34:29.042Z        info    service/telemetry.go:98 Serving Prometheus metrics      {"address": "0.0.0.0:8888", "level": 0, "service.instance.id": "3c7c313e-e843-4d94-ae72-072bbabbf87a"}
2021-05-26T18:34:29.042Z        info    service/application.go:224      Loading configuration...
2021-05-26T18:34:29.136Z        info    service/application.go:240      Applying configuration...
2021-05-26T18:34:29.139Z        info    signalfxexporter@v0.26.0/factory.go:87  Correlation tracking enabled    {"kind": "exporter", "name": "signalfx", "endpoint": "https://api.us1.signalfx.com"}
2021-05-26T18:34:29.142Z        info    builder/exporters_builder.go:274        Exporter was built.     {"kind": "exporter", "exporter": "signalfx"}
2021-05-26T18:34:29.142Z        info    builder/exporters_builder.go:274        Exporter was built.     {"kind": "exporter", "exporter": "splunk_hec"}
2021-05-26T18:34:29.142Z        info    builder/exporters_builder.go:274        Exporter was built.     {"kind": "exporter", "exporter": "sapm"}
2021-05-26T18:34:29.144Z        info    kube/client.go:87       k8s filtering   {"kind": "processor", "name": "k8s_tagger", "labelSelector": "", "fieldSelector": "spec.nodeName=gke-gke-mp-helm-test-default-pool-5625eb55-mm4z"}
2021-05-26T18:34:29.144Z        info    memorylimiter/memorylimiter.go:105      Memory limiter configured       {"kind": "processor", "name": "memory_limiter", "limit_mib": 482344960, "spike_limit_mib": 96468992, "check_interval": 5}
2021-05-26T18:34:29.144Z        info    builder/pipelines_builder.go:204        Pipeline was built.     {"pipeline_name": "traces", "pipeline_datatype": "traces"}
2021-05-26T18:34:29.145Z        info    memorylimiter/memorylimiter.go:105      Memory limiter configured       {"kind": "processor", "name": "memory_limiter", "limit_mib": 482344960, "spike_limit_mib": 96468992, "check_interval": 5}
2021-05-26T18:34:29.145Z        info    builder/pipelines_builder.go:204        Pipeline was built.     {"pipeline_name": "metrics", "pipeline_datatype": "metrics"}
2021-05-26T18:34:29.145Z        info    memorylimiter/memorylimiter.go:105      Memory limiter configured       {"kind": "processor", "name": "memory_limiter", "limit_mib": 482344960, "spike_limit_mib": 96468992, "check_interval": 5}
2021-05-26T18:34:29.145Z        info    builder/pipelines_builder.go:204        Pipeline was built.     {"pipeline_name": "metrics/agent", "pipeline_datatype": "metrics"}
2021-05-26T18:34:29.145Z        info    memorylimiter/memorylimiter.go:105      Memory limiter configured       {"kind": "processor", "name": "memory_limiter", "limit_mib": 482344960, "spike_limit_mib": 96468992, "check_interval": 5}
2021-05-26T18:34:29.145Z        info    builder/pipelines_builder.go:204        Pipeline was built.     {"pipeline_name": "logs", "pipeline_datatype": "logs"}
2021-05-26T18:34:29.146Z        info    builder/receivers_builder.go:230        Receiver was built.     {"kind": "receiver", "name": "zipkin", "datatype": "traces"}
2021-05-26T18:34:29.146Z        info    builder/receivers_builder.go:230        Receiver was built.     {"kind": "receiver", "name": "signalfx", "datatype": "metrics"}
2021-05-26T18:34:29.147Z        info    builder/receivers_builder.go:230        Receiver was built.     {"kind": "receiver", "name": "smartagent/signalfx-forwarder", "datatype": "traces"}
2021-05-26T18:34:29.218Z        info    builder/receivers_builder.go:230        Receiver was built.     {"kind": "receiver", "name": "otlp", "datatype": "traces"}
2021-05-26T18:34:29.218Z        info    builder/receivers_builder.go:230        Receiver was built.     {"kind": "receiver", "name": "jaeger", "datatype": "traces"}
2021-05-26T18:34:29.218Z        info    builder/receivers_builder.go:230        Receiver was built.     {"kind": "receiver", "name": "fluentforward", "datatype": "logs"}
2021-05-26T18:34:29.218Z        info    builder/receivers_builder.go:230        Receiver was built.     {"kind": "receiver", "name": "hostmetrics", "datatype": "metrics"}
2021-05-26T18:34:29.218Z        info    builder/receivers_builder.go:230        Receiver was built.     {"kind": "receiver", "name": "receiver_creator", "datatype": "metrics"}
2021-05-26T18:34:29.329Z        info    builder/receivers_builder.go:230        Receiver was built.     {"kind": "receiver", "name": "kubeletstats", "datatype": "metrics"}
2021-05-26T18:34:29.330Z        info    builder/receivers_builder.go:230        Receiver was built.     {"kind": "receiver", "name": "prometheus/agent", "datatype": "metrics"}
2021-05-26T18:34:29.330Z        info    builder/receivers_builder.go:105        Ignoring receiver as it is not used by any pipeline     {"kind": "receiver", "name": "sapm"}
2021-05-26T18:34:29.330Z        info    service/service.go:155  Starting extensions...
2021-05-26T18:34:29.330Z        info    builder/extensions_builder.go:53        Extension is starting...        {"kind": "extension", "name": "health_check"}
2021-05-26T18:34:29.330Z        info    healthcheckextension/healthcheckextension.go:41 Starting health_check extension {"kind": "extension", "name": "health_check", "config": {"Port":0,"TCPAddr":{"Endpoint":"0.0.0.0:13133"}}}
2021-05-26T18:34:29.330Z        info    builder/extensions_builder.go:59        Extension started.      {"kind": "extension", "name": "health_check"}
2021-05-26T18:34:29.331Z        info    builder/extensions_builder.go:53        Extension is starting...        {"kind": "extension", "name": "k8s_observer"}
2021-05-26T18:34:29.333Z        info    builder/extensions_builder.go:59        Extension started.      {"kind": "extension", "name": "k8s_observer"}
2021-05-26T18:34:29.333Z        info    builder/extensions_builder.go:53        Extension is starting...        {"kind": "extension", "name": "zpages"}
2021-05-26T18:34:29.333Z        info    zpagesextension/zpagesextension.go:42   Register Host's zPages  {"kind": "extension", "name": "zpages"}
2021-05-26T18:34:29.334Z        info    zpagesextension/zpagesextension.go:55   Starting zPages extension       {"kind": "extension", "name": "zpages", "config": {"TCPAddr":{"Endpoint":"localhost:55679"}}}
2021-05-26T18:34:29.334Z        info    builder/extensions_builder.go:59        Extension started.      {"kind": "extension", "name": "zpages"}
2021-05-26T18:34:29.334Z        info    service/service.go:200  Starting exporters...
2021-05-26T18:34:29.334Z        info    builder/exporters_builder.go:92 Exporter is starting... {"kind": "exporter", "name": "signalfx"}
2021-05-26T18:34:29.334Z        info    builder/exporters_builder.go:97 Exporter started.       {"kind": "exporter", "name": "signalfx"}
2021-05-26T18:34:29.334Z        info    builder/exporters_builder.go:92 Exporter is starting... {"kind": "exporter", "name": "splunk_hec"}
2021-05-26T18:34:29.334Z        info    builder/exporters_builder.go:97 Exporter started.       {"kind": "exporter", "name": "splunk_hec"}
2021-05-26T18:34:29.334Z        info    builder/exporters_builder.go:92 Exporter is starting... {"kind": "exporter", "name": "sapm"}
2021-05-26T18:34:29.334Z        info    builder/exporters_builder.go:97 Exporter started.       {"kind": "exporter", "name": "sapm"}
2021-05-26T18:34:29.334Z        info    service/service.go:205  Starting processors...
2021-05-26T18:34:29.334Z        info    builder/pipelines_builder.go:51 Pipeline is starting... {"pipeline_name": "traces", "pipeline_datatype": "traces"}
2021-05-26T18:34:29.334Z        info    internal/resourcedetection.go:123       began detecting resource information    {"kind": "processor", "name": "resourcedetection"}
2021-05-26T18:34:29.422Z        info    internal/resourcedetection.go:135       detected resource information   {"kind": "processor", "name": "resourcedetection", "resource": {"cloud.account.id":"playground-s-11-ceec2464","cloud.availability_zone":"us-central1-c","cloud.platform":"gcp_gke","cloud.provider":"gcp","host.id":"9174059746695785472","host.name":"gke-gke-mp-helm-test-default-pool-5625eb55-mm4z.us-central1-c.c.playground-s-11-ceec2464.internal","host.type":"projects/365485357438/machineTypes/e2-standard-2","k8s.cluster.name":"gke-mp-helm-test","os.type":"LINUX"}}
2021-05-26T18:34:29.422Z        info    builder/pipelines_builder.go:62 Pipeline is started.    {"pipeline_name": "traces", "pipeline_datatype": "traces"}
2021-05-26T18:34:29.422Z        info    builder/pipelines_builder.go:51 Pipeline is starting... {"pipeline_name": "metrics", "pipeline_datatype": "metrics"}
2021-05-26T18:34:29.422Z        info    builder/pipelines_builder.go:62 Pipeline is started.    {"pipeline_name": "metrics", "pipeline_datatype": "metrics"}
2021-05-26T18:34:29.422Z        info    builder/pipelines_builder.go:51 Pipeline is starting... {"pipeline_name": "metrics/agent", "pipeline_datatype": "metrics"}
2021-05-26T18:34:29.422Z        info    builder/pipelines_builder.go:62 Pipeline is started.    {"pipeline_name": "metrics/agent", "pipeline_datatype": "metrics"}
2021-05-26T18:34:29.422Z        info    builder/pipelines_builder.go:51 Pipeline is starting... {"pipeline_name": "logs", "pipeline_datatype": "logs"}
2021-05-26T18:34:29.422Z        info    builder/pipelines_builder.go:62 Pipeline is started.    {"pipeline_name": "logs", "pipeline_datatype": "logs"}
2021-05-26T18:34:29.422Z        info    service/service.go:210  Starting receivers...
2021-05-26T18:34:29.422Z        info    builder/receivers_builder.go:70 Receiver is starting... {"kind": "receiver", "name": "smartagent/signalfx-forwarder"}
2021-05-26T18:34:29.424Z        info    builder/receivers_builder.go:75 Receiver started.       {"kind": "receiver", "name": "smartagent/signalfx-forwarder"}
2021-05-26T18:34:29.424Z        info    builder/receivers_builder.go:70 Receiver is starting... {"kind": "receiver", "name": "jaeger"}
2021-05-26T18:34:29.424Z        info    static/strategy_store.go:201    No sampling strategies provided or URL is unavailable, using defaults   {"kind": "receiver", "name": "jaeger"}
2021-05-26T18:34:29.424Z        info    builder/receivers_builder.go:75 Receiver started.       {"kind": "receiver", "name": "jaeger"}
2021-05-26T18:34:29.424Z        info    builder/receivers_builder.go:70 Receiver is starting... {"kind": "receiver", "name": "fluentforward"}
2021-05-26T18:34:29.424Z        info    builder/receivers_builder.go:75 Receiver started.       {"kind": "receiver", "name": "fluentforward"}
2021-05-26T18:34:29.424Z        info    builder/receivers_builder.go:70 Receiver is starting... {"kind": "receiver", "name": "hostmetrics"}
2021-05-26T18:34:29.425Z        info    builder/receivers_builder.go:75 Receiver started.       {"kind": "receiver", "name": "hostmetrics"}
2021-05-26T18:34:29.425Z        info    builder/receivers_builder.go:70 Receiver is starting... {"kind": "receiver", "name": "kubeletstats"}
2021-05-26T18:34:29.425Z        info    builder/receivers_builder.go:75 Receiver started.       {"kind": "receiver", "name": "kubeletstats"}
2021-05-26T18:34:29.425Z        info    builder/receivers_builder.go:70 Receiver is starting... {"kind": "receiver", "name": "prometheus/agent"}
2021-05-26T18:34:29.543Z        info    builder/receivers_builder.go:75 Receiver started.       {"kind": "receiver", "name": "prometheus/agent"}
2021-05-26T18:34:29.543Z        info    builder/receivers_builder.go:70 Receiver is starting... {"kind": "receiver", "name": "zipkin"}
2021-05-26T18:34:29.543Z        info    builder/receivers_builder.go:75 Receiver started.       {"kind": "receiver", "name": "zipkin"}
2021-05-26T18:34:29.543Z        info    builder/receivers_builder.go:70 Receiver is starting... {"kind": "receiver", "name": "signalfx"}
2021-05-26T18:34:29.543Z        info    builder/receivers_builder.go:75 Receiver started.       {"kind": "receiver", "name": "signalfx"}
2021-05-26T18:34:29.543Z        info    builder/receivers_builder.go:70 Receiver is starting... {"kind": "receiver", "name": "otlp"}
2021-05-26T18:34:29.543Z        info    otlpreceiver/otlp.go:87 Starting GRPC server on endpoint 0.0.0.0:4317   {"kind": "receiver", "name": "otlp"}
2021-05-26T18:34:29.543Z        info    otlpreceiver/otlp.go:149        Setting up a second GRPC listener on legacy endpoint 0.0.0.0:55680      {"kind": "receiver", "name": "otlp"}
2021-05-26T18:34:29.543Z        info    otlpreceiver/otlp.go:87 Starting GRPC server on endpoint 0.0.0.0:55680  {"kind": "receiver", "name": "otlp"}
2021-05-26T18:34:29.544Z        info    otlpreceiver/otlp.go:105        Starting HTTP server on endpoint 0.0.0.0:55681  {"kind": "receiver", "name": "otlp"}
2021-05-26T18:34:29.544Z        info    builder/receivers_builder.go:75 Receiver started.       {"kind": "receiver", "name": "otlp"}
2021-05-26T18:34:29.544Z        info    builder/receivers_builder.go:70 Receiver is starting... {"kind": "receiver", "name": "receiver_creator"}
2021-05-26T18:34:29.544Z        info    builder/receivers_builder.go:75 Receiver started.       {"kind": "receiver", "name": "receiver_creator"}
2021-05-26T18:34:29.544Z        info    healthcheck/handler.go:128      Health Check state change       {"kind": "extension", "name": "health_check", "status": "ready"}
2021-05-26T18:34:29.544Z        info    service/application.go:201      Everything is ready. Begin running and processing data.
2021-05-26T18:34:29.544Z        error   service/application.go:212      Asynchronous error received, terminating process        {"error": "listen tcp 0.0.0.0:8888: bind: address already in use"}
go.opentelemetry.io/collector/service.(*Application).runAndWaitForShutdownEvent
        /home/circleci/go/pkg/mod/go.opentelemetry.io/collector@v0.26.0/service/application.go:212
go.opentelemetry.io/collector/service.(*Application).execute
        /home/circleci/go/pkg/mod/go.opentelemetry.io/collector@v0.26.0/service/application.go:304
go.opentelemetry.io/collector/service.New.func1
        /home/circleci/go/pkg/mod/go.opentelemetry.io/collector@v0.26.0/service/application.go:118
github.com/spf13/cobra.(*Command).execute
        /home/circleci/go/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:852
github.com/spf13/cobra.(*Command).ExecuteC
        /home/circleci/go/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:960
github.com/spf13/cobra.(*Command).Execute
        /home/circleci/go/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:897
go.opentelemetry.io/collector/service.(*Application).Run
        /home/circleci/go/pkg/mod/go.opentelemetry.io/collector@v0.26.0/service/application.go:157
main.runInteractive
        /home/circleci/project/cmd/otelcol/main.go:288
main.run
        /home/circleci/project/cmd/otelcol/main_others.go:23
main.main
        /home/circleci/project/cmd/otelcol/main.go:85
runtime.main
        /usr/local/go/src/runtime/proc.go:225
2021-05-26T18:34:29.545Z        info    service/application.go:311      Starting shutdown...
2021-05-26T18:34:29.545Z        info    healthcheck/handler.go:128      Health Check state change       {"kind": "extension", "name": "health_check", "status": "unavailable"}
2021-05-26T18:34:29.545Z        info    service/service.go:225  Stopping receivers...
2021-05-26T18:34:29.545Z        info    service/application.go:268      Config WatchForUpdate closed    {"error": "parent session was closed"}
2021-05-26T18:34:29.548Z        info    service/service.go:231  Stopping processors...
2021-05-26T18:34:29.548Z        info    builder/pipelines_builder.go:70 Pipeline is shutting down...    {"pipeline_name": "traces", "pipeline_datatype": "traces"}
2021-05-26T18:34:29.548Z        info    builder/pipelines_builder.go:76 Pipeline is shutdown.   {"pipeline_name": "traces", "pipeline_datatype": "traces"}
2021-05-26T18:34:29.548Z        info    builder/pipelines_builder.go:70 Pipeline is shutting down...    {"pipeline_name": "metrics", "pipeline_datatype": "metrics"}
2021-05-26T18:34:29.548Z        info    builder/pipelines_builder.go:76 Pipeline is shutdown.   {"pipeline_name": "metrics", "pipeline_datatype": "metrics"}
2021-05-26T18:34:29.548Z        info    builder/pipelines_builder.go:70 Pipeline is shutting down...    {"pipeline_name": "metrics/agent", "pipeline_datatype": "metrics"}
2021-05-26T18:34:29.548Z        info    builder/pipelines_builder.go:76 Pipeline is shutdown.   {"pipeline_name": "metrics/agent", "pipeline_datatype": "metrics"}
2021-05-26T18:34:29.548Z        info    builder/pipelines_builder.go:70 Pipeline is shutting down...    {"pipeline_name": "logs", "pipeline_datatype": "logs"}
2021-05-26T18:34:29.548Z        info    builder/pipelines_builder.go:76 Pipeline is shutdown.   {"pipeline_name": "logs", "pipeline_datatype": "logs"}
2021-05-26T18:34:29.549Z        info    service/service.go:237  Stopping exporters...
2021-05-26T18:34:29.549Z        info    service/service.go:164  Stopping extensions...
2021-05-26T18:34:29.550Z        info    service/application.go:329      Shutdown complete.

dmitryax commented 3 years ago

Hi @maulikp-splunk,

The otel-collector deployment was failing because Google added its own build of otel-collector to GKE nodes by default for metrics collection. Both collectors tried to listen on port 8888, which the collector uses to expose its own internal metrics, so our container failed with "listen tcp 0.0.0.0:8888: bind: address already in use".

Google has since switched its agent from port 8888 to 8200, and the agent should have been upgraded automatically by now. If you install splunk-otel-collector again, there shouldn't be any issues.
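
If you want to confirm that the port is actually free before reinstalling, a quick check along these lines may help. This is only a minimal sketch: `port_is_free` is a hypothetical helper, not part of the chart or the collector, and it assumes you run it in the same network namespace where the conflict occurs (for example, on the node itself).

```python
import errno
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can currently bind to host:port.

    Hypothetical helper for illustration only. It attempts the same kind of
    bind that failed in the log above and reports whether it would succeed.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            # EADDRINUSE corresponds to "bind: address already in use".
            if exc.errno == errno.EADDRINUSE:
                return False
            # Any other failure (e.g. permissions) is unexpected; surface it.
            raise
        return True


if __name__ == "__main__":
    # 8888 is the collector's metrics port from the log; 8200 is the port
    # the GKE agent reportedly moved to.
    for port in (8888, 8200):
        state = "free" if port_is_free(port) else "in use"
        print(f"port {port}: {state}")
```

If 8888 still shows as in use, the node's agent has probably not been upgraded yet.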

Please let me know how it goes.

maulikp-splunk commented 3 years ago

Thank you @dmitryax!! You are the best, as always! I tried installing it today, and it works perfectly fine!