signalfx / splunk-otel-collector-chart

Splunk OpenTelemetry Collector for Kubernetes
Apache License 2.0

Does not work via RKE2 #926

Closed SitronNO closed 1 year ago

SitronNO commented 1 year ago

What happened?

Description

Installed the Splunk OpenTelemetry Collector for Kubernetes via the Helm chart, but the Pods will not start and exit with an error.

Steps to Reproduce

  1. Install rke2 on Rocky Linux, via their tutorial: https://docs.rke2.io/install/quickstart
  2. Create new namespace (splunk-otel) and change context to use the new namespace (kubectl config set-context --current --namespace=splunk-otel)
  3. Create the file splunk-otel.yml (see content below)
  4. Install with the command helm -n splunk-otel install my-splunk-otel-collector -f splunk-otel.yml splunk-otel-collector-chart/splunk-otel-collector
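
For convenience, steps 2 to 4 can be sketched as one shell function. This is a minimal sketch, not part of the original report: the function name `reproduce` is hypothetical, the `helm repo add` line and its URL are an assumption about how the `splunk-otel-collector-chart` alias used in step 4 was registered, and `kubectl` and `helm` are assumed to already point at the RKE2 cluster.

```shell
#!/bin/sh
# Sketch of steps 2-4. Assumes kubectl and helm are installed and that
# splunk-otel.yml (see "Chart configuration") is in the current directory.
reproduce() {
  namespace="splunk-otel"
  release="my-splunk-otel-collector"

  # Assumption: the repo alias matches the chart reference used in step 4.
  helm repo add splunk-otel-collector-chart \
    https://signalfx.github.io/splunk-otel-collector-chart &&
  helm repo update &&
  # Step 2: create the namespace and switch the current context to it.
  kubectl create namespace "$namespace" &&
  kubectl config set-context --current --namespace="$namespace" &&
  # Step 4: install the chart with the values file from step 3.
  helm -n "$namespace" install "$release" -f splunk-otel.yml \
    splunk-otel-collector-chart/splunk-otel-collector
}
```

Calling `reproduce` and then running `kubectl get pod` should show the agent Pods described under "Actual Result".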

Expected Result

See the Pods running and logs in Splunk

Actual Result

Both Pods are crashing:

$ kubectl get pod
NAME                                   READY   STATUS             RESTARTS       AGE
my-splunk-otel-collector-agent-7nbrh   0/1     CrashLoopBackOff   24 (75s ago)   99m
my-splunk-otel-collector-agent-mjmv5   0/1     CrashLoopBackOff   25 (95s ago)   99m

See logs below.

As a test, I chmod'ed /var/addon and all its subdirectories to 0777 on both my nodes and deleted the pods, but it did not help.

Chart version

0.83.0

Environment information

Environment

Cloud: None, on-prem install of RKE2
k8s version: v1.25.13+rke2r1
OS: Rocky Linux 9.2

Chart configuration

clusterName: k8s.example.org
splunkPlatform:
  token: 48d04097-abba-beef-cake-ddaf697d19af
  endpoint: https://splunk.example.org:8088/services/collector
  index: "kubernetes"
  insecureSkipVerify: true
logsEngine: otel
cloudProvider: ""
distribution: ""

Log output

$ kubectl logs my-splunk-otel-collector-agent-7nbrh
2023/09/14 12:02:53 settings.go:399: Set config to [/conf/relay.yaml]
2023/09/14 12:02:53 settings.go:452: Set ballast to 165 MiB
2023/09/14 12:02:53 settings.go:468: Set memory limit to 450 MiB
2023-09-14T12:02:53.707Z    info    service/telemetry.go:84 Setting up own telemetry...
2023-09-14T12:02:53.707Z    info    service/telemetry.go:201    Serving Prometheus metrics  {"address": "0.0.0.0:8889", "level": "Basic"}
2023-09-14T12:02:53.708Z    info    kube/client.go:107  k8s filtering   {"kind": "processor", "name": "k8sattributes", "pipeline": "logs", "labelSelector": "", "fieldSelector": "spec.nodeName=rke2-02"}
2023-09-14T12:02:53.708Z    info    memorylimiterprocessor@v0.83.0/memorylimiter.go:102 Memory limiter configured   {"kind": "processor", "name": "memory_limiter", "pipeline": "logs", "limit_mib": 450, "spike_limit_mib": 90, "check_interval": 2}
2023-09-14T12:02:53.709Z    info    service/service.go:138  Starting otelcol... {"Version": "v0.83.0", "NumCPU": 2}
2023-09-14T12:02:53.709Z    info    extensions/extensions.go:31 Starting extensions...
2023-09-14T12:02:53.709Z    info    extensions/extensions.go:34 Extension is starting...    {"kind": "extension", "name": "health_check"}
2023-09-14T12:02:53.709Z    info    healthcheckextension@v0.83.0/healthcheckextension.go:34 Starting health_check extension {"kind": "extension", "name": "health_check", "config": {"Endpoint":"0.0.0.0:13133","TLSSetting":null,"CORS":null,"Auth":null,"MaxRequestBodySize":0,"IncludeMetadata":false,"ResponseHeaders":null,"Path":"/","ResponseBody":null,"CheckCollectorPipeline":{"Enabled":false,"Interval":"5m","ExporterFailureThreshold":5}}}
2023-09-14T12:02:53.709Z    warn    internal@v0.83.0/warning.go:40  Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks    {"kind": "extension", "name": "health_check", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
2023-09-14T12:02:53.709Z    info    extensions/extensions.go:38 Extension started.  {"kind": "extension", "name": "health_check"}
2023-09-14T12:02:53.709Z    info    extensions/extensions.go:34 Extension is starting...    {"kind": "extension", "name": "k8s_observer"}
2023-09-14T12:02:53.709Z    info    extensions/extensions.go:38 Extension started.  {"kind": "extension", "name": "k8s_observer"}
2023-09-14T12:02:53.709Z    info    extensions/extensions.go:34 Extension is starting...    {"kind": "extension", "name": "memory_ballast"}
2023-09-14T12:02:53.711Z    info    ballastextension@v0.83.0/memory_ballast.go:41   Setting memory ballast  {"kind": "extension", "name": "memory_ballast", "MiBs": 165}
2023-09-14T12:02:53.711Z    info    extensions/extensions.go:38 Extension started.  {"kind": "extension", "name": "memory_ballast"}
2023-09-14T12:02:53.711Z    info    extensions/extensions.go:34 Extension is starting...    {"kind": "extension", "name": "zpages"}
2023-09-14T12:02:53.711Z    info    zpagesextension@v0.83.0/zpagesextension.go:53   Registered zPages span processor on tracer provider {"kind": "extension", "name": "zpages"}
2023-09-14T12:02:53.711Z    info    zpagesextension@v0.83.0/zpagesextension.go:63   Registered Host's zPages    {"kind": "extension", "name": "zpages"}
2023-09-14T12:02:53.712Z    info    zpagesextension@v0.83.0/zpagesextension.go:75   Starting zPages extension   {"kind": "extension", "name": "zpages", "config": {"TCPAddr":{"Endpoint":"localhost:55679"}}}
2023-09-14T12:02:53.712Z    info    extensions/extensions.go:38 Extension started.  {"kind": "extension", "name": "zpages"}
2023-09-14T12:02:53.712Z    info    extensions/extensions.go:34 Extension is starting...    {"kind": "extension", "name": "file_storage"}
2023-09-14T12:02:53.713Z    info    extensions/extensions.go:38 Extension started.  {"kind": "extension", "name": "file_storage"}
2023-09-14T12:02:53.716Z    info    internal/resourcedetection.go:125   began detecting resource information    {"kind": "processor", "name": "resourcedetection", "pipeline": "logs"}
2023-09-14T12:02:53.762Z    info    internal/resourcedetection.go:139   detected resource information   {"kind": "processor", "name": "resourcedetection", "pipeline": "logs", "resource": {"host.name":"rke2-02.sysrq.info","os.type":"linux"}}
2023-09-14T12:02:53.762Z    info    adapter/receiver.go:45  Starting stanza receiver    {"kind": "receiver", "name": "filelog", "data_type": "logs"}
2023-09-14T12:02:53.762Z    info    service/service.go:170  Starting shutdown...
2023-09-14T12:02:53.762Z    info    healthcheck/handler.go:129  Health Check state change   {"kind": "extension", "name": "health_check", "status": "unavailable"}
2023-09-14T12:02:53.762Z    info    adapter/receiver.go:139 Stopping stanza receiver    {"kind": "receiver", "name": "filelog", "data_type": "logs"}
2023-09-14T12:02:53.805Z    info    extensions/extensions.go:45 Stopping extensions...
2023-09-14T12:02:53.806Z    info    zpagesextension@v0.83.0/zpagesextension.go:98   Unregistered zPages span processor on tracer provider   {"kind": "extension", "name": "zpages"}
2023-09-14T12:02:53.806Z    info    service/service.go:184  Shutdown complete.
Error: cannot start pipelines: storage client: open /var/addon/splunk/otel_pos/receiver_filelog_: permission denied
2023/09/14 12:02:53 main.go:94: application run finished with error: cannot start pipelines: storage client: open /var/addon/splunk/otel_pos/receiver_filelog_: permission denied

Additional context

No response

atoulme commented 1 year ago

Please check the permission of the /var/addon folder. It appears you have a typo and worked with /var/addons.

SitronNO commented 1 year ago

> Please check the permission of the /var/addon folder. It appears you have a typo and worked with /var/addons.

Sorry, that was a typo on my part. These are the permissions on both nodes in the cluster:

$ ls -ld /var/ /var/addon/ /var/addon/splunk/ /var/addon/splunk/otel_pos/ 
drwxr-xr-x. 20 root root 4096 Aug 30 08:56 /var/
drwxrwxrwx.  3 root root   20 Aug 30 08:56 /var/addon/
drwxrwxrwx.  3 root root   22 Aug 30 08:56 /var/addon/splunk/
drwxrwxrwx.  2 root root    6 Aug 30 08:56 /var/addon/splunk/otel_pos/

omrozowicz-splunk commented 1 year ago

Hey, just to be sure: does /var/addon/splunk/otel_pos/receiver_filelog_ specifically have the same permissions as well?
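
A side note on the listing above: the trailing `.` after each mode string (for example `drwxrwxrwx.`) is how GNU `ls` marks a file that carries an SELinux security context. On such a host, a "permission denied" can still occur with 0777 mode bits, so the SELinux mode and labels may be worth checking as well. The following is a minimal sketch with hypothetical helper names (`has_selinux_context`, `show_selinux_info`). It does not claim SELinux is the cause here.

```shell
#!/bin/sh
# Returns 0 if an `ls -l` mode string ends with the "." that GNU ls prints
# for files with an SELinux security context, 1 otherwise.
has_selinux_context() {
  case "$1" in
    *.) return 0 ;;
    *)  return 1 ;;
  esac
}

# Prints the SELinux mode (if getenforce is available) and the security
# label of the given path. Requires GNU ls for the -Z option.
show_selinux_info() {
  if command -v getenforce >/dev/null 2>&1; then
    echo "SELinux mode: $(getenforce)"
  else
    echo "getenforce not found; SELinux tools may not be installed"
  fi
  ls -ldZ "$1"
}
```

On a node, `show_selinux_info /var/addon/splunk/otel_pos` would print the mode and label for the checkpoint directory named in the error message.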

atoulme commented 1 year ago

rke2 is not currently officially supported. We can take the time to reproduce and work with you towards support. Please file a Splunk Idea at https://ideas.splunk.com and contact your account representative to get started.

bordenit commented 4 months ago

@SitronNO All I had to do to make it work with RKE2 was add privileged=true to the securityContext. I'll have to make a patch file though, as it seems you can't add additional securityContext options to the values file without the helm update erroring out.
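
A patch file along those lines might look like the sketch below. It is not taken from the thread: the file name `privileged-patch.yaml` is hypothetical, the DaemonSet name is inferred from the Pod names in the report, and the container name `otel-collector` is an assumption that should be checked against the deployed DaemonSet before applying.

```shell
#!/bin/sh
# Write a strategic-merge patch that sets privileged: true on the agent
# container. Assumption: the container is named "otel-collector"; verify with
#   kubectl -n splunk-otel get daemonset my-splunk-otel-collector-agent \
#     -o jsonpath='{.spec.template.spec.containers[*].name}'
cat > privileged-patch.yaml <<'EOF'
spec:
  template:
    spec:
      containers:
        - name: otel-collector
          securityContext:
            privileged: true
EOF

# Apply it against the cluster (left commented out; needs cluster access):
# kubectl -n splunk-otel patch daemonset my-splunk-otel-collector-agent \
#   --patch-file privileged-patch.yaml
```

Running the agent as a privileged container widens its access to the host considerably, so this is best treated as a workaround rather than a recommended setting.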