splunk / splunk-connect-for-snmp

Splunk Connect for SNMP
https://splunk.github.io/splunk-connect-for-snmp/
Apache License 2.0

snmp-mongodb and snmp-redis-master in CrashLoopBackOff status #894

Closed thasteve closed 3 months ago

thasteve commented 10 months ago

Issue during initial configuration. I followed the steps to run microk8s helm3 upgrade --install snmp -f values.yaml splunk-connect-for-snmp/splunk-connect-for-snmp --namespace=sc4snmp --create-namespace. After running microk8s kubectl get pods -n sc4snmp to verify the deployment, I see that snmp-redis-master-0 and snmp-mongodb-... are in CrashLoopBackOff status.
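
For reference, the two commands described above, exactly as run:

# Install/upgrade the sc4snmp release into the sc4snmp namespace
microk8s helm3 upgrade --install snmp -f values.yaml \
  splunk-connect-for-snmp/splunk-connect-for-snmp \
  --namespace=sc4snmp --create-namespace

# Verify that all pods come up
microk8s kubectl get pods -n sc4snmp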

I'm running on an ESXi-hosted Rocky Linux VM with plenty of resources.

ajasnosz commented 10 months ago

Could you share the output of microk8s kubectl describe pod <pod-name> -n sc4snmp and the events for both failing pods? What version of sc4snmp are you trying to run?
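
One way to collect all of that (the events command and its sort flag are just a convenience, not part of the original request):

microk8s kubectl describe pod <pod-name> -n sc4snmp
microk8s kubectl get events -n sc4snmp --sort-by=.lastTimestamp
microk8s helm3 list -n sc4snmp   # shows the installed chart and app version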

thasteve commented 10 months ago

When I search the repo for snmp I see the following app version. This is what I redeployed yesterday while trying to troubleshoot the issue.

NAME                                              CHART VERSION   APP VERSION   DESCRIPTION
splunk-connect-for-snmp/splunk-connect-for-snmp   1.9.2           1.9.2         A Helm chart for SNMP Connect for SNMP

Below is the output of microk8s kubectl describe pod <pod-name> -n sc4snmp. I edited the node name and IP for security.

snmp-redis-master-0

Name:             snmp-redis-master-0
Namespace:        sc4snmp
Priority:         0
Service Account:  snmp-redis
Node:             snmp01.domain.com/192.168.1.2
Start Time:       Thu, 19 Oct 2023 10:03:11 -0400
Labels:           app.kubernetes.io/component=master app.kubernetes.io/instance=snmp app.kubernetes.io/managed-by=Helm app.kubernetes.io/name=redis controller-revision-hash=snmp-redis-master-54465fc56 helm.sh/chart=redis-17.3.18 statefulset.kubernetes.io/pod-name=snmp-redis-master-0
Annotations:      checksum/configmap: 04422870eebf6e73b372f1816da4f48d5d9a753f31c07f4e8decf26858647c5e
                  checksum/health: 230c16035014813c1ed5dca4b334fceead271bc2437e3adc38ba319bfa89ad67
                  checksum/scripts: 50226e5366a7aaef5c150dc915b32e209d9248fdf6ca19b9e2517edebe8aa072
                  checksum/secret: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
                  cni.projectcalico.org/containerID: a08e8a7e22b5416faf0e4db68bde97043cb51596f2eeac34327a0e74930aae83
                  cni.projectcalico.org/podIP: 10.1.89.19/32
                  cni.projectcalico.org/podIPs: 10.1.89.19/32
Status:           Running
IP:               10.1.89.19
IPs:              IP: 10.1.89.19
Controlled By:    StatefulSet/snmp-redis-master
Containers:
  redis:
    Container ID:  containerd://f4744cfba8a15bfe85c910c3adb5d2a1b6c13fbbd66105ae944a1f1c0006e991
    Image:         docker.io/bitnami/redis:7.0.7-debian-11-r2
    Image ID:      docker.io/bitnami/redis@sha256:5481f3ce531dd4d756806491ef911c23eda0636dd9568eb654fbba4c6a854a9e
    Port:          6379/TCP
    Host Port:     0/TCP
    Command:
      /bin/bash
    Args:
      -c
      /opt/bitnami/scripts/start-scripts/start-master.sh
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Fri, 20 Oct 2023 09:29:52 -0400
      Finished:     Fri, 20 Oct 2023 09:29:52 -0400
    Ready:          False
    Restart Count:  278
    Liveness:       exec [sh -c /health/ping_liveness_local.sh 5] delay=20s timeout=6s period=5s #success=1 #failure=5
    Readiness:      exec [sh -c /health/ping_readiness_local.sh 1] delay=20s timeout=2s period=5s #success=1 #failure=5
    Environment:
      BITNAMI_DEBUG:           false
      REDIS_REPLICATION_MODE:  master
      ALLOW_EMPTY_PASSWORD:    yes
      REDIS_TLS_ENABLED:       no
      REDIS_PORT:              6379
    Mounts:
      /data from redis-data (rw)
      /health from health (rw)
      /opt/bitnami/redis/etc/ from redis-tmp-conf (rw)
      /opt/bitnami/redis/mounted-etc from config (rw)
      /opt/bitnami/scripts/start-scripts from start-scripts (rw)
      /tmp from tmp (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-kx7wn (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  redis-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  redis-data-snmp-redis-master-0
    ReadOnly:   false
  start-scripts:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      snmp-redis-scripts
    Optional:  false
  health:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      snmp-redis-health
    Optional:  false
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      snmp-redis-configuration
    Optional:  false
  redis-tmp-conf:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  tmp:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  kube-api-access-kx7wn:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:
    DownwardAPI:             true
QoS Class:        BestEffort
Node-Selectors:
Tolerations:      node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                  node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason   Age                   From     Message
  Warning  BackOff  72s (x6937 over 23h)  kubelet  Back-off restarting failed container

snmp-mongodb-75b89b595f-qftj9

Name:             snmp-mongodb-75b89b595f-qftj9
Namespace:        sc4snmp
Priority:         0
Service Account:  snmp-mongodb
Node:             snmp01.domain.com/192.168.1.2
Start Time:       Thu, 19 Oct 2023 10:03:11 -0400
Labels:           app.kubernetes.io/component=mongodb app.kubernetes.io/instance=snmp app.kubernetes.io/managed-by=Helm app.kubernetes.io/name=mongodb helm.sh/chart=mongodb-12.1.31 pod-template-hash=75b89b595f
Annotations:      cni.projectcalico.org/containerID: f21b10d2a2c3e790bc3600f23bca0d6357dd0d8e51ce0832631807ce4c025622
                  cni.projectcalico.org/podIP: 10.1.89.18/32
                  cni.projectcalico.org/podIPs: 10.1.89.18/32
Status:           Running
IP:               10.1.89.18
IPs:              IP: 10.1.89.18
Controlled By:    ReplicaSet/snmp-mongodb-75b89b595f
Init Containers:
  volume-permissions:
    Container ID:  containerd://9c7f0325183523ffdb3da1b859050630ffbe573ea07d6c3f0d71acf369368ccf
    Image:         docker.io/bitnami/bitnami-shell:11-debian-11-r21
    Image ID:      docker.io/bitnami/bitnami-shell@sha256:d05ec18b29aed67267a0a9c2c64c02594e6aa5791ccac2b7b1f5bab3f7ff7851
    Port:
    Host Port:
    Command:
      /bin/bash
    Args:
      -ec
      mkdir -p /bitnami/mongodb/
      chown 1001:1001 /bitnami/mongodb/
      find /bitnami/mongodb/ -mindepth 1 -maxdepth 1 -not -name ".snapshot" -not -name "lost+found" | xargs -r chown -R 1001:1001
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 19 Oct 2023 10:08:50 -0400
      Finished:     Thu, 19 Oct 2023 10:08:50 -0400
    Ready:          True
    Restart Count:  0
    Environment:
    Mounts:
      /bitnami/mongodb from datadir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8wx9z (ro)
Containers:
  mongodb:
    Container ID:  containerd://0276593d96780786602b4bf6e8729ee68b8063ca61c8c7e13b37e1e4cb7b8e0b
    Image:         docker.io/bitnami/mongodb:5.0.10-debian-11-r3
    Image ID:      docker.io/bitnami/mongodb@sha256:563e1572db6c23a7bc5d8970d4cf06de1f1a80bd41c4b5e273a92bfa9f26d0f1
    Port:          27017/TCP
    Host Port:     0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    132
      Started:      Fri, 20 Oct 2023 09:35:44 -0400
      Finished:     Fri, 20 Oct 2023 09:35:44 -0400
    Ready:          False
    Restart Count:  279
    Liveness:       exec [/bitnami/scripts/ping-mongodb.sh] delay=30s timeout=10s period=20s #success=1 #failure=6
    Readiness:      exec [/bitnami/scripts/readiness-probe.sh] delay=5s timeout=5s period=10s #success=1 #failure=6
    Environment:
      BITNAMI_DEBUG:                    false
      ALLOW_EMPTY_PASSWORD:             yes
      MONGODB_SYSTEM_LOG_VERBOSITY:     0
      MONGODB_DISABLE_SYSTEM_LOG:       no
      MONGODB_DISABLE_JAVASCRIPT:       no
      MONGODB_ENABLE_JOURNAL:           yes
      MONGODB_PORT_NUMBER:              27017
      MONGODB_ENABLE_IPV6:              no
      MONGODB_ENABLE_DIRECTORY_PER_DB:  no
    Mounts:
      /bitnami/mongodb from datadir (rw)
      /bitnami/scripts from common-scripts (rw)
      /docker-entrypoint-initdb.d from custom-init-scripts (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8wx9z (ro)
  metrics:
    Container ID:  containerd://a3a3b7e866af19c0e90adafb4c2584cbabb83f3c97b17a6b554bfae6bfd3ceac
    Image:         docker.io/bitnami/mongodb-exporter:0.33.0-debian-11-r9
    Image ID:      docker.io/bitnami/mongodb-exporter@sha256:078725e342e6c77343e121c1dc784a1bb38c38516814ab79ca8853a1385188c0
    Port:          9216/TCP
    Host Port:     0/TCP
    Command:
      /bin/bash
      -ec
    Args:
      /bin/mongodb_exporter --collect-all --compatible-mode --web.listen-address ":9216" --mongodb.uri "mongodb://localhost:27017/admin?"
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Fri, 20 Oct 2023 09:34:32 -0400
      Finished:     Fri, 20 Oct 2023 09:35:02 -0400
    Ready:          False
    Restart Count:  461
    Liveness:       http-get http://:metrics/metrics delay=15s timeout=5s period=5s #success=1 #failure=3
    Readiness:      http-get http://:metrics/metrics delay=5s timeout=1s period=5s #success=1 #failure=3
    Environment:
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8wx9z (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  common-scripts:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      snmp-mongodb-common-scripts
    Optional:  false
  custom-init-scripts:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      snmp-mongodb-init-scripts
    Optional:  false
  datadir:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  snmp-mongodb
    ReadOnly:   false
  kube-api-access-8wx9z:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:
    DownwardAPI:             true
QoS Class:        BestEffort
Node-Selectors:
Tolerations:      node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                  node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason   Age                   From     Message
  Warning  BackOff  61s (x8196 over 23h)  kubelet  Back-off restarting failed container
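
If it helps, the same crash details can be pulled out of those long describes directly with a jsonpath query against the standard pod status fields, e.g.:

# Print each container's name and last exit code for the failing pods
microk8s kubectl get pod snmp-redis-master-0 -n sc4snmp \
  -o jsonpath='{range .status.containerStatuses[*]}{.name}{": exit "}{.lastState.terminated.exitCode}{"\n"}{end}'
microk8s kubectl get pod snmp-mongodb-75b89b595f-qftj9 -n sc4snmp \
  -o jsonpath='{range .status.containerStatuses[*]}{.name}{": exit "}{.lastState.terminated.exitCode}{"\n"}{end}'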

ajasnosz commented 10 months ago

Can you reinstall sc4snmp and then collect the logs and PVC information for Redis and MongoDB with these commands:

microk8s kubectl logs -f <pod-name>  -n sc4snmp
microk8s kubectl get pvc  -n sc4snmp
microk8s kubectl describe pvc/<pvc-name>  -n sc4snmp
thasteve commented 10 months ago

Got it.

I uninstalled microk8s to remove all of the pods. I'm following this guide for installation - https://splunk.github.io/splunk-connect-for-snmp/main/gettingstarted/sc4snmp-installation/

Here are the requested outputs after reinstalling -

microk8s kubectl logs -f snmp-redis-master-0 -n sc4snmp

1:C 23 Oct 2023 16:39:46.377 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 23 Oct 2023 16:39:46.377 # Redis version=7.0.7, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 23 Oct 2023 16:39:46.377 # Configuration loaded
1:M 23 Oct 2023 16:39:46.378 monotonic clock: POSIX clock_gettime
1:M 23 Oct 2023 16:39:46.378 Running mode=standalone, port=6379.
1:M 23 Oct 2023 16:39:46.378 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 23 Oct 2023 16:39:46.378 # Server initialized
1:M 23 Oct 2023 16:39:46.379 # Can't open or create append-only dir appendonlydir: Permission denied

microk8s kubectl logs -f snmp-mongodb-75b89b595f-fbhbq -n sc4snmp

Defaulted container "mongodb" out of: mongodb, metrics, volume-permissions (init)
mongodb 16:58:06.38
mongodb 16:58:06.38 Welcome to the Bitnami mongodb container
mongodb 16:58:06.38 Subscribe to project updates by watching https://github.com/bitnami/containers
mongodb 16:58:06.38 Submit issues and feature requests at https://github.com/bitnami/containers/issues
mongodb 16:58:06.38
mongodb 16:58:06.39 INFO  ==> Starting MongoDB setup
mongodb 16:58:06.40 INFO  ==> Validating settings in MONGODB_* env vars...
mongodb 16:58:06.66 WARN  ==> You set the environment variable ALLOW_EMPTY_PASSWORD=yes. For safety reasons, do not use this flag in a production environment.
mongodb 16:58:06.68 INFO  ==> Initializing MongoDB...
mongodb 16:58:06.70 INFO  ==> Deploying MongoDB from scratch...
/opt/bitnami/scripts/libos.sh: line 336:    46 Illegal instruction     (core dumped) "$@" > /dev/null 2>&1

microk8s kubectl get pvc -n sc4snmp

NAME                             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS        AGE
snmp-mongodb                     Bound    pvc-d6013ea1-49bc-4cb8-a508-2f32cf90f5ba   8Gi        RWO            microk8s-hostpath   67m
redis-data-snmp-redis-master-0   Bound    pvc-5bbde51a-75a1-4ff1-a8ee-0322df00447d   8Gi        RWO            microk8s-hostpath   67m

microk8s kubectl describe pvc/redis-data-snmp-redis-master-0 -n sc4snmp

Name:            redis-data-snmp-redis-master-0
Namespace:       sc4snmp
StorageClass:    microk8s-hostpath
Status:          Bound
Volume:          pvc-5bbde51a-75a1-4ff1-a8ee-0322df00447d
Labels:          app.kubernetes.io/component=master
                 app.kubernetes.io/instance=snmp
                 app.kubernetes.io/name=redis
Annotations:     pv.kubernetes.io/bind-completed: yes
                 pv.kubernetes.io/bound-by-controller: yes
                 volume.beta.kubernetes.io/storage-provisioner: microk8s.io/hostpath
                 volume.kubernetes.io/selected-node: snmp01.domain.com
                 volume.kubernetes.io/storage-provisioner: microk8s.io/hostpath
Finalizers:      [kubernetes.io/pvc-protection]
Capacity:        8Gi
Access Modes:    RWO
VolumeMode:      Filesystem
Used By:         snmp-redis-master-0
Events:

microk8s kubectl describe pvc/snmp-mongodb -n sc4snmp

Name:            snmp-mongodb
Namespace:       sc4snmp
StorageClass:    microk8s-hostpath
Status:          Bound
Volume:          pvc-d6013ea1-49bc-4cb8-a508-2f32cf90f5ba
Labels:          app.kubernetes.io/component=mongodb
                 app.kubernetes.io/instance=snmp
                 app.kubernetes.io/managed-by=Helm
                 app.kubernetes.io/name=mongodb
                 helm.sh/chart=mongodb-12.1.31
Annotations:     meta.helm.sh/release-name: snmp
                 meta.helm.sh/release-namespace: sc4snmp
                 pv.kubernetes.io/bind-completed: yes
                 pv.kubernetes.io/bound-by-controller: yes
                 volume.beta.kubernetes.io/storage-provisioner: microk8s.io/hostpath
                 volume.kubernetes.io/selected-node: snmp01.domain.com
                 volume.kubernetes.io/storage-provisioner: microk8s.io/hostpath
Finalizers:      [kubernetes.io/pvc-protection]
Capacity:        8Gi
Access Modes:    RWO
VolumeMode:      Filesystem
Used By:         snmp-mongodb-75b89b595f-fbhbq
Events:

ajasnosz commented 10 months ago

From what I saw about Redis, it's a permission problem, and in this issue (https://github.com/bitnami/charts/issues/14327) it's mentioned that it can be solved by enabling volumePermissions. To do that you can add it to your values.yaml like this:

redis:
  volumePermissions: 
    enabled: true

Then reinstall sc4snmp; you can uninstall it using the steps in the mentioned link.
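
In other words, a minimal sketch of that redeploy, using the same release name and namespace as above:

# Remove the current release, then install again with the updated values.yaml
microk8s helm3 uninstall snmp -n sc4snmp
microk8s helm3 upgrade --install snmp -f values.yaml \
  splunk-connect-for-snmp/splunk-connect-for-snmp \
  --namespace=sc4snmp --create-namespace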

If that doesn't work, you can look at these links for more information: https://stackoverflow.com/questions/55201167/redis-service-fails-with-permission-denied-on-append-file https://github.com/helm/charts/issues/5041
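
As a side check on the node itself: the MicroK8s hostpath provisioner keeps volumes under /var/snap/microk8s/common/default-storage by default (this path and the UID 1001 expectation are assumptions based on the Bitnami images, not something confirmed in this thread), so you can eyeball whether the Redis data directory is owned by the container user:

# List the provisioned hostpath volumes and their numeric ownership on the node
ls -ln /var/snap/microk8s/common/default-storage/
# The Bitnami Redis image runs as UID 1001; its data directory should be writable by that UID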

About MongoDB: MongoDB 5 and above requires a CPU with AVX support. Please check whether your environment supports that, as we are currently running MongoDB v6 in sc4snmp. To check whether your CPU supports AVX you can use lscpu | grep avx, or cat /proc/cpuinfo and look for avx in the flags section. You can see these issues for more information: https://github.com/bitnami/charts/issues/10255 https://github.com/bitnami/charts/issues/12834
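
For example, either of these prints a match only on a host that exposes AVX to the VM (no output means the instruction set is not available):

# Check for the AVX instruction set
lscpu | grep -i avx
grep -m1 -o 'avx' /proc/cpuinfo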

thasteve commented 10 months ago

From what I saw about Redis, it's a permission problem, and in this issue (https://github.com/bitnami/charts/issues/14327) it's mentioned that it can be solved by enabling volumePermissions. To do that you can add it to your values.yaml like this:

redis:
  volumePermissions: 
    enabled: true

This worked. The redis pod is now running. I'm working with the VI team to understand why my virtual device doesn't have an AVX flag from the CPU. I'll update the issue when I hear back from them on a solution.

Thanks.

thasteve commented 10 months ago

About MongoDB: MongoDB 5 and above requires a CPU with AVX support. Please check whether your environment supports that, as we are currently running MongoDB v6 in sc4snmp. To check whether your CPU supports AVX you can use lscpu | grep avx, or cat /proc/cpuinfo and look for avx in the flags section. You can see these issues for more information: https://github.com/bitnami/charts/issues/10255 https://github.com/bitnami/charts/issues/12834

Is there a way, or can you provide any information, on downgrading the MongoDB package to version 4? It appears that was the solution for a lot of people who were not able to provide AVX support to a guest device. In our case the virtualization platform will not pass the AVX instruction set from the VI host to the guest.

If it turns out we can't use an older MongoDB version we may have to scrap this sc4snmp project.

ajasnosz commented 10 months ago

The first option is to run the last version of sc4snmp that used MongoDB v4, which is 1.8.4.

I'm not sure if the latest version of the code will be compatible with the older MongoDB, but you can try updating it. To do it, download the repository and go to the chart directory: cd splunk-connect-for-snmp/charts/splunk-connect-for-snmp/. Then update Chart.yaml; the last Bitnami chart version running MongoDB 4 was 11.1.10:

dependencies:
  - name: mongodb
    version: ~11.1.10

Run microk8s helm3 dep update, then go back to the directory containing the sc4snmp repository. To load the new values into sc4snmp, run: microk8s helm3 upgrade --install snmp -f values.yaml ~/splunk-connect-for-snmp/charts/splunk-connect-for-snmp/ --namespace=sc4snmp --create-namespace
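
Put together, a sketch of the whole downgrade procedure (assuming the repository is cloned into the home directory; the clone URL is the repository at the top of this page):

git clone https://github.com/splunk/splunk-connect-for-snmp.git ~/splunk-connect-for-snmp
cd ~/splunk-connect-for-snmp/charts/splunk-connect-for-snmp/
# edit Chart.yaml so the mongodb dependency is pinned to ~11.1.10 (see the snippet above)
microk8s helm3 dep update
cd ~
microk8s helm3 upgrade --install snmp -f values.yaml \
  ~/splunk-connect-for-snmp/charts/splunk-connect-for-snmp/ \
  --namespace=sc4snmp --create-namespace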

thasteve commented 10 months ago

The first option is to run the last version of sc4snmp that used MongoDB v4, which is 1.8.4.

This got all of the pods running. However, there is now some fuss from the trap pods that appears similar to an open issue from a year ago:

Still waiting for redis://snmp-redis-headless:6379/0 #629

When I run microk8s kubectl logs snmp-splunk-connect-for-snmp-trap-86d79cf9c5-jwb6j -n sc4snmp I see

Still waiting for redis://snmp-redis-headless:6379/0 (3240s elapsed)
Still waiting for redis://snmp-redis-headless:6379/0 (3260s elapsed)
Still waiting for redis://snmp-redis-headless:6379/0 (3280s elapsed)
Still waiting for redis://snmp-redis-headless:6379/0 (3300s elapsed)
Still waiting for redis://snmp-redis-headless:6379/0 (3320s elapsed)

That's the case for both trap pods.

Curling my Splunk instance with the assigned HEC token generates an event. Sending a test SNMP trap generates an event in tcpdump on the sc4snmp host, but the trap event does not get sent to Splunk Cloud, nor do I see the port 443 traffic that would suggest it's being sent.
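
For context, the checks described above looked roughly like this (the hostnames, HEC token, and the coldStart trap OID are placeholders/examples, not the exact values used):

# Confirm the HEC endpoint accepts events
curl -k https://<splunk-host>:8088/services/collector/event \
  -H "Authorization: Splunk <hec-token>" \
  -d '{"event": "sc4snmp connectivity test"}'

# Send a test SNMPv2c coldStart trap to the sc4snmp host (net-snmp tools)
snmptrap -v2c -c public <sc4snmp-host>:162 '' 1.3.6.1.6.3.1.1.5.1

# Watch for the trap arriving on the sc4snmp host
tcpdump -n -i any udp port 162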

I'm thinking that the trap pods aren't really running and won't receive trap events.

Here is the output of microk8s kubectl describe service snmp-redis-headless -n sc4snmp:

Name:              snmp-redis-headless
Namespace:         sc4snmp
Labels:            app.kubernetes.io/instance=snmp
                   app.kubernetes.io/managed-by=Helm
                   app.kubernetes.io/name=redis
                   helm.sh/chart=redis-16.8.10
Annotations:       meta.helm.sh/release-name: snmp
                   meta.helm.sh/release-namespace: sc4snmp
Selector:          app.kubernetes.io/instance=snmp,app.kubernetes.io/name=redis
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                None
IPs:               None
Port:              tcp-redis  6379/TCP
TargetPort:        redis/TCP
Endpoints:         10.1.89.26:6379
Session Affinity:  None
Events:
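
Since the headless service does show an endpoint (10.1.89.26:6379), one way to test whether that name is reachable from inside the cluster (the image and the throwaway pod name here are just examples) is:

# Run a temporary pod and ping Redis through the headless service name
microk8s kubectl run redis-ping --rm -it --restart=Never \
  --image=redis:7 -n sc4snmp -- \
  redis-cli -h snmp-redis-headless -p 6379 ping
# A healthy, resolvable service replies with PONG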

thasteve commented 10 months ago

I wanted to add that I have been able to get the pods running with the latest and greatest code. The VI team was able to pass the AVX instruction set to my SNMP server. However, after reinstalling, the above issue still exists with the "Still waiting for redis-headless" log events from the trap pods, and SNMP traps are not getting sent to Splunk Cloud, I believe as a result of the "Still waiting..." issue.

ajasnosz commented 10 months ago

The "Still waiting ..." most of the times is caused by the kubernetes dns issues. It was similar with the issue you referenced above.

  1. Check whether the dns addon is enabled: run microk8s status and verify that dns is listed under the enabled addons (see the sketch after this list).
  2. Check whether the coredns pod is up: microk8s kubectl get pods -A
  3. Check the logs and description of the coredns pod: microk8s kubectl logs pod/coredns-<id> -n kube-system > coredns.log and microk8s kubectl describe pod/coredns-<id> -n kube-system > coredns_describe.log
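
If DNS does turn out to be the problem, a minimal sketch of enabling the addon and testing resolution of the Redis headless service from inside the cluster (the busybox image, test pod name, and cluster.local domain are assumptions) would be:

# Enable the MicroK8s DNS addon if it is missing
microk8s enable dns

# Resolve the Redis headless service from a throwaway pod
microk8s kubectl run dns-test --rm -it --restart=Never \
  --image=busybox:1.36 -n sc4snmp -- \
  nslookup snmp-redis-headless.sc4snmp.svc.cluster.local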