ibm-cloud-docs / monitoring

Documentation repository for monitoring

CrashLoopBackoff issue due to image mismatch #52

Open testrashmi opened 9 months ago

testrashmi commented 9 months ago

Reference SNOW tickets: CS3782797 CS3740897 CS3703596 CS3703753

Error: Issue with the Red Hat OpenShift agent or Kubernetes agent. Sysdig agent pods go into CrashLoopBackOff and restart a number of times.

What’s happening: After the Monitoring agent is deployed on your cluster, the agent fails with CrashLoopBackOff.

View the logs of the failed pod. They contain the message "Failed to init system inspector. Exception message: Failed to load scap driver".

kubectl logs -n sysdig-agent-namespace sysdig-agent-name

For example, if the agent runs in the ibm-observe namespace:

kubectl logs -n ibm-observe sysdig-agent-name
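The log check above can be scripted. The following is a minimal sketch, not part of the product documentation: the helper name has_scap_driver_error is hypothetical, and NAMESPACE and POD are placeholders for your own values. It only searches log text for the message quoted above.

```shell
# Succeeds (exit 0) if the log text on stdin contains the scap driver failure.
# The function name is hypothetical; the searched string is the message quoted in this issue.
has_scap_driver_error() {
  grep -q 'Failed to load scap driver'
}

# Usage (requires kubectl and cluster access; NAMESPACE and POD are placeholders):
#   kubectl get pods -n NAMESPACE
#   kubectl logs -n NAMESPACE POD | has_scap_driver_error && echo "scap driver failed to load"
# If the container has already restarted, "kubectl logs --previous" shows the
# logs of the previous container instance.
```

Passing the logs through a small function keeps the check independent of how the logs were collected, so it also works on a saved log file.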

Why it’s happening: The pods crash intermittently because the versions of the agent-slim and agent-kmod containers might not match. To avoid this issue and control agent versions precisely, we strongly recommend deploying the agent with Helm and explicitly specifying the version of each component. This safeguards against future version mismatches and keeps the agent stable.

Reference:
https://cloud.ibm.com/docs/monitoring?topic=monitoring-agent-deploy-kube-helm
https://cloud.ibm.com/docs/monitoring?topic=monitoring-agent-deploy-openshift-helm
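To confirm a version mismatch before changing anything, you can compare the image tags used by the containers of one agent pod. This is a hedged sketch: the function names are hypothetical, and it assumes the kernel module image may run as either an init container or a regular container, so it lists both. It also assumes that every container in the pod belongs to the agent.

```shell
# Print the tag of an image reference, e.g. "registry/repo/name:1.2.3" -> "1.2.3".
image_tag() {
  ref="${1%%@*}"      # drop any "@sha256:..." digest
  ref="${ref##*/}"    # drop registry and repository path (may contain a port colon)
  case "$ref" in
    *:*) printf '%s\n' "${ref##*:}" ;;
    *)   printf 'latest\n' ;;   # no explicit tag means the default tag
  esac
}

# Read "name<TAB>image" lines on stdin and print each distinct tag once.
distinct_tags() {
  while IFS="$(printf '\t')" read -r _name image; do
    image_tag "$image"
  done | sort -u
}

# List init containers and containers of one pod as "name<TAB>image" lines.
# $1 = namespace, $2 = pod name (requires kubectl and cluster access).
list_pod_images() {
  kubectl get pod "$2" -n "$1" -o jsonpath='{range .spec.initContainers[*]}{.name}{"\t"}{.image}{"\n"}{end}{range .spec.containers[*]}{.name}{"\t"}{.image}{"\n"}{end}'
}

# Usage (NAMESPACE and POD are placeholders):
#   list_pod_images NAMESPACE POD | distinct_tags
# More than one line of output means the containers use different tags.
```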

How to fix it: To work around the version mismatch between agent-kmodule and agent-slim, delete the affected Sysdig agent pod so that it restarts.
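Deleting the pod can be sketched as follows. The wrapper function and the DRY_RUN variable are hypothetical conveniences, not part of any IBM or Sysdig tooling. The sketch assumes the agent pod is managed by a controller (typically a DaemonSet) that recreates it after deletion.

```shell
# Delete one agent pod so that its controller recreates it.
# $1 = namespace, $2 = pod name. With DRY_RUN=1 the command is printed, not run.
restart_agent_pod() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf 'kubectl delete pod %s -n %s\n' "$2" "$1"
  else
    kubectl delete pod "$2" -n "$1"
  fi
}

# Usage (NAMESPACE and POD are placeholders):
#   DRY_RUN=1 restart_agent_pod NAMESPACE POD   # show what would run
#   restart_agent_pod NAMESPACE POD             # delete the pod
#   kubectl get pods -n NAMESPACE               # confirm the new pod reaches Running
```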

To avoid this issue in the future, redeploy the Sysdig agent with the Helm method, which uses fixed agent-kmodule and agent-slim versions. If script deployment is preferred, add the option [-av | --agent-version] to the script.

For example:

curl -sL https://raw.githubusercontent.com/draios/sysdig-cloud-scripts/master/agent_deploy/IBMCloud-Kubernetes-Service/install-agent-k8s.sh | bash -s -- -a ACCESS_KEY -c COLLECTOR_ENDPOINT -t TAG_DATA -ac 'sysdig_capture_enabled: false' -av 12.18.1

More details: https://cloud.ibm.com/docs/monitoring?topic=monitoring-agent_Kube https://cloud.ibm.com/docs/monitoring?topic=monitoring-agent_openshift

Add these details under: https://cloud.ibm.com/docs/monitoring?topic=monitoring-troubleshoot

testrashmi commented 9 months ago

The command has to be updated in both places.

Example: curl -sL https://raw.githubusercontent.com/draios/sysdig-cloud-scripts/master/agent_deploy/IBMCloud-Kubernetes-Service/install-agent-k8s.sh | bash -s -- -a ACCESS_KEY -c COLLECTOR_ENDPOINT -t TAG_DATA -ac 'sysdig_capture_enabled: false' -av 12.18.1

https://cloud.ibm.com/docs/monitoring?topic=monitoring-agent_Kube

https://cloud.ibm.com/docs/monitoring?topic=monitoring-agent_openshift

vanadiscrawford commented 9 months ago

Issue https://github.ibm.com/Observability/docs/issues/503 has been opened to track as part of our workstream