kubereboot / kured

Kubernetes Reboot Daemon
https://kured.dev
Apache License 2.0

'Permission denied' when using signal reboot mechanism to reboot AKS nodes #898

Closed andreas-wirth closed 7 months ago

andreas-wirth commented 7 months ago

Hi all, I tried the new signal reboot method introduced with kured 1.15.0. However, something is off with my configuration: the process does not have permission to reboot, which causes the fatal error Signal of SIGRTMIN+5 failed: permission denied. (I also tried to reboot from the pod with kill -s 39 1, with the same result.) This is my DaemonSet configuration:

---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kured
spec:
  selector:
    matchLabels:
      name: kured
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        name: kured
    spec:
      serviceAccountName: kured
      securityContext:
        seccompProfile:
          type: RuntimeDefault
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          effect: NoSchedule
        - key: node-role.kubernetes.io/master
          effect: NoSchedule
      hostPID: true
      restartPolicy: Always
      volumes:
        - name: sentinel
          hostPath:
            path: /var/run
            type: Directory
        - name: secrets-store
          csi:
            driver: secrets-store.csi.k8s.io
            readOnly: true
            volumeAttributes:
              secretProviderClass: kured-kv-integration
      containers:
        - name: kured
          image: ghcr.io/kubereboot/kured:1.15.0
          imagePullPolicy: Always
          securityContext:
            privileged: false
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["*"]
              add: ["CAP_KILL"]
          volumeMounts:
            - name: secrets-store
              mountPath: "/mnt/config/secrets"
              readOnly: true
            - mountPath: /sentinel
              name: sentinel
              readOnly: true
          env:
            - name: KURED_NODE_ID
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: TEAMS_WEBHOOK_SHOUTRRR
              valueFrom:
                secretKeyRef:
                  name: kured-env-secrets
                  key: TEAMS_WEBHOOK_SHOUTRRR
          command:
            - /usr/bin/kured
            - --reboot-sentinel=/sentinel/reboot-required
            - --reboot-method=signal
            - --reboot-days=mon,tue,wed,thu,fri
            - --start-time=9:00
            - --end-time=16:00:00
            - --time-zone=Europe/Berlin
            - --notify-url="$(TEAMS_WEBHOOK_SHOUTRRR)"

I had a look at #814 where the feature was introduced and also checked the capabilities of my kured process, which seem to be fine:

/ # grep ^Cap /proc/1554253/status
CapInh: 0000000000000000
CapPrm: 00000000a80425fb
CapEff: 00000000a80425fb
CapBnd: 00000000a80425fb
CapAmb: 0000000000000000

I am running on AKS 1.27.3 with Ubuntu 2204 node images.

I hope someone has an idea what's wrong, otherwise I will be stuck with the traditional reboot approach :(

Thanks in advance!

ckotzbauer commented 7 months ago

Thanks for opening this issue. Do you have an active AppArmor installation on your host?

andreas-wirth commented 7 months ago

Thanks for the hint. AppArmor is actually enabled by default on AKS, which I was not aware of. It also seems that MS does not really have a good approach for easily initialising AppArmor profiles on node scale-up. Here they suggest using a DaemonSet, I guess something like this implementation.

How are you dealing with this issue? Or are you just not using AppArmor? I am not sure how other cloud providers handle AppArmor configuration, but a short hint about it in the kured docs might be useful.

ckotzbauer commented 7 months ago

I made it work in a test environment with a pod annotation for kured:

container.apparmor.security.beta.kubernetes.io/kured: unconfined

andreas-wirth commented 7 months ago

That did the trick, thanks. Also, do not forget to put this annotation in the right place: inside the template section (spec.template.metadata.annotations), not in the DaemonSet's own annotations, as I did 😄
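
For anyone landing here later, the placement described above can be sketched as follows. This is a trimmed fragment based on the manifest posted earlier in this thread, not a complete or official kured manifest; everything except the annotations block is copied from that manifest, and fields that do not matter for the placement are left out. The part of the annotation key after the slash is the container name, so it has to match the name under containers.

```yaml
# Sketch only: trimmed from the DaemonSet posted earlier in this thread.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kured
  # Not here: annotations at this level describe the DaemonSet object itself
  # and are not copied to the pods it creates.
spec:
  selector:
    matchLabels:
      name: kured
  template:
    metadata:
      labels:
        name: kured
      annotations:
        # Pod-level annotation; the suffix after the slash must match the
        # container name ("kured" below).
        container.apparmor.security.beta.kubernetes.io/kured: unconfined
    spec:
      hostPID: true
      containers:
        - name: kured
          image: ghcr.io/kubereboot/kured:1.15.0
```

Note that unconfined turns off AppArmor confinement for that container only; the capability restrictions in the container's securityContext still apply.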