scaleway / scaleway-csi

Container Storage Interface (CSI) Driver for https://www.scaleway.com/block-storage/
Apache License 2.0

Mounting issues with encrypted volumes #41

Open teknologista opened 3 years ago

teknologista commented 3 years ago

Describe the bug
Volumes get into an unmountable state after restarting a pod that uses an encrypted PV.

To Reproduce
1. Set up the Scaleway CSI driver and create an encrypted StorageClass as outlined in the docs.
2. Deploy a StatefulSet, e.g. a 3-replica MongoDB.
3. Wait for the workload to come up; PVs are provisioned and everything is fine.
4. Kill one pod and wait for it to be recreated by Kubernetes.
5. Just after the scheduler schedules the pod on a node, it errors because it cannot mount the previously created, existing PV. See the errors from the kube logs below.

Expected behavior
The PV should be attached to the node where the new pod is scheduled, and the pod should start.



Errors shown

Warning FailedMount MountVolume.MountDevice failed for volume "pvc-3030ae10-3579-494a-a215-0017aea58332" : rpc error: code = Internal desc = error encrypting/opening volume with ID aeffa5d1-d5c3-406c-a728-d5d2c856aed9: luksStatus returned ok, but device scw-luks-aeffa5d1-d5c3-406c-a728-d5d2c856aed9 is not active

and

MountVolume.WaitForAttach failed for volume "pvc-83cf34a9-d36d-46e5-bbf2-199c426f518c" : volume fr-par-2/cbe3eca8-f623-4bbe-bc76-450eceb391b2 has GET error for volume attachment csi-879b1d2e5fa7ca784f356b823505c5506b57891aa56966b59c8ebfdae3497320: volumeattachments.storage.k8s.io "csi-879b1d2e5fa7ca784f356b823505c5506b57891aa56966b59c8ebfdae3497320" is forbidden: User "system:node:node-5" cannot get resource "volumeattachments" in API group "storage.k8s.io" at the cluster scope: no relationship found between node 'node-5' and this object

Again, that only seems to happen for encrypted PVs.

teknologista commented 3 years ago

By the way, I have hardened RKE2 clusters available to test a potential fix or help debug the issue.

Sh4d1 commented 3 years ago

Couldn't reproduce with this:

allowVolumeExpansion: false # not yet supported
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: "scw-bssd-enc"
provisioner: csi.scaleway.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  encrypted: "true"
  csi.storage.k8s.io/node-stage-secret-name: "enc-secret"
  csi.storage.k8s.io/node-stage-secret-namespace: "default"
---
apiVersion: v1
kind: Secret
metadata:
  name: enc-secret
  namespace: default
type: Opaque
data:
  encryptionPassphrase: bXlhd2Vzb21lcGFzc3BocmFzZQ==
---
apiVersion: v1
kind: Service
metadata:
  name: mongo
  labels:
    name: mongo
spec:
  ports:
    - port: 27017
      targetPort: 27017
  clusterIP: None
  selector:
    role: mongo
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongo
spec:
  selector:
    matchLabels:
      role: mongo
      environment: test
  serviceName: "mongo"
  replicas: 3
  template:
    metadata:
      labels:
        role: mongo
        environment: test
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: mongo
        image: mongo
        command:
          - mongod
          - "--replSet"
          - rs0
        ports:
          - containerPort: 27017
        volumeMounts:
          - name: mongo-persistent-storage
            mountPath: /data/db
      - name: mongo-sidecar
        image: cvallance/mongo-k8s-sidecar
        env:
          - name: MONGO_SIDECAR_POD_LABELS
            value: "role=mongo,environment=test"
  volumeClaimTemplates:
    - metadata:
        name: mongo-persistent-storage
      spec:
        storageClassName: "scw-bssd-enc"
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 5Gi

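As an aside, the encryptionPassphrase in the Secret is just the base64 encoding of the plain passphrase; for example, the value above can be produced like this (using the example passphrase myawesomepassphrase):

echo -n "myawesomepassphrase" | base64
# bXlhd2Vzb21lcGFzc3BocmFzZQ==
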
Used this script:

#!/bin/bash

# Chaos loop: once every pod is Running, delete a random mongo pod and repeat.
while true; do
    # Wait until no pod is in a non-Running state.
    while kubectl get pods --no-headers | grep -v Running ; do
        sleep 2
    done

    kubectl delete pods mongo-$(($RANDOM % 3))
done

I let it run for some time, with no issue. Tested on Kapsule with k8s 1.20.11.

Could you get the output of cryptsetup status /dev/mapper/scw-luks-<id> when it's stuck?

teknologist commented 3 years ago

Hi Patrik,

Thanks for looking at this.

It may then be related to the fact that the cluster is RKE2 Government with a hardened Pod Security Policy enforced.

I will try again tomorrow and let you know the outcome.

teknologista commented 2 years ago

Hi @Sh4d1 ,

We are stuck with this issue again today while doing a rolling upgrade of a Kubernetes cluster between two 1.20 patch versions.

This is what happened:

MountVolume.MountDevice failed for volume "pvc-7da69745-cd8b-4e4e-b236-ebcb6c76c328" : rpc error: code = Internal desc = failed to format and mount device from ("/dev/mapper/scw-luks-cd5543ac-4300-4bb7-882a-6f19ca0149c3") to ("/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-7da69745-cd8b-4e4e-b236-ebcb6c76c328/globalmount") with fstype ("ext4") and options ([]): exit status 1

As per your request, this is the output:

 ~ sudo cryptsetup status /dev/mapper/scw-luks-cd5543ac-4300-4bb7-882a-6f19ca0149c3
/dev/mapper/scw-luks-cd5543ac-4300-4bb7-882a-6f19ca0149c3 is active.
  type:    n/a
  cipher:  aes-xts-plain64
  keysize: 256 bits
  key location: keyring
  device:  (null)
  sector size:  512
  offset:  32768 sectors
  size:    188710912 sectors
  mode:    read/write

In the Scaleway web console, I can see the volume attached to the right node, though.

On the other hand, I logged onto the node and tried a full cycle of:

 - cryptsetup luksClose on the device mapper created by the CSI driver
 - cryptsetup luksOpen on the device /dev/sda
 - fsck -fy /dev/mapper/the_mapper-device

This worked: fsck did fix a few minor errors, nothing crazy, but it did modify the filesystem.

Then I did a cryptsetup luksClose again, and the volume was successfully auto-mounted by the CSI driver without me doing anything else (a rough sketch of the cycle is below).
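
For reference, this is roughly what that manual recovery looked like as shell commands, using the mapper name and backing device from the logs above (illustrative only; the passphrase prompted for is the one stored in the node-stage secret):

# Names/IDs are taken from the logs above; adjust for your volume.
MAPPER=scw-luks-cd5543ac-4300-4bb7-882a-6f19ca0149c3
BACKING_DEV=/dev/sda

# Close the stale mapping left behind by the CSI driver.
sudo cryptsetup luksClose "$MAPPER"

# Re-open the LUKS device (prompts for the encryption passphrase).
sudo cryptsetup luksOpen "$BACKING_DEV" "$MAPPER"

# Repair the ext4 filesystem, answering yes to all fixes.
sudo fsck -fy /dev/mapper/"$MAPPER"

# Close it again and let the CSI driver re-open and mount it by itself.
sudo cryptsetup luksClose "$MAPPER"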

It looks like an ext4 volume can sometimes end up in a dirty state after being detached from its workload by a sudden kill of the pod using it. It then seems to need an fsck run before it can be mounted again by the CSI driver.

I don't know if this helps, and I may be wrong, but this is the result of my investigation.

Anyway, is there anything we can do about this, as it breaks the auto-healing behaviour of a Kubernetes cluster (maybe run an automated fsck -fy prior to mounting the volume in the pod)? :-(

Many thanks.