metal-stack / csi-driver-lvm


PVCs pending with WaitForFirstConsumer on fresh install #66

Closed · jtackaberry closed this issue 2 years ago

jtackaberry commented 2 years ago

Not sure if this is a bug report or a support request, but in any case I can't spot what's going awry.

Fresh install of microk8s 1.23 and csi-driver-lvm v0.4.1 via the Helm chart at https://github.com/metal-stack/helm-charts/tree/master/charts/csi-driver-lvm (which supports StorageClass under storage.k8s.io/v1).

# Deploy CSI driver
$ cat values.yaml
lvm:
  devicePattern: /dev/sdb
rbac:
  pspEnabled: false
$ helm upgrade --install --create-namespace -n storage -f values.yaml csi-driver-lvm ./helm-charts/charts/csi-driver-lvm/

# Storage classes created
$ kubectl get storageclass
NAME                              PROVISIONER              RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
csi-driver-lvm-striped            lvm.csi.metal-stack.io   Delete          WaitForFirstConsumer   true                   27m
csi-driver-lvm-mirror             lvm.csi.metal-stack.io   Delete          WaitForFirstConsumer   true                   27m
csi-driver-lvm-linear (default)   lvm.csi.metal-stack.io   Delete          WaitForFirstConsumer   true                   27m

# Create a test PVC
$ cat pvc-test.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test
  namespace: default
spec:
  storageClassName: csi-driver-lvm-linear
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: "2Gi"

$ kubectl apply -f pvc-test.yaml
$ kubectl describe -n default pvc/test
Name:          test
Namespace:     default
StorageClass:  csi-driver-lvm-linear
Status:        Pending
Volume:
Labels:        <none>
Annotations:   <none>
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Used By:       <none>
Events:
  Type    Reason                Age               From                         Message
  ----    ------                ----              ----                         -------
  Normal  WaitForFirstConsumer  4s (x4 over 42s)  persistentvolume-controller  waiting for first consumer to be created before binding

The first sign of trouble comes from the plugin pod, which raises a couple of errors:

$ kubectl -n storage logs csi-driver-lvm-plugin-9bqb4 -c csi-driver-lvm-plugin
2022/02/05 20:02:01 unable to configure logging to stdout:no such flag -logtostderr
I0205 20:02:01.834133       1 lvm.go:108] pullpolicy: IfNotPresent
I0205 20:02:01.834139       1 lvm.go:112] Driver: lvm.csi.metal-stack.io
I0205 20:02:01.834142       1 lvm.go:113] Version: dev
I0205 20:02:01.873219       1 lvm.go:411] unable to list existing volumegroups:exit status 5
I0205 20:02:01.873250       1 nodeserver.go:51] volumegroup: csi-lvm not found
I0205 20:02:02.119070       1 nodeserver.go:58] unable to activate logical volumes:  Volume group "csi-lvm" not found
  Cannot process volume group csi-lvm
 exit status 5
I0205 20:02:02.120111       1 controllerserver.go:259] Enabling controller service capability: CREATE_DELETE_VOLUME
I0205 20:02:02.120295       1 server.go:95] Listening for connections on address: &net.UnixAddr{Name:"//csi/csi.sock", Net:"unix"}

Over on the k8s node, /dev/sdb does exist per lvm.devicePattern:

$ blockdev --getsize64 /dev/sdb
32212254720

The documentation doesn't say this is necessary, but I didn't see any indication in the code that pvcreate is called. So I figured perhaps that was the problem and created the physical volume explicitly (which also demonstrates that the LVM command-line tools are functional on the host):

# On k8s host
$ pvcreate /dev/sdb
  Physical volume "/dev/sdb" successfully created.

# On client
$ kubectl -n storage rollout restart ds/csi-driver-lvm-plugin

No change; still the Volume group "csi-lvm" not found errors in the plugin pod logs. Ok, this ostensibly shouldn't be necessary either, but let's create the VG manually:

# On k8s host
$ vgcreate csi-lvm /dev/sdb
  Volume group "csi-lvm" successfully created
$ vgs
  VG      #PV #LV #SN Attr   VSize   VFree
  csi-lvm   1   0   0 wz--n- <30.00g <30.00g

# On client
$ kubectl -n storage rollout restart ds/csi-driver-lvm-plugin

This has addressed the errors from the plugin logs:

INFO: defaulting to container "csi-driver-lvm-plugin" (has: node-driver-registrar, csi-driver-lvm-plugin, liveness-probe)
2022/02/05 20:23:53 unable to configure logging to stdout:no such flag -logtostderr
I0205 20:23:53.656589       1 lvm.go:108] pullpolicy: IfNotPresent
I0205 20:23:53.656596       1 lvm.go:112] Driver: lvm.csi.metal-stack.io
I0205 20:23:53.656598       1 lvm.go:113] Version: dev
I0205 20:23:53.738596       1 controllerserver.go:259] Enabling controller service capability: CREATE_DELETE_VOLUME
I0205 20:23:53.738891       1 server.go:95] Listening for connections on address: &net.UnixAddr{Name:"//csi/csi.sock", Net:"unix"}

But that didn't fix the pending PVC, even after recreating it:

$ kubectl describe -n default pvc/test
Name:          test
Namespace:     default
StorageClass:  csi-driver-lvm-linear
Status:        Pending
Volume:
Labels:        <none>
Annotations:   <none>
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Used By:       <none>
Events:
  Type    Reason                Age               From                         Message
  ----    ------                ----              ----                         -------
  Normal  WaitForFirstConsumer  4s (x2 over 16s)  persistentvolume-controller  waiting for first consumer to be created before binding

Hopefully it's clear where things have gone wrong. :)

Thanks!

majst01 commented 2 years ago

Hi, at first glance everything was done right. Physical volumes don't need to be created beforehand (pvcreate isn't required); a VG can be created directly from a given block device or a list of block devices.

I guess your pod will mount the pvc when you delete it.

What OS is your worker node running?

jtackaberry commented 2 years ago

I guess your pod will mount the pvc when you delete it.

This is actually the revelation, and what's missing from my reproduction steps above: the PV isn't actually provisioned until a pod mounts the PVC. I tried creating a pod while the PVC was Pending, and things are working: the VG is created, the PV is provisioned and bound, and the pod starts.
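
For reference, a minimal pod along these lines (the pod name, image, and mount path below are just placeholders, not my actual workload):

apiVersion: v1
kind: Pod
metadata:
  name: test-pod
  namespace: default
spec:
  containers:
    - name: test
      image: busybox
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: test   # the PVC from pvc-test.yaml above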

I never got as far as creating a pod, because I figured: what was the point if the PVC was stuck in the Pending state? Every other CSI driver I have experience with so far immediately provisions and binds a PV when a PVC is created, so I'm embarrassed to say I expected csi-driver-lvm to work the same way.

Can I humbly suggest this as an improvement? IMO it's surprising behavior to defer PV creation until after some pod mounts the PVC.

What OS is your worker node running?

Apologies for not mentioning it: Ubuntu 20.04.3.

majst01 commented 2 years ago

No, it cannot create the PV until the pod is created, because this CSI driver is a local-storage provider and therefore it needs to know which node the pod gets scheduled on.
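
This is the WaitForFirstConsumer volume binding mode shown in your kubectl get storageclass output; a rough sketch of such a StorageClass, reconstructed from that output (driver-specific parameters omitted):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-driver-lvm-linear
provisioner: lvm.csi.metal-stack.io
reclaimPolicy: Delete
allowVolumeExpansion: true
# Provisioning is deferred until a pod using the PVC is scheduled, so the
# driver knows on which node the logical volume has to be created.
volumeBindingMode: WaitForFirstConsumer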

jtackaberry commented 2 years ago

No, it cannot create the PV until the pod is created, because this CSI driver is a local-storage provider and therefore it needs to know which node the pod gets scheduled on.

Hah. You're completely right, of course; I have no explanation for my momentary demonstration of stupidity. :)

Perhaps a quick note in the README would be helpful for the absentminded like me, as a reminder that local-storage providers work differently from network-storage providers in this regard?

Thanks for your patience, @majst01. I'll close this since it isn't a bug and I'm up and running.

majst01 commented 2 years ago

No Problem.