openebs / lvm-localpv

Dynamically provision Stateful Persistent Node-Local Volumes & Filesystems for Kubernetes that is integrated with a backend LVM2 data storage stack.
Apache License 2.0

example lvm deployment fails with storage class definition "volumeBindingMode: WaitForFirstConsumer" set #265

Closed: marblerun closed this issue 2 weeks ago

marblerun commented 12 months ago

What steps did you take and what happened: As part of an exercise to test online volume expansion, the current version of lvm-localpv was installed using Helm.
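
(For context, a minimal sketch of how such a Helm install typically looks; the chart repo URL and release name below are assumptions inferred from the pod names later in this issue, not the exact commands that were run.)

# assumed commands; adjust the repo URL and release name to your setup
$ helm repo add openebs-lvmlocalpv https://openebs.github.io/lvm-localpv
$ helm repo update
$ helm install openebs-lvmlocalpv openebs-lvmlocalpv/lvm-localpv -n openebs --create-namespace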

The worked example in the README file was then followed, but it failed.

After working through the issue with online support, the StorageClass definition had the line

volumeBindingMode: WaitForFirstConsumer

removed, after which it worked as expected.

This parameter was carried over from a previous StorageClass definition, where we had used OpenEBS LocalPV to create storage requests for a MongoDB cluster.
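
(For reference, a minimal sketch of the kind of StorageClass in question, marking the line that was removed; the name and volume group below are assumptions, not the exact manifest used.)

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-lvmpv                       # assumed name
parameters:
  storage: lvm
  volgroup: lvmvg                           # assumed volume group
provisioner: local.csi.openebs.io
volumeBindingMode: WaitForFirstConsumer     # the line that was removed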

What did you expect to happen:

For the LVM volume to be created.

The output of the following commands will help us better understand what's going on: (Pasting long output into a GitHub gist or other Pastebin is fine.)

From a failing pod, before the storageclass change

root@kube-1:~# kubectl -n openebs describe pod fio
Name:             fio
Namespace:        openebs
Priority:         0
Service Account:  default
Node:             
Labels:           
Annotations:      
Status:           Pending
IP:               
IPs:              
Containers:
  perfrunner:
    Image:      openebs/tests-fio
    Port:       
    Host Port:  
    Command:
      /bin/bash
    Args:
      -c
      while true ;do sleep 50; done
    Environment:  
    Mounts:
      /mnt/datadir from fio-vol (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-95h98 (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  fio-vol:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  csi-lvmpv
    ReadOnly:   false
  kube-api-access-95h98:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       
    DownwardAPI:             true
QoS Class:       BestEffort
Node-Selectors:  
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age  From               Message
  ----     ------            ---  ----               -------
  Warning  FailedScheduling  93s  default-scheduler  0/4 nodes are available: 1 node(s) didn't find available persistent volumes to bind, 3 node(s) did not have enough free storage. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling.

At this stage, all 4 nodes had an unused 20 GB volume group defined and ready.
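
(A volume group like that would typically be prepared on each node roughly as follows; the device and VG names here are assumptions.)

# run on each node host; /dev/sdb and the VG name are assumptions
$ sudo pvcreate /dev/sdb
$ sudo vgcreate lvmvg /dev/sdb
$ sudo vgs lvmvg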

kubectl get pods -n openebs
NAME                                           READY   STATUS    RESTARTS   AGE
fio                                            1/1     Running   0          92s
openebs-localpv-provisioner-68cb6c95f5-xvtpj   1/1     Running   0          28h
openebs-lvmlocalpv-lvm-localpv-controller-0    5/5     Running   0          28h
openebs-lvmlocalpv-lvm-localpv-node-2tpph      2/2     Running   0          28h
openebs-lvmlocalpv-lvm-localpv-node-lxmmq      2/2     Running   0          28h
openebs-lvmlocalpv-lvm-localpv-node-m5mfr      2/2     Running   0          28h
openebs-lvmlocalpv-lvm-localpv-node-prbd7      2/2     Running   0          28h
openebs-ndm-7b994                              1/1     Running   0          28h
openebs-ndm-9xz7r                              1/1     Running   0          28h
openebs-ndm-bzl6w                              1/1     Running   0          28h
openebs-ndm-nbg6f                              1/1     Running   0          28h
openebs-ndm-operator-54478658f7-btwmj          1/1     Running   0          28h

Working version, after the StorageClass change:

root@kube-1:~# kubectl get lvmvol -nopenebs -o yaml
apiVersion: v1
items:

Anything else you would like to add: [Miscellaneous information that will assist in solving the issue.]

Environment:

abhilashshetty04 commented 11 months ago

Hi @marblerun, I remember you also had Mayastor in the cluster. Does WaitForFirstConsumer work with the Mayastor provisioner?

marblerun commented 11 months ago

Hi Abhilash,

We are a little delayed in our Mayastor work, so currently I can't say. But I hope to get round to it soon.

dsharma-dc commented 4 months ago

@marblerun If you still face this issue, could you please provide the output of the vgs command from your node hosts, along with the PVC and StorageClass specs? The WaitForFirstConsumer binding mode normally works without any issues. In your case I see "not enough free storage" errors, so I would like to look for configuration issues.
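
(Something along these lines should collect what is being asked for; the resource names are placeholders.)

# on each node host
$ sudo vgs
# from the cluster; substitute the actual PVC and StorageClass names
$ kubectl get pvc <pvc-name> -n <namespace> -o yaml
$ kubectl get sc <storageclass-name> -o yaml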

ToroNZ commented 3 months ago

I experienced a similar issue today when setting WaitForFirstConsumer on the StorageClass:

$ k describe pod/opensearch-cluster-0
[...]
Volumes:
  data-volume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  opensearch-data-volume-0
    ReadOnly:   false
[...]
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  3m2s  default-scheduler  0/4 nodes are available: 1 node(s) did not have enough free storage, 1 node(s) had untolerated taint {k3s-controlplane: true}, 2 node(s) didn't match Pod's node affinity/selector. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling.

The PVC is for 20Gi. The "1 node(s) did not have enough free storage" message from the default-scheduler is not right, as the node had plenty of space left:

worker01:~$ sudo vgs
  VG        #PV #LV #SN Attr   VSize   VFree  
  openebsvg   1   1   0 wz--n- <80.00g <60.00g

As soon as I changed the binding mode to Immediate, the volume was created:

# kubectl
$ k get lvmvolumes -A
NAMESPACE   NAME                                       VOLGROUP    NODE       SIZE          STATUS   AGE
openebs     pvc-fa694269-6310-47ae-b85d-affe29e1827f   openebsvg   worker01   21474836480   Ready    7m54s

# worker bash
worker01:~$ sudo vgs
  VG        #PV #LV #SN Attr   VSize   VFree  
  openebsvg   1   2   0 wz--n- <80.00g <40.00g
worker01:~$ sudo lvs
  LV                                       VG        Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  datadir-cockroachdb                      openebsvg -wi-ao---- 20.00g                                                    
  pvc-fa694269-6310-47ae-b85d-affe29e1827f openebsvg -wi-a----- 20.00g            
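
(The SIZE of 21474836480 shown by kubectl get lvmvolumes is exactly 20 GiB, i.e. 20 × 1024³ = 21,474,836,480 bytes, matching the 20Gi PVC request.)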

PVC yaml:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: local.csi.openebs.io
    volume.kubernetes.io/storage-provisioner: local.csi.openebs.io
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    pvcFor: opensearch
  name: opensearch-data-volume-0
  namespace: monitoring
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  storageClassName: openebs-lvm-worker01
  volumeMode: Filesystem
  volumeName: pvc-fa694269-6310-47ae-b85d-affe29e1827f
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 20Gi
  phase: Bound

SC yaml:

allowVolumeExpansion: true
allowedTopologies:
- matchLabelExpressions:
  - key: kubernetes.io/hostname
    values:
    - worker01
  - key: node-role.kubernetes.io/worker
    values:
    - worker
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: "2024-06-15T22:38:35Z"
  name: openebs-lvm-worker01
  resourceVersion: "700293"
  uid: 76cc72fc-7609-476c-bd41-d1f09c1a49ee
parameters:
  fsType: xfs
  storage: lvm
  volgroup: openebsvg
provisioner: local.csi.openebs.io
reclaimPolicy: Delete
volumeBindingMode: Immediate

ToroNZ commented 3 months ago

BTW - This testing was done using FCOS 40:

$ cat /etc/os-release 
NAME="Fedora Linux"
VERSION="40.20240519.3.0 (CoreOS)"
ID=fedora
VERSION_ID=40
PLATFORM_ID="platform:f40"
PRETTY_NAME="Fedora CoreOS 40.20240519.3.0"
SUPPORT_END=2025-05-13
VARIANT="CoreOS"
VARIANT_ID=coreos
OSTREE_VERSION='40.20240519.3.0'

It looks like the previous "1 node(s) did not have enough free storage" error when using "WaitForFirstConsumer" was because SELinux was blocking access to the CSI socket. With that temporarily out of the way, trying "WaitForFirstConsumer" again gets this:

# kubectl
$ k describe pvc -n monitoring opensearch-data-volume-0
Name:          opensearch-data-volume-0
StorageClass:  openebs-lvm-worker01
Status:        Pending
[...]
Capacity:      
Access Modes:  
VolumeMode:    Filesystem
Used By:       opensearch-cluster-0
Events:
  Type     Reason                Age                     From                                                                                        Message
  ----     ------                ----                    ----                                                                                        -------
  Normal   WaitForFirstConsumer  9m53s (x2 over 9m53s)   persistentvolume-controller                                                                 waiting for first consumer to be created before binding
  Normal   ExternalProvisioning  4m23s (x24 over 9m48s)  persistentvolume-controller                                                                 Waiting for a volume to be created either by the external provisioner 'local.csi.openebs.io' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.
  Normal   Provisioning          77s (x11 over 9m48s)    local.csi.openebs.io_openebs-lvm-localpv-controller-0_f6bc82bf-5b1f-4318-841c-1653544c7f49  External provisioner is provisioning volume for claim "monitoring/opensearch-data-volume-0"
  Warning  ProvisioningFailed    77s (x11 over 9m48s)    local.csi.openebs.io_openebs-lvm-localpv-controller-0_f6bc82bf-5b1f-4318-841c-1653544c7f49  failed to provision volume with StorageClass "openebs-lvm-worker01": error generating accessibility requirements: selected node '"worker01"' topology 'map[kubernetes.io/hostname:worker01 openebs.io/nodename:worker01]' is not in allowed topologies: [map[kubernetes.io/hostname:worker01 node-role.kubernetes.io/worker:worker]]

The following line is interesting: "topology 'map[kubernetes.io/hostname:worker01 openebs.io/nodename:worker01]' is not in allowed topologies"
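
(A likely fix, going by that error: the provisioner checks the topology keys reported by the LVM CSI node plugin (kubernetes.io/hostname and openebs.io/nodename here) against allowedTopologies, so a term that also requires node-role.kubernetes.io/worker can never match. A sketch with allowedTopologies restricted to the hostname key, assuming the worker-role restriction is handled some other way:)

allowedTopologies:
- matchLabelExpressions:
  - key: kubernetes.io/hostname
    values:
    - worker01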

abhilashshetty04 commented 3 months ago

Hi @marblerun ,

1 node(s) did not have enough free storage

Can you share the CSI provisioner log for this instance? It would be interesting to see what happened there.

kubectl logs -f openebs-lvm-localpv-controller-xxxxx-xxxx -n openebs -c openebs-lvm-plugin

Are you using the same SC manifest that you shared below for WaitForFirstConsumer? If so, why have you used key: node-role.kubernetes.io/worker?

allowVolumeExpansion: true
allowedTopologies:
- matchLabelExpressions:
  - key: kubernetes.io/hostname
    values:
    - worker01
  - key: node-role.kubernetes.io/worker
    values:
    - worker
ToroNZ commented 3 months ago

Hi @marblerun ,

1 node(s) did not have enough free storage

Can you share the CSI provisioner log for this instance? It would be interesting to see what happened there.

kubectl logs -f openebs-lvm-localpv-controller-xxxxx-xxxx -n openebs -c openebs-lvm-plugin

Are you using the same SC manifest that you shared below for WaitForFirstConsumer? If so, why have you used key: node-role.kubernetes.io/worker?

allowVolumeExpansion: true
allowedTopologies:
- matchLabelExpressions:
  - key: kubernetes.io/hostname
    values:
    - worker01
  - key: node-role.kubernetes.io/worker
    values:
    - worker

FYI - You're quoting me, not the OP.

I just tried replicating this and I couldn't. Unfortunately I would have to re-create the nodes in order to replicate it and I don't have time until next weekend :(

This is what SELinux reported at the time:

type=AVC msg=audit(1718432537.208:215475): avc:  denied  { connectto } for  pid=2115 comm="csi-node-driver" path="/plugin/csi.sock" scontext=system_u:system_r:container_t:s0:c56,c810 tcontext=system_u:system_r:unconfined_service_t:s0 tclass=unix_stream_socket permissive=0

Once I created a policy for that, I ended up as described above.
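
(For reference, a local SELinux policy module generated from the logged denial, roughly along these lines; the module name is arbitrary.)

# build and load a local policy module from recent AVC denials (module name assumed)
$ sudo ausearch -m avc -ts recent | sudo audit2allow -M csi-lvm-socket
$ sudo semodule -i csi-lvm-socket.pp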

I use key: node-role.kubernetes.io/worker to avoid any chance of workloads/volumes landing on nodes that perform different roles.

dsharma-dc commented 1 month ago

Is this still an issue?

avishnu commented 2 weeks ago

Closing now. Feel free to re-open if the issue occurs again.