Hi @marblerun, I remember you also had Mayastor in your cluster. Does WaitForFirstConsumer work with the Mayastor provisioner?
Hi Abhilash,
We are a little delayed in our Mayastor work, so currently I can't say. But I hope to get round to it soon.
@marblerun
If you still face this issue, could you please provide the output of the vgs command from your node hosts, and also the PVC and StorageClass specs? The binding mode WaitForFirstConsumer normally works without any issues. In your case I see "not enough space" errors, so I would like to look for configuration issues.
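For reference, a sketch of the commands to gather that information (the PVC name, namespace, and StorageClass name are placeholders):

sudo vgs
kubectl get pvc <pvc-name> -n <namespace> -o yaml
kubectl get sc <storageclass-name> -o yaml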
I experienced a similar issue today when setting WaitForFirstConsumer on the StorageClass:
$ k describe pod/opensearch-cluster-0
[...]
Volumes:
  data-volume:
    Type:        PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:   opensearch-data-volume-0
    ReadOnly:    false
[...]
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 3m2s default-scheduler 0/4 nodes are available: 1 node(s) did not have enough free storage, 1 node(s) had untolerated taint {k3s-controlplane: true}, 2 node(s) didn't match Pod's node affinity/selector. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling.
The PVC is for 20Gi. The "1 node(s) did not have enough free storage" message from the default-scheduler is not right, as the node had plenty of space left:
worker01:~$ sudo vgs
VG #PV #LV #SN Attr VSize VFree
openebsvg 1 1 0 wz--n- <80.00g <60.00g
As soon as I changed the binding mode to Immediate, the volume was created:
# kubectl
$ k get lvmvolumes -A
NAMESPACE NAME VOLGROUP NODE SIZE STATUS AGE
openebs pvc-fa694269-6310-47ae-b85d-affe29e1827f openebsvg worker01 21474836480 Ready 7m54s
# worker bash
worker01:~$ sudo vgs
VG #PV #LV #SN Attr VSize VFree
openebsvg 1 2 0 wz--n- <80.00g <40.00g
worker01:~$ sudo lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
datadir-cockroachdb openebsvg -wi-ao---- 20.00g
pvc-fa694269-6310-47ae-b85d-affe29e1827f openebsvg -wi-a----- 20.00g
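Side note: volumeBindingMode is immutable on an existing StorageClass, so "changing" it here means deleting and recreating the object; a sketch (the manifest file name is hypothetical):

$ kubectl delete sc openebs-lvm-worker01
$ kubectl apply -f openebs-lvm-worker01.yaml   # edited to volumeBindingMode: Immediate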
PVC YAML:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: local.csi.openebs.io
    volume.kubernetes.io/storage-provisioner: local.csi.openebs.io
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    pvcFor: opensearch
  name: opensearch-data-volume-0
  namespace: monitoring
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  storageClassName: openebs-lvm-worker01
  volumeMode: Filesystem
  volumeName: pvc-fa694269-6310-47ae-b85d-affe29e1827f
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 20Gi
  phase: Bound
StorageClass YAML:
allowVolumeExpansion: true
allowedTopologies:
- matchLabelExpressions:
  - key: kubernetes.io/hostname
    values:
    - worker01
  - key: node-role.kubernetes.io/worker
    values:
    - worker
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: "2024-06-15T22:38:35Z"
  name: openebs-lvm-worker01
  resourceVersion: "700293"
  uid: 76cc72fc-7609-476c-bd41-d1f09c1a49ee
parameters:
  fsType: xfs
  storage: lvm
  volgroup: openebsvg
provisioner: local.csi.openebs.io
reclaimPolicy: Delete
volumeBindingMode: Immediate
BTW - this testing was done using Fedora CoreOS (FCOS) 40:
$ cat /etc/os-release
NAME="Fedora Linux"
VERSION="40.20240519.3.0 (CoreOS)"
ID=fedora
VERSION_ID=40
PLATFORM_ID="platform:f40"
PRETTY_NAME="Fedora CoreOS 40.20240519.3.0"
SUPPORT_END=2025-05-13
VARIANT="CoreOS"
VARIANT_ID=coreos
OSTREE_VERSION='40.20240519.3.0'
It looks like the earlier "1 node(s) did not have enough free storage" error when using WaitForFirstConsumer was because SELinux was blocking access to the CSI socket. With that temporarily out of the way, trying WaitForFirstConsumer again gets this:
# kubectl
$ k describe pvc -n monitoring opensearch-data-volume-0
Name: opensearch-data-volume-0
StorageClass: openebs-lvm-worker01
Status: Pending
[...]
Capacity:
Access Modes:
VolumeMode: Filesystem
Used By: opensearch-cluster-0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal WaitForFirstConsumer 9m53s (x2 over 9m53s) persistentvolume-controller waiting for first consumer to be created before binding
Normal ExternalProvisioning 4m23s (x24 over 9m48s) persistentvolume-controller Waiting for a volume to be created either by the external provisioner 'local.csi.openebs.io' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.
Normal Provisioning 77s (x11 over 9m48s) local.csi.openebs.io_openebs-lvm-localpv-controller-0_f6bc82bf-5b1f-4318-841c-1653544c7f49 External provisioner is provisioning volume for claim "monitoring/opensearch-data-volume-0"
Warning ProvisioningFailed 77s (x11 over 9m48s) local.csi.openebs.io_openebs-lvm-localpv-controller-0_f6bc82bf-5b1f-4318-841c-1653544c7f49 failed to provision volume with StorageClass "openebs-lvm-worker01": error generating accessibility requirements: selected node '"worker01"' topology 'map[kubernetes.io/hostname:worker01 openebs.io/nodename:worker01]' is not in allowed topologies: [map[kubernetes.io/hostname:worker01 node-role.kubernetes.io/worker:worker]]
The following line is interesting: "topology 'map[kubernetes.io/hostname:worker01 openebs.io/nodename:worker01]' is not in allowed topologies"
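For anyone hitting the same error: the external-provisioner matches allowedTopologies against the topology segment the CSI node plugin actually reports, and per the error above this driver only reports the kubernetes.io/hostname and openebs.io/nodename keys, so an entry keyed on node-role.kubernetes.io/worker can never match. A sketch of an allowedTopologies stanza limited to a key the plugin reports, assuming pinning to worker01 is the goal:

allowedTopologies:
- matchLabelExpressions:
  - key: kubernetes.io/hostname
    values:
    - worker01

To confirm which topology keys a driver reports on a given node, the CSINode object can be inspected:

$ kubectl get csinode worker01 -o jsonpath='{.spec.drivers[?(@.name=="local.csi.openebs.io")].topologyKeys}'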
Hi @marblerun,
Regarding the "1 node(s) did not have enough free storage" error: can you share the CSI provisioner log for this instance? It would be interesting to see what happened there.
kubectl logs -f openebs-lvm-localpv-controller-xxxxx-xxxx -n openebs -c openebs-lvm-plugin
Are you using the same StorageClass manifest that you shared below for WaitForFirstConsumer? If yes, then why have you used key: node-role.kubernetes.io/worker?
allowVolumeExpansion: true
allowedTopologies:
- matchLabelExpressions:
  - key: kubernetes.io/hostname
    values:
    - worker01
  - key: node-role.kubernetes.io/worker
    values:
    - worker
> Hi @marblerun,
> Regarding the "1 node(s) did not have enough free storage" error: can you share the CSI provisioner log for this instance? It would be interesting to see what happened there.
> kubectl logs -f openebs-lvm-localpv-controller-xxxxx-xxxx -n openebs -c openebs-lvm-plugin
> Are you using the same StorageClass manifest that you shared below for WaitForFirstConsumer? If yes, then why have you used key: node-role.kubernetes.io/worker?
> allowVolumeExpansion: true allowedTopologies: - matchLabelExpressions: - key: kubernetes.io/hostname values: - worker01 - key: node-role.kubernetes.io/worker values: - worker
FYI - you're quoting me, not the OP.
I just tried replicating this and I couldn't. Unfortunately, I would have to re-create the nodes in order to replicate it, and I don't have time until next weekend :(
This is what SELinux reported at the time:
type=AVC msg=audit(1718432537.208:215475): avc: denied { connectto } for pid=2115 comm="csi-node-driver" path="/plugin/csi.sock" scontext=system_u:system_r:container_t:s0:c56,c810 tcontext=system_u:system_r:unconfined_service_t:s0 tclass=unix_stream_socket permissive=0
Once I created a policy for that, I ended up as described above.
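For reference, a minimal sketch of turning that denial into a local SELinux policy module with audit2allow (the module name is arbitrary, and this assumes auditd is logging to its default location):

# Extract the denial and generate a local policy module from it
sudo ausearch -m avc -ts recent -c csi-node-driver | audit2allow -M csi-node-driver-local
# Install the generated module
sudo semodule -i csi-node-driver-local.pp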
I use key: node-role.kubernetes.io/worker to avoid any chance of workloads/volumes landing on nodes that perform different roles.
Is this still an issue?
Closing now. Feel free to re-open if the issue occurs again.
What steps did you take and what happened: As part of an exercise to test online volume expansion, the current version of lvm-localpv was installed using Helm.
The worked example in the README file was then followed, but failed.
After working through the issue with support online, the line
volumeBindingMode: WaitForFirstConsumer
was removed from the StorageClass definition, after which everything worked as expected.
This parameter was carried over from a previous StorageClass definition, where we had used OpenEBS LocalPV to create storage requests for a MongoDB cluster.
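For context, a minimal sketch of the StorageClass after the change; the name and volume group here follow the README's worked example, and omitting volumeBindingMode leaves it at the default, Immediate:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-lvmpv
parameters:
  storage: lvm
  volgroup: lvmvg
provisioner: local.csi.openebs.io
# volumeBindingMode: WaitForFirstConsumer   <- the removed line; when omitted, binding defaults to Immediate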
What did you expect to happen:
For the LVM volume to be created.
The output of the following commands will help us better understand what's going on: (Pasting long output into a GitHub gist or other Pastebin is fine.)
From a failing pod, before the StorageClass change:
root@kube-1:~# kubectl -n openebs describe pod fio
Name:             fio
Namespace:        openebs
Priority:         0
Service Account:  default
Node:             <none>
Labels:           <none>
Annotations:      <none>
Status:           Pending
IP:
IPs:              <none>
Containers:
  perfrunner:
    Image:      openebs/tests-fio
    Port:       <none>
    Host Port:  <none>
    Command:
      /bin/bash
    Args:
      -c
      while true ;do sleep 50; done
    Environment:  <none>
    Mounts:
      /mnt/datadir from fio-vol (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-95h98 (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  fio-vol:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  csi-lvmpv
    ReadOnly:   false
  kube-api-access-95h98:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age  From               Message
  ----     ------            ---  ----               -------
  Warning  FailedScheduling  93s  default-scheduler  0/4 nodes are available: 1 node(s) didn't find available persistent volumes to bind, 3 node(s) did not have enough free storage. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling.
At this stage, all 4 nodes had an unused 20GB volume group defined and ready.
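For what it's worth, that "did not have enough free storage" message is produced by the scheduler's volume-binding plugin from CSI storage capacity tracking: with WaitForFirstConsumer, kube-scheduler compares the request against the CSIStorageCapacity objects published for the driver rather than looking at the VGs directly. A sketch of inspecting what has been published:

$ kubectl get csistoragecapacities -A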
kubectl logs -f openebs-lvm-controller-0 -n kube-system -c openebs-lvm-plugin
kubectl logs -f openebs-lvm-node-[xxxx] -n kube-system -c openebs-lvm-plugin
kubectl get pods -n kube-system
NAME                                          READY   STATUS    RESTARTS      AGE
calico-kube-controllers-75748cc9fd-9bv7n      1/1     Running   0             84d
calico-node-5gz47                             1/1     Running   1 (84d ago)   177d
calico-node-ggcfz                             1/1     Running   1 (84d ago)   177d
calico-node-ljgvl                             1/1     Running   1 (84d ago)   120d
calico-node-s8pk6                             1/1     Running   2 (84d ago)   177d
coredns-588bb58b94-45drn                      1/1     Running   0             84d
coredns-588bb58b94-fkr4t                      1/1     Running   0             84d
dns-autoscaler-5b9959d7fc-9dmlp               1/1     Running   0             84d
kube-apiserver-kube-1                         1/1     Running   2 (84d ago)   177d
kube-apiserver-kube-2                         1/1     Running   2 (84d ago)   177d
kube-apiserver-kube-3                         1/1     Running   2 (84d ago)   177d
kube-controller-manager-kube-1                1/1     Running   2 (84d ago)   177d
kube-controller-manager-kube-2                1/1     Running   2 (84d ago)   177d
kube-controller-manager-kube-3                1/1     Running   3 (84d ago)   177d
kube-proxy-8wz2g                              1/1     Running   1 (84d ago)   120d
kube-proxy-klt5x                              1/1     Running   1 (84d ago)   120d
kube-proxy-qm52f                              1/1     Running   1 (84d ago)   120d
kube-proxy-s7c42                              1/1     Running   1 (84d ago)   120d
kube-scheduler-kube-1                         1/1     Running   2 (84d ago)   177d
kube-scheduler-kube-2                         1/1     Running   3 (84d ago)   177d
kube-scheduler-kube-3                         1/1     Running   2 (84d ago)   177d
kubernetes-dashboard-74cc7bdb6d-52n49         1/1     Running   0             84d
kubernetes-metrics-scraper-75666d949b-fmxdp   1/1     Running   0             84d
local-volume-provisioner-h2gvj                1/1     Running   1 (84d ago)   177d
local-volume-provisioner-h5js2                1/1     Running   1 (84d ago)   120d
local-volume-provisioner-m8xwj                1/1     Running   1 (84d ago)   177d
local-volume-provisioner-v7gc6                1/1     Running   1 (84d ago)   177d
metrics-server-5dc9f5cf76-spdcq               1/1     Running   0             84d
nginx-proxy-kube-4                            1/1     Running   1 (84d ago)   120d
nodelocaldns-88rcq                            1/1     Running   1 (84d ago)   140d
nodelocaldns-dp55h                            1/1     Running   2 (49d ago)   140d
nodelocaldns-n67jz                            1/1     Running   1 (84d ago)   140d
nodelocaldns-rql67                            1/1     Running   0             84d
kubectl get pods -n openebs
NAME                                           READY   STATUS    RESTARTS   AGE
fio                                            1/1     Running   0          92s
openebs-localpv-provisioner-68cb6c95f5-xvtpj   1/1     Running   0          28h
openebs-lvmlocalpv-lvm-localpv-controller-0    5/5     Running   0          28h
openebs-lvmlocalpv-lvm-localpv-node-2tpph      2/2     Running   0          28h
openebs-lvmlocalpv-lvm-localpv-node-lxmmq      2/2     Running   0          28h
openebs-lvmlocalpv-lvm-localpv-node-m5mfr      2/2     Running   0          28h
openebs-lvmlocalpv-lvm-localpv-node-prbd7      2/2     Running   0          28h
openebs-ndm-7b994                              1/1     Running   0          28h
openebs-ndm-9xz7r                              1/1     Running   0          28h
openebs-ndm-bzl6w                              1/1     Running   0          28h
openebs-ndm-nbg6f                              1/1     Running   0          28h
openebs-ndm-operator-54478658f7-btwmj          1/1     Running   0          28h
Working version:
root@kube-1:~# kubectl get lvmvol -nopenebs -o yaml
apiVersion: v1
items:
Environment:
LVM driver version:
root@kube-1:~# lvm version
  LVM version:     2.03.16(2) (2022-05-18)
  Library version: 1.02.185 (2022-05-18)
  Driver version:  4.47.0
Kubernetes version: 1.25.5
Kubernetes installer & version: Kubespray 1:20
Cloud provider or hardware configuration: Hetzner cloud instance
OS (e.g. from /etc/os-release): Debian 12 bookworm