openebs / zfs-localpv

Dynamically provision Stateful Persistent Node-Local Volumes & Filesystems for Kubernetes that is integrated with a backend ZFS data storage stack.
https://openebs.io
Apache License 2.0

error generating accessibility requirements #528

Open ianb-mp opened 7 months ago

ianb-mp commented 7 months ago

What steps did you take and what happened:

After a k8s node reboot, creating a new PVC with the zfs-localpv StorageClass fails with this error:

error generating accessibility requirements: topologyKeys [...] were not found on any nodes

(Full error message here)

A temporary fix is to restart the openebs-zfs-localpv-node daemonset; however, the error returns whenever I reboot the k8s node.

What did you expect to happen:

I assume this isn't expected behaviour, so it would be good if this could be resolved without requiring manual intervention.

Anything else you would like to add:

zfs-localpv was installed via Openebs helm chart v4.0.0:

helm repo add openebs https://openebs.github.io/openebs
helm repo update
helm install openebs --namespace openebs openebs/openebs --create-namespace

w3aman commented 7 months ago

Restarting the zfs-node daemonset is a must, so that the driver can pick up the required topologies if they are added after the driver is installed. One alternative is to set the topologies at install time, so that the node agents do not need a restart later for those keys. See here: https://github.com/openebs/zfs-localpv/blob/develop/docs/faq.md#6-how-to-add-custom-topology-key

As for rebooting the node, that should not be the behaviour. I have gone through node reboot scenarios myself, and volume provisioning worked fine for me. Can you please share your storage class yaml and the output of kubectl get csinode <node-name> -o yaml, and verify that the same key used in the storage class is present on the node?

ianb-mp commented 7 months ago

Can you please share your storage class yaml

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: "2024-04-16T22:33:59Z"
  name: openebs-zfspv
  resourceVersion: "1294905"
  uid: e45d8212-7b24-481c-88ea-194ee5a27f21
parameters:
  compression: "off"
  dedup: "off"
  fstype: zfs
  poolname: zfspv-pool
  recordsize: 128k
provisioner: zfs.csi.openebs.io
reclaimPolicy: Delete
volumeBindingMode: Immediate

kubectl get csinode -o yaml

See gist here

and verify that same key which is used in storage class is present on the node

Do you mean compare CSINode with node labels? Node labels are here

I restarted openebs-zfs-localpv-node daemonset then compared CSINode topologyKeys after restart with the values before restart and can see only this value has changed:

<       - scheduling.node.kubevirt.io/tsc-frequency-2200000000
---
>       - scheduling.node.kubevirt.io/tsc-frequency-2199997000

ianb-mp commented 7 months ago

I understand the issue is caused by the topology keys in CSINode no longer matching the node labels after a reboot. This can be fixed by updating the openebs-zfs-localpv-node daemonset and changing ALLOWED_TOPOLOGIES from all to a list of specific labels that I know won't change on reboot.
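
The mismatch described above can be illustrated with a minimal Python sketch. It is not part of zfs-localpv: the function name, the sample keys, and the label values are hypothetical, and it assumes ALLOWED_TOPOLOGIES takes a comma-separated list of label keys as described in the linked FAQ. It compares the topology keys recorded in CSINode with the node's current label keys and builds a candidate value from the keys present in both.

```python
def stale_topology_keys(csinode_keys, node_labels):
    """Return CSINode topology keys that are not present as node label keys."""
    return sorted(set(csinode_keys) - set(node_labels))


# Hypothetical snapshot: topology keys the driver registered before the reboot.
csinode_keys = [
    "kubernetes.io/hostname",
    "openebs.io/nodename",
    "scheduling.node.kubevirt.io/tsc-frequency-2200000000",
]

# Hypothetical node labels after the reboot: the KubeVirt key has changed.
node_labels = {
    "kubernetes.io/hostname": "node-1",
    "openebs.io/nodename": "node-1",
    "scheduling.node.kubevirt.io/tsc-frequency-2199997000": "true",
}

# Keys that would trigger "topologyKeys [...] were not found on any nodes".
stale = stale_topology_keys(csinode_keys, node_labels)

# Keys present on both sides are candidates for a fixed ALLOWED_TOPOLOGIES value.
stable = sorted(set(csinode_keys) & set(node_labels))
allowed_topologies = ",".join(stable)

print("stale keys:", stale)
print("ALLOWED_TOPOLOGIES candidate:", allowed_topologies)
```

In practice the two inputs would come from kubectl get csinode <node-name> -o yaml and the node's labels; a key that appears in the stale list is one to leave out of ALLOWED_TOPOLOGIES.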

It seems odd that Kubevirt uses a node label with a dynamic key value, but I guess there must be a good reason.

I appreciate the need for the topology keys to match the node etc, but I wonder if there is a better default approach for zfs-localpv that will permit it to work with Kubevirt (and operators like it) 'out of the box' (i.e. without needing special configuration)?

sinhaashish commented 5 months ago

Hi @ianb-mp. As @w3aman said, you can use this doc to add the custom label: https://github.com/openebs/zfs-localpv/blob/develop/docs/faq.md#6-how-to-add-custom-topology-key

As I understand it, in your case KubeVirt sets a dynamic label that changes when the node restarts. The new label needs to be added to the daemonset directly by editing it, followed by a restart. ALLOWED_TOPOLOGIES defaults to all, which takes every node label key as an allowed topology; you can change this by editing the daemonset. In your case the node labels change on restart, which should not happen. Whenever a node label changes, you need to update the daemonset and restart it for the change to take effect.

avishnu commented 1 month ago

Scoping this for investigation as part of v4.3, to figure out ways to specify an inclusive or exclusive list of labels in ALLOWED_TOPOLOGIES.
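
One way to picture an inclusive or exclusive list is the Python sketch below. It is purely illustrative and does not reflect any existing or planned zfs-localpv behaviour: the function name, the include and exclude parameters, and the glob-style matching are all assumptions made for this example.

```python
from fnmatch import fnmatchcase


def filter_topology_keys(label_keys, include=("*",), exclude=()):
    """Keep keys that match any include pattern and no exclude pattern.

    Hypothetical helper: the patterns are shell-style globs, chosen here
    only for illustration.
    """
    return [
        key
        for key in label_keys
        if any(fnmatchcase(key, pattern) for pattern in include)
        and not any(fnmatchcase(key, pattern) for pattern in exclude)
    ]


# Hypothetical node label keys, including the KubeVirt key from this issue.
label_keys = [
    "kubernetes.io/hostname",
    "openebs.io/nodename",
    "scheduling.node.kubevirt.io/tsc-frequency-2199997000",
]

# Exclusive list: take every key except the ones KubeVirt rewrites on reboot.
kept = filter_topology_keys(label_keys, exclude=("scheduling.node.kubevirt.io/*",))

# Inclusive list: take only the keys that are named explicitly.
only_hostname = filter_topology_keys(label_keys, include=("kubernetes.io/hostname",))

print(kept)
print(only_hostname)
```

With an exclusive list, the default of all could stay in place while operators such as KubeVirt keep changing their own labels; with an inclusive list, only the named keys would ever be registered.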