piraeusdatastore / piraeus-operator

The Piraeus Operator manages LINSTOR clusters in Kubernetes.
https://piraeus.io/
Apache License 2.0

Provisioning fails after removing node labels due to missing topology keys #330

Open blampe opened 2 years ago

blampe commented 2 years ago

After removing some labels from a few nodes (ironically, labels left over from the previous CSI driver, which I replaced with Piraeus), I was no longer able to provision volumes. The error was:

failed to provision volume with StorageClass "ssd": error generating accessibility requirements: topologyKeys [beta.kubernetes.io/arch beta.kubernetes.io/instance-type beta.kubernetes.io/os ... kubernetes.io/arch kubernetes.io/hostname kubernetes.io/os linbit.com/hostname linbit.com/sp-DfltDisklessStorPool node.kubernetes.io/instance-type registered-by topology.cstor.openebs.io/nodeName] were not found on any nodes

After restarting the CSI node pods, I was able to provision again, although https://github.com/piraeusdatastore/piraeus-operator/pull/243#discussion_r761051923 seems to suggest this shouldn't be needed.

Also surprising was that the list of topology keys included every node label. Not all of these are appropriate for determining data placement, so it would be helpful to be able to specify which labels should be used for topology decisions.
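
To illustrate the failure described above, here is a minimal Python sketch. It assumes the provisioner requires at least one node to carry every topology key that the CSI node pod registered at startup, which would explain why stale keys break provisioning until the pods are restarted. This is a simplified model, not the actual provisioner code; the function names and sample labels are hypothetical.

```python
# Simplified model of the topology key check. All names are hypothetical.

def nodes_with_all_keys(topology_keys, nodes):
    """Return the names of nodes whose labels contain every topology key."""
    return [
        name
        for name, labels in nodes.items()
        if all(key in labels for key in topology_keys)
    ]


def check_accessibility(topology_keys, nodes):
    """Raise if no node carries all registered topology keys."""
    matching = nodes_with_all_keys(topology_keys, nodes)
    if not matching:
        raise RuntimeError(
            f"topologyKeys {sorted(topology_keys)} were not found on any nodes"
        )
    return matching


# Keys registered when the CSI node pod started (assumed to be a snapshot).
registered_keys = {
    "kubernetes.io/hostname",
    "linbit.com/hostname",
    "topology.cstor.openebs.io/nodeName",
}

# Current node labels: the old OpenEBS label has since been removed.
nodes = {
    "node-a": {
        "kubernetes.io/hostname": "node-a",
        "linbit.com/hostname": "node-a",
    },
}

try:
    check_accessibility(registered_keys, nodes)
except RuntimeError as err:
    print("provisioning fails:", err)

# After a restart, the registered keys would match the current labels again.
print("after restart:", check_accessibility(set(nodes["node-a"]), nodes))
```

In this model, the stale `topology.cstor.openebs.io/nodeName` key is enough to make the check fail, even though every other key is still present.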

WanzenBug commented 2 years ago

Looks like a bug (we had a few similar reports internally), but I could never reproduce the situation.

> Also surprising was that the list of topology keys included every node label.

This is mostly because it is the simplest solution that works for any cluster without requiring the user to customize anything. LINSTOR/Piraeus does not really impose any kind of topology, so we just inherit whatever is configured on the cluster. Different users use different labels for that purpose, however, and CSI expects the storage system to export its own topology. So the solution we used is to just "copy" all labels from the node, so every user can re-use what's already configured.
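
As a rough sketch of the trade-off discussed here, the following Python snippet contrasts copying every node label with filtering against a user-supplied list. The `allowed_keys` parameter is purely hypothetical and only illustrates the request above; it is not an existing Piraeus or LINSTOR CSI option, and the sample labels are made up.

```python
# Hypothetical illustration: copy all node labels vs. an opt-in allowlist.

def topology_segments(node_labels, allowed_keys=None):
    """Build topology segments from node labels.

    With allowed_keys=None every label is copied (the behaviour described
    in this thread). Otherwise only labels in allowed_keys are kept.
    """
    if allowed_keys is None:
        return dict(node_labels)
    return {k: v for k, v in node_labels.items() if k in allowed_keys}


labels = {
    "kubernetes.io/hostname": "node-a",
    "kubernetes.io/arch": "amd64",
    "registered-by": "some-other-tool",
}

# Copy everything: works without configuration, but includes unrelated labels.
print(topology_segments(labels))

# Allowlist: only labels relevant to data placement become topology keys.
print(topology_segments(labels, allowed_keys={"kubernetes.io/hostname"}))
```

Copying everything needs no configuration, which matches the reasoning above, while an allowlist would keep unrelated labels such as `registered-by` out of the topology keys.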