Open · Mattes83 opened 5 months ago
Describe the solution you'd like: It is the CSI driver's task to inform the scheduler about the maximum number of volumes that can be attached to a node (reported via NodeGetInfo.max_volumes_per_node). Currently this seems to be a static value, which works as long as the nodes have only one network interface. In CAPIC you can specify additional networks (https://github.com/ionos-cloud/cluster-api-provider-ionoscloud/blob/main/api/v1alpha1/ionoscloudmachine_types.go#L156), and in our case the nodes have multiple network interfaces. Since each additional NIC reduces the number of volumes that can actually be attached, pods end up being scheduled onto nodes where no more volumes can be attached.
To fix this, you could either take the number of attached NICs into account when responding to NodeGetInfo.max_volumes_per_node, or simply make this value configurable.
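Since the IONOS CSI driver's source is not public, the following is only a minimal Go sketch of what the two options could look like against the CSI spec's generated Go bindings (github.com/container-storage-interface/spec/lib/go/csi). The nodeService type, the totalAttachSlots and nicCount fields, the example values in main, and the --max-volumes-per-node flag are illustrative assumptions, not the driver's actual API:

```go
// Hypothetical sketch only – not the (closed-source) IONOS CSI driver.
package main

import (
	"context"
	"flag"
	"fmt"

	"github.com/container-storage-interface/spec/lib/go/csi"
)

// Option 2: an operator-facing override for the reported limit.
// 0 means "not set"; the driver then derives the limit itself.
var maxVolumesOverride = flag.Int64("max-volumes-per-node", 0,
	"override the reported max_volumes_per_node (0 = derive from NIC count)")

type nodeService struct {
	nodeID string
	// totalAttachSlots: how many devices (volumes + NICs) the server can hold.
	// The concrete number here is an assumption for this example.
	totalAttachSlots int64
	// nicCount would be looked up from the IONOS API / node metadata.
	nicCount int64
}

// NodeGetInfo tells the scheduler how many volumes fit on this node.
func (s *nodeService) NodeGetInfo(ctx context.Context, req *csi.NodeGetInfoRequest) (*csi.NodeGetInfoResponse, error) {
	maxVolumes := *maxVolumesOverride
	if maxVolumes <= 0 {
		// Option 1: subtract the NICs that already occupy attachment slots
		// instead of reporting a static value.
		maxVolumes = s.totalAttachSlots - s.nicCount
		if maxVolumes < 1 {
			maxVolumes = 1
		}
	}
	return &csi.NodeGetInfoResponse{
		NodeId:            s.nodeID,
		MaxVolumesPerNode: maxVolumes,
	}, nil
}

func main() {
	flag.Parse()
	// Example: a node with 3 NICs out of 24 assumed attachment slots.
	s := &nodeService{nodeID: "worker-0", totalAttachSlots: 24, nicCount: 3}
	resp, err := s.NodeGetInfo(context.Background(), &csi.NodeGetInfoRequest{})
	if err != nil {
		panic(err)
	}
	fmt.Printf("max_volumes_per_node = %d\n", resp.MaxVolumesPerNode)
}
```

For comparison, some other CSI drivers already expose such an override flag (the AWS EBS CSI driver's --volume-attach-limit, for example), so either option should fit how the kubelet and scheduler consume this value today.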
Anything else you would like to add: This is a flaw in the underlying CSI driver. As the source code of the CSI driver is not public yet, I am posting this here. It would be great if you could forward this to the appropriate people.
Additional context: https://kubernetes.io/docs/concepts/storage/storage-limits/#dynamic-volume-limits and https://github.com/container-storage-interface/spec/blob/master/spec.md#nodegetinfo
NodeGetInfo returns the upper bound of possible volumes that can be attached, but limits are checked for before each volume creation to prevent the situation you're talking about.
If possible, provide your cluster template as well as your helm values to see whether it's a misconfiguration. I might've missed something that we configure internally for the open source helm chart.