What steps did you take and what happened:
We're testing the OpenEBS LVM LocalPV provisioner as a local storage solution for specific workloads in our OpenShift environment.
We've added a VMDK to multiple nodes to be used for logical volumes and created the same VG on each node.
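For reference, the VG on each node was prepared roughly like this (the device path is a placeholder for the attached VMDK; the VG name ebsvg is taken from the error message further down):

```sh
# Run on each node that should serve local storage.
# /dev/sdb is a placeholder for the device backed by the added VMDK.
pvcreate /dev/sdb
vgcreate ebsvg /dev/sdb
```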
The provisioner is deployed using the Helm chart with a custom image registry (we have to use OpenEBS 3.10 because we're running an outdated OpenShift deployment in a restricted environment).
StorageClass used for the PVCs (hostnames are redacted, but the allowed topologies include only the nodes used for local storage):
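As the original manifest isn't reproduced here, this is only a minimal sketch of the kind of StorageClass we use; the class name and hostnames are placeholders, and the volgroup value ebsvg is taken from the error message below:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-lvmpv            # placeholder name
provisioner: local.csi.openebs.io
parameters:
  storage: "lvm"
  volgroup: "ebsvg"              # matches the VG created on each node
allowedTopologies:
- matchLabelExpressions:
  - key: kubernetes.io/hostname
    values:                      # redacted; only the local-storage nodes
    - storage-node-1.example.com
    - storage-node-2.example.com
```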
Here's an excerpt with the PVC template from the StatefulSet we're using for testing (see the sketch after the next paragraph):

When we create a number of PVCs by scaling the StatefulSet, it takes a while for the PVs to be provisioned. While a subset of them are created and successfully bound to their PVCs almost immediately, the other PVCs take longer than expected, and the time to provision the PVs increases with the number of nodes needed to accommodate all the requests.
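A representative volumeClaimTemplates sketch of this kind; the names are placeholders (assumptions), and the 4Gi request matches the capacity="4294967296" (4 GiB) seen in the error log below:

```yaml
volumeClaimTemplates:
- metadata:
    name: data                      # placeholder name
  spec:
    accessModes: ["ReadWriteOnce"]
    storageClassName: openebs-lvmpv # placeholder, matches the sketch above
    resources:
      requests:
        storage: 4Gi                # 4294967296 bytes, as in the error log
```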
According to our tests, it can take up to 5 minutes to provision the storage to satisfy the requests, depending on the number of nodes needed (we've tested with up to 8).
This behaviour also affects newly created storage requests: the controller keeps throwing the same "ResourceExhausted" error in the logs until it reaches a node with free space in its volume group.
My main question is: is this behaviour expected? It seems like the provisioner controller either doesn't aggregate information about the space left in the volume groups of the provisioning nodes, or doesn't respect the LVMNode custom resource information at all, and instead goes through the nodes in the StorageClass list sequentially, trying to provision a PV on each one until it hits some kind of timeout and moves on to the next node.
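For context, this is roughly how we inspect the per-node free space that the LVMNode custom resources report (assuming the driver runs in the openebs namespace; adjust -n to match your deployment):

```sh
# List the LVMNode CRs the CSI driver maintains, one per node
kubectl get lvmnodes -n openebs

# Inspect the volume groups (and their free space) reported for one node;
# the node name is a placeholder
kubectl get lvmnode storage-node-1.example.com -n openebs -o yaml
```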
What did you expect to happen:
All persistent volumes are provisioned and bound to their corresponding claims at roughly the same time, or at least within a reasonable amount of time.
The output of the following commands will help us better understand what's going on:
(Pasting long output into a GitHub gist or other Pastebin is fine.)
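For reference, we gathered the requested output with commands along these lines (pod name, container name, and namespace are from our deployment and may differ):

```sh
kubectl get pods -n openebs
kubectl get lvmvolumes -A -o yaml
kubectl logs openebs-lvm-controller-0 -n openebs -c openebs-lvm-plugin
```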
I don't believe there is any relevant info in the logs except for errors like:
E1107 13:50:36.474552 1 grpc.go:79] GRPC error: rpc error: code = ResourceExhausted desc = no vg available to serve volume request having regex="^ebsvg$" & capacity="4294967296"
Anything else you would like to add:
If any other logs are needed, I can provide them on request.
Environment:
Server Version: 4.10.67
Kubernetes Version: v1.23.17+26fdcdf
Cloud provider or hardware configuration: we're using a VMware vCenter 7.0.3.01700 cluster as the virtualization provider for the OpenShift cluster, with virtualized storage
OS (e.g. from /etc/os-release): Red Hat CoreOS 4.10