openebs / lvm-localpv

Dynamically provision Stateful Persistent Node-Local Volumes & Filesystems for Kubernetes, integrated with a backend LVM2 data storage stack.
Apache License 2.0

Slow PersistentVolume creation for multiple PVCs (e.g. created through an STS template) when using multiple local storage nodes #347

Open polskomleko opened 2 weeks ago

polskomleko commented 2 weeks ago

What steps did you take and what happened: We're testing the OpenEBS LVM Local PV provisioner as a local storage solution for specific workloads in our OpenShift environment. We've added a VMDK to multiple nodes to be used for logical volumes and created the same VG on each node. The provisioner is deployed using the Helm chart with a custom image registry (we need OpenEBS 3.10 because we're running an outdated OpenShift deployment in a restricted environment).
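For reference, a minimal sketch of how each storage node was prepared; the device path /dev/sdb is an assumption, and the VG name matches the volgroup: ebsvg parameter in the StorageClass below:

pvcreate /dev/sdb          # initialize the added VMDK as an LVM physical volume (device path assumed)
vgcreate ebsvg /dev/sdb    # create the volume group referenced by the StorageClass
vgs ebsvg                  # verify the VG exists and check its free space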

StorageClass used for the PVCs (hostnames are redacted, but the list contains only the nodes used for local storage):

allowVolumeExpansion: true
allowedTopologies:
- matchLabelExpressions:
  - key: kubernetes.io/hostname
    values:
    - worker01.example.com
    - worker02.example.com
    - worker03.example.com
    - worker04.example.com
    - worker10.example.com
    - worker11.example.com
    - worker12.example.com
    - worker13.example.com
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-lvmpv
parameters:
  storage: lvm
  volgroup: ebsvg
provisioner: local.csi.openebs.io
reclaimPolicy: Delete
volumeBindingMode: Immediate
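Since allowedTopologies filters on the kubernetes.io/hostname label, a quick sanity check is to confirm that the label values on the storage nodes match the list above; a sketch of the checks (output columns will vary):

kubectl get storageclass openebs-lvmpv -o yaml       # confirm the StorageClass as applied in the cluster
kubectl get nodes -L kubernetes.io/hostname          # list nodes with their hostname label values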

Here's the volumeClaimTemplates excerpt from the StatefulSet we're using for testing:

  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      name: pv-data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 4Gi
      storageClassName: openebs-lvmpv
      volumeMode: Filesystem

When we create a number of PVCs by scaling the STS, it takes a while for the PVs to be provisioned. A subset of them is created and successfully bound to their PVCs almost immediately, but the remaining PVCs take longer than expected, and the provisioning time grows with the number of nodes needed to accommodate all the requests. In our tests it can take up to 5 minutes to provision the storage, depending on the number of nodes involved (we've tested with up to 8). This behaviour also affects newly created storage requests: the controller keeps throwing the same "ResourceExhausted" error in the logs until it reaches a node with free space in its volume group.

My main question is: is this behaviour expected? It looks as if the provisioner controller does not aggregate information about the space left in the volume groups of the eligible nodes (or does not respect the LVMNode custom resource information at all), and instead walks the nodes in the StorageClass list sequentially, trying to provision a PV on each until it hits some kind of timeout and moves on to the next node.
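To illustrate how we reproduce and observe this, roughly the following commands are used; the StatefulSet name and the openebs namespace are assumptions and may differ in other deployments:

kubectl scale statefulset test-sts --replicas=8      # test-sts is a placeholder STS using the PVC template above
kubectl get pvc -w                                   # watch how long each PVC stays Pending before it is Bound
kubectl get lvmnodes -n openebs -o yaml              # per-node VG capacity as reported via the LVMNode custom resources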

What did you expect to happen: All persistent volumes are provisioned and bound to their corresponding claims at roughly the same time, within a reasonable amount of time.

The output of the following commands will help us better understand what's going on: (Pasting long output into a GitHub gist or other Pastebin is fine.)

I don't believe there is any relevant info in the logs except for errors like:

E1107 13:50:36.474552       1 grpc.go:79] GRPC error: rpc error: code = ResourceExhausted desc = no vg available to serve volume request having regex="^ebsvg$" & capacity="4294967296"
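If more context is needed, the controller and node-plugin logs can be pulled along these lines; the workload names, container name, and namespace below are assumptions that depend on the Helm release and OpenEBS version:

kubectl logs -n openebs deploy/openebs-lvm-controller -c openebs-lvm-plugin   # CSI controller logs (provisioning decisions, ResourceExhausted errors)
kubectl logs -n openebs daemonset/openebs-lvm-node -c openebs-lvm-plugin      # node plugin logs on each storage node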

Anything else you would like to add: If any other logs are needed, I can provide them on request.

Environment:

abhilashshetty04 commented 22 hours ago

Hi @polskomleko, thanks for reporting the issue. We will reproduce this and let you know our findings.