openebs / lvm-localpv

Dynamically provision stateful, persistent, node-local volumes and filesystems for Kubernetes, integrated with a backend LVM2 data storage stack.
Apache License 2.0

BUG: lvmvolume is inconsistent with the actual allocated lv size #316

Open bxy4543 opened 2 weeks ago

bxy4543 commented 2 weeks ago

What steps did you take and what happened: The pod kept restarting because the PVC ran out of space. After expanding the PVC, we found that the underlying LV had not actually been expanded, even though the node's VG still has plenty of free capacity. Since the underlying LV was never expanded, the pod keeps restarting.

root@test-node-003:~# lvdisplay /dev/lvmvg/pvc-bcc852f8-7266-48c4-aedc-34adb523d224
  --- Logical volume ---
  LV Path                /dev/lvmvg/pvc-bcc852f8-7266-48c4-aedc-34adb523d224
  LV Name                pvc-bcc852f8-7266-48c4-aedc-34adb523d224
  VG Name                lvmvg
  LV UUID                vjfkb2-ls7c-YzO7-nZYf-Ds70-3CVP-7EA5jl
  LV Write Access        read/write
  LV Creation host, time openebs-lvm-node-d8krs, 2024-06-19 14:35:24 +0800
  LV Status              available
  # open                 1
  LV Size                1.00 GiB
  Current LE             256
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:151

root@test-node-003:~# kubectl get lvmvolume -n openebs pvc-bcc852f8-7266-48c4-aedc-34adb523d224 -o yaml
apiVersion: local.openebs.io/v1alpha1
kind: LVMVolume
metadata:
  creationTimestamp: "2024-06-19T06:35:24Z"
  finalizers:
  - lvm.openebs.io/finalizer
  generation: 4
  labels:
    kubernetes.io/nodename: test-node-003
  name: pvc-bcc852f8-7266-48c4-aedc-34adb523d224
  namespace: openebs
  resourceVersion: "241412658"
  uid: 00e27140-261e-449f-a048-af94b1b312f3
spec:
  capacity: "17179869184"
  ownerNodeID: test-node-003
  shared: "yes"
  thinProvision: "no"
  vgPattern: ^lvmvg$
  volGroup: lvmvg
status:
  state: Ready

LV size: 1 GiB; LVMVolume capacity: 16 GiB
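To make the mismatch concrete: spec.capacity in the LVMVolume CR is expressed in bytes, while lvdisplay reports GiB. A quick shell conversion of the two values above (taken from the CR and the lvdisplay output in this issue) shows the gap:

```shell
# Convert the LVMVolume spec.capacity (bytes) to GiB and compare it with the
# 1.00 GiB that lvdisplay reports for the underlying LV.
cr_bytes=17179869184                    # spec.capacity from the LVMVolume CR
lv_bytes=$((1 * 1024 * 1024 * 1024))    # 1.00 GiB as shown by lvdisplay
echo "CR: $((cr_bytes / 1073741824)) GiB, LV: $((lv_bytes / 1073741824)) GiB"
# prints: CR: 16 GiB, LV: 1 GiB
```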

kubectl get pvc -n ns-gewclvtg data-test-ytxrw6b0n6-mongodb-2 -o yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: local.csi.openebs.io
    volume.kubernetes.io/selected-node: test-node-003
    volume.kubernetes.io/storage-provisioner: local.csi.openebs.io
  creationTimestamp: "2024-06-19T06:35:23Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    app.kubernetes.io/instance: test-ytxrw6b0n6
    app.kubernetes.io/managed-by: kubeblocks
    app.kubernetes.io/name: mongodb
    apps.kubeblocks.io/component-name: mongodb
    apps.kubeblocks.io/vct-name: data
    kubeblocks.io/volume-type: data
  name: data-test-ytxrw6b0n6-mongodb-2
  namespace: ns-gewclvtg
  ownerReferences:
  - apiVersion: apps.kubeblocks.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: Component
    name: test-ytxrw6b0n6-mongodb
    uid: f131a510-755d-4ae2-a828-9d878b82e121
  resourceVersion: "241412684"
  uid: bcc852f8-7266-48c4-aedc-34adb523d224
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 16Gi
  storageClassName: openebs-backup
  volumeMode: Filesystem
  volumeName: pvc-bcc852f8-7266-48c4-aedc-34adb523d224
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 16Gi
  phase: Bound

What did you expect to happen:

LV size = LVMVolume CR size.

The output of the following commands will help us better understand what's going on (pasting long output into a GitHub gist or other pastebin is fine):

I0620 02:51:50.488174 1 grpc.go:72] GRPC call: /csi.v1.Node/NodePublishVolume requests {"target_path":"/var/lib/kubelet/pods/77cf54bb-be60-44e4-8ae0-bb3744b2f10e/volumes/kubernetes.io~csi/pvc-bcc852f8-7266-48c4-aedc-34adb523d224/mount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"csi.storage.k8s.io/ephemeral":"false","csi.storage.k8s.io/pod.name":"xxx-mongodb-2","csi.storage.k8s.io/pod.namespace":"ns-gewclvtg","csi.storage.k8s.io/pod.uid":"77cf54bb-be60-44e4-8ae0-bb3744b2f10e","csi.storage.k8s.io/serviceAccount.name":"xxx","openebs.io/cas-type":"localpv-lvm","openebs.io/volgroup":"lvmvg","storage.kubernetes.io/csiProvisionerIdentity":"1718646815547-5148-local.csi.openebs.io"},"volume_id":"pvc-bcc852f8-7266-48c4-aedc-34adb523d224"}
I0620 02:51:50.534603 1 mount_linux.go:312] fsck error fsck from util-linux 2.38.1
/dev/mapper/lvmvg-pvc--bcc852f8--7266--48c4--aedc--34adb523d224 is mounted.
e2fsck: Cannot continue, aborting.

Anything else you would like to add: [Miscellaneous information that will assist in solving the issue.]

Environment:

bxy4543 commented 2 weeks ago

kubectl logs -n kube-system openebs-lvm-node-d8krs -c openebs-lvm-plugin

I0619 06:35:24.036535       1 volume.go:85] Got add event for Vol pvc-bcc852f8-7266-48c4-aedc-34adb523d224
I0619 06:35:24.036547       1 volume.go:86] lvmvolume object to be enqueued by Add handler: &{{LVMVolume local.openebs.io/v1alpha1} {pvc-bcc852f8-7266-48c4-aedc-34adb523d224  openebs  00e27140-261e-449f-a048-af94b1b312f3 241412558 1 2024-06-19 06:35:24 +0000 UTC <nil> <nil> map[] map[] [] [] [{lvm-driver Update local.openebs.io/v1alpha1 2024-06-19 06:35:24 +0000 UTC FieldsV1 {"f:spec":{".":{},"f:capacity":{},"f:ownerNodeID":{},"f:shared":{},"f:thinProvision":{},"f:vgPattern":{},"f:volGroup":{}},"f:status":{".":{},"f:state":{}}} }]} {test-node-003  ^lvmvg$ 1073741824 yes no} {Pending <nil>}}
I0619 06:35:24.036609       1 volume.go:55] Getting lvmvol object name:pvc-bcc852f8-7266-48c4-aedc-34adb523d224, ns:openebs from cache
I0619 06:35:24.396811       1 lvm_util.go:308] lvm: created volume lvmvg/pvc-bcc852f8-7266-48c4-aedc-34adb523d224
I0619 06:35:24.406693       1 volume.go:367] Successfully synced 'openebs/pvc-bcc852f8-7266-48c4-aedc-34adb523d224'
I0619 06:35:26.217611       1 grpc.go:72] GRPC call: /csi.v1.Node/NodePublishVolume requests {"target_path":"/var/lib/kubelet/pods/e128a99f-28ea-4ed2-b098-900b886744bb/volumes/kubernetes.io~csi/pvc-bcc852f8-7266-48c4-aedc-34adb523d224/mount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"csi.storage.k8s.io/ephemeral":"false","csi.storage.k8s.io/pod.name":"test-ytxrw6b0n6-mongodb-2","csi.storage.k8s.io/pod.namespace":"ns-gewclvtg","csi.storage.k8s.io/pod.uid":"e128a99f-28ea-4ed2-b098-900b886744bb","csi.storage.k8s.io/serviceAccount.name":"test-mongodb-test-ytxrw6b0n6","openebs.io/cas-type":"localpv-lvm","openebs.io/volgroup":"lvmvg","storage.kubernetes.io/csiProvisionerIdentity":"1718646815547-5148-local.csi.openebs.io"},"volume_id":"pvc-bcc852f8-7266-48c4-aedc-34adb523d224"}
I0619 06:35:26.240614       1 mount_linux.go:367] Disk "/dev/lvmvg/pvc-bcc852f8-7266-48c4-aedc-34adb523d224" appears to be unformatted, attempting to format as type: "ext4" with options: [-F -m0 /dev/lvmvg/pvc-bcc852f8-7266-48c4-aedc-34adb523d224]
I0619 06:35:26.264351       1 mount_linux.go:377] Disk successfully formatted (mkfs): ext4 - /dev/lvmvg/pvc-bcc852f8-7266-48c4-aedc-34adb523d224 /var/lib/kubelet/pods/e128a99f-28ea-4ed2-b098-900b886744bb/volumes/kubernetes.io~csi/pvc-bcc852f8-7266-48c4-aedc-34adb523d224/mount
I0619 06:35:26.280532       1 mount.go:209] lvm: volume lvmvg/pvc-bcc852f8-7266-48c4-aedc-34adb523d224 mounted /var/lib/kubelet/pods/e128a99f-28ea-4ed2-b098-900b886744bb/volumes/kubernetes.io~csi/pvc-bcc852f8-7266-48c4-aedc-34adb523d224/mount fs ext4
I0620 02:51:50.488174       1 grpc.go:72] GRPC call: /csi.v1.Node/NodePublishVolume requests {"target_path":"/var/lib/kubelet/pods/77cf54bb-be60-44e4-8ae0-bb3744b2f10e/volumes/kubernetes.io~csi/pvc-bcc852f8-7266-48c4-aedc-34adb523d224/mount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"csi.storage.k8s.io/ephemeral":"false","csi.storage.k8s.io/pod.name":"test-ytxrw6b0n6-mongodb-2","csi.storage.k8s.io/pod.namespace":"ns-gewclvtg","csi.storage.k8s.io/pod.uid":"77cf54bb-be60-44e4-8ae0-bb3744b2f10e","csi.storage.k8s.io/serviceAccount.name":"test-mongodb-test-ytxrw6b0n6","openebs.io/cas-type":"localpv-lvm","openebs.io/volgroup":"lvmvg","storage.kubernetes.io/csiProvisionerIdentity":"1718646815547-5148-local.csi.openebs.io"},"volume_id":"pvc-bcc852f8-7266-48c4-aedc-34adb523d224"}
I0620 02:51:50.534603       1 mount_linux.go:312] `fsck` error fsck from util-linux 2.38.1
/dev/mapper/lvmvg-pvc--bcc852f8--7266--48c4--aedc--34adb523d224 is mounted.
e2fsck: Cannot continue, aborting.

I0620 02:51:50.535623       1 mount.go:209] lvm: volume lvmvg/pvc-bcc852f8-7266-48c4-aedc-34adb523d224 mounted /var/lib/kubelet/pods/77cf54bb-be60-44e4-8ae0-bb3744b2f10e/volumes/kubernetes.io~csi/pvc-bcc852f8-7266-48c4-aedc-34adb523d224/mount fs ext4
I0620 02:51:50.535640       1 grpc.go:81] GRPC response: {}
I0620 02:51:57.684960       1 lvmnode.go:306] Successfully synced 'openebs/test-node-003

dsharma-dc commented 2 weeks ago

@bxy4543 Could you share the storage class and pvc yaml files here, or in Github gist?

bxy4543 commented 2 weeks ago

controller version:

        image: registry.k8s.io/sig-storage/csi-resizer:v1.8.0
        image: registry.k8s.io/sig-storage/csi-snapshotter:v6.2.2
        image: registry.k8s.io/sig-storage/snapshot-controller:v6.2.2
        image: registry.k8s.io/sig-storage/csi-provisioner:v3.5.0
        image: registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.8.0
        image: openebs/lvm-driver:ci

and I found this error:

{"log":"I0619 06:35:25.755602       1 event.go:298] Event(v1.ObjectReference{Kind:\"PersistentVolumeClaim\", Namespace:\"ns-gewclvtg\", Name:\"data-test-ytxrw6b0n6-mongodb-2\", UID:\"bcc852f8-7266-48c4-aedc-34adb523d224\", APIVersion:\"v1\", ResourceVersion:\"241412655\", FieldPath:\"\"}): type: 'Warning' reason: 'VolumeResizeFailed' Mark PVC \"ns-gewclvtg/data-test-ytxrw6b0n6-mongodb-2\" as file system resize required failed: can't patch status of  PVC ns-gewclvtg/data-test-ytxrw6b0n6-mongodb-2 with Operation cannot be fulfilled on persistentvolumeclaims \"data-test-ytxrw6b0n6-mongodb-2\": the object has been modified; please apply your changes to the latest version and try again","logtag":"F"}
{"log":"E0619 06:35:25.755515       1 controller.go:286] Error syncing PVC: Mark PVC \"ns-gewclvtg/data-test-ytxrw6b0n6-mongodb-2\" as file system resize required failed: can't patch status of  PVC ns-gewclvtg/data-test-ytxrw6b0n6-mongodb-2 with Operation cannot be fulfilled on persistentvolumeclaims \"data-test-ytxrw6b0n6-mongodb-2\": the object has been modified; please apply your changes to the latest version and try again","logtag":"F"}
bxy4543 commented 2 weeks ago

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-backup
parameters:
  shared: "yes"
  storage: lvm
  volgroup: lvmvg
provisioner: local.csi.openebs.io
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: local.csi.openebs.io
    volume.kubernetes.io/selected-node: test-node-003
    volume.kubernetes.io/storage-provisioner: local.csi.openebs.io
  creationTimestamp: "2024-06-19T06:35:23Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    app.kubernetes.io/instance: test-ytxrw6b0n6
    app.kubernetes.io/managed-by: kubeblocks
    app.kubernetes.io/name: mongodb
    apps.kubeblocks.io/component-name: mongodb
    apps.kubeblocks.io/vct-name: data
    kubeblocks.io/volume-type: data
  name: data-test-ytxrw6b0n6-mongodb-2
  namespace: ns-gewclvtg
  ownerReferences:
  - apiVersion: apps.kubeblocks.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: Component
    name: test-ytxrw6b0n6-mongodb
    uid: f131a510-755d-4ae2-a828-9d878b82e121
  resourceVersion: "241412684"
  uid: bcc852f8-7266-48c4-aedc-34adb523d224
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 16Gi
  storageClassName: openebs-backup
  volumeMode: Filesystem
  volumeName: pvc-bcc852f8-7266-48c4-aedc-34adb523d224
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 16Gi
  phase: Bound

dsharma-dc commented 2 weeks ago

The filesystem resize path seems to have run into an error while doing fsck, and that left the volume in this state. I'll check this for more clues.

bxy4543 commented 2 weeks ago

Thank you for your support. The file system expansion step is only reached after the underlying LV has been expanded; however, you can see that the LV on node test-node-003 has not been successfully expanded.

dsharma-dc commented 2 weeks ago

Thank you for your support. The file system expansion step is only reached after the underlying LV has been expanded; however, you can see that the LV on node test-node-003 has not been successfully expanded.

I see that the storage class mentions shared, are there actually two applications currently using the volume when this happened?

bxy4543 commented 2 weeks ago

And I see that the fsck comes from mount_linux.formatAndMountSensitive, which is referenced in NodePublishVolume.MountFilesystem, but my pod is still able to mount the PVC successfully.

bxy4543 commented 2 weeks ago

I see that the storage class mentions shared, are there actually two applications currently using the volume when this happened?

It was there for a while for backup, but it's gone now

dsharma-dc commented 2 weeks ago

@bxy4543 If possible please upload all log files here as zipped.

bxy4543 commented 2 weeks ago

log.tar.gz @dsharma-dc

dsharma-dc commented 2 weeks ago

The openebs-lvm-node-003 log is missing the entries from the earlier timestamps for volume pvc-bcc852f8-7266-48c4-aedc-34adb523d224. I would also like to see the output of kubectl describe pvc <pvcname>. I notice that I also see the fsck error when I try this scenario locally, but that doesn't impact expanding the PVC; the expand succeeds.

bxy4543 commented 2 weeks ago

kubectl describe pvc -n ns-gewclvtg data-test-ytxrw6b0n6-mongodb-2
Name:          data-test-ytxrw6b0n6-mongodb-2
Namespace:     ns-gewclvtg
StorageClass:  openebs-backup
Status:        Bound
Volume:        pvc-bcc852f8-7266-48c4-aedc-34adb523d224
Labels:        app.kubernetes.io/instance=test-ytxrw6b0n6
               app.kubernetes.io/managed-by=kubeblocks
               app.kubernetes.io/name=mongodb
               apps.kubeblocks.io/component-name=mongodb
               apps.kubeblocks.io/vct-name=data
               kubeblocks.io/volume-type=data
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: local.csi.openebs.io
               volume.kubernetes.io/selected-node: test-node-003
               volume.kubernetes.io/storage-provisioner: local.csi.openebs.io
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      16Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Used By:       test-ytxrw6b0n6-mongodb-2
Events:        <none>

dsharma-dc commented 2 weeks ago

Two questions:

  1. Can you share the describe output of the PV and the LVMVolume, and the output of lvdisplay /dev/lvmvg/pvc-bcc852f8-7266-48c4-aedc-34adb523d224 again?
  2. Where was this error seen? If you can, please provide that log: controller.go:286] Error syncing PVC: Mark PVC \"ns-gewclvtg/data-test-ytxrw6b0n6-mongodb-2\" as file system resize required failed: can't patch status of PVC
bxy4543 commented 2 weeks ago

root@test-node-003:~# lvdisplay /dev/lvmvg/pvc-bcc852f8-7266-48c4-aedc-34adb523d224
  --- Logical volume ---
  LV Path                /dev/lvmvg/pvc-bcc852f8-7266-48c4-aedc-34adb523d224
  LV Name                pvc-bcc852f8-7266-48c4-aedc-34adb523d224
  VG Name                lvmvg
  LV UUID                vjfkb2-ls7c-YzO7-nZYf-Ds70-3CVP-7EA5jl
  LV Write Access        read/write
  LV Creation host, time openebs-lvm-node-d8krs, 2024-06-19 14:35:24 +0800
  LV Status              available
  # open                 1
  LV Size                1.00 GiB
  Current LE             256
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:151

kubectl describe pv pvc-bcc852f8-7266-48c4-aedc-34adb523d224
Name:              pvc-bcc852f8-7266-48c4-aedc-34adb523d224
Labels:            <none>
Annotations:       pv.kubernetes.io/provisioned-by: local.csi.openebs.io
                   volume.kubernetes.io/provisioner-deletion-secret-name:
                   volume.kubernetes.io/provisioner-deletion-secret-namespace:
Finalizers:        [kubernetes.io/pv-protection]
StorageClass:      openebs-backup
Status:            Bound
Claim:             ns-gewclvtg/data-test-ytxrw6b0n6-mongodb-2
Reclaim Policy:    Delete
Access Modes:      RWO
VolumeMode:        Filesystem
Capacity:          16Gi
Node Affinity:
  Required Terms:
    Term 0:        openebs.io/nodename in [test-node-003]
Message:
Source:
    Type:              CSI (a Container Storage Interface (CSI) volume source)
    Driver:            local.csi.openebs.io
    FSType:            ext4
    VolumeHandle:      pvc-bcc852f8-7266-48c4-aedc-34adb523d224
    ReadOnly:          false
    VolumeAttributes:      openebs.io/cas-type=localpv-lvm
                           openebs.io/volgroup=lvmvg
                           storage.kubernetes.io/csiProvisionerIdentity=1718646815547-5148-local.csi.openebs.io
Events:                <none>

kubectl describe lvmvolume -n openebs pvc-bcc852f8-7266-48c4-aedc-34adb523d224
Name:         pvc-bcc852f8-7266-48c4-aedc-34adb523d224
Namespace:    openebs
Labels:       kubernetes.io/nodename=test-node-003
Annotations:  <none>
API Version:  local.openebs.io/v1alpha1
Kind:         LVMVolume
Metadata:
  Creation Timestamp:  2024-06-19T06:35:24Z
  Finalizers:
    lvm.openebs.io/finalizer
  Generation:        4
  Resource Version:  241412658
  UID:               00e27140-261e-449f-a048-af94b1b312f3
Spec:
  Capacity:        17179869184
  Owner Node ID:   test-node-003
  Shared:          yes
  Thin Provision:  no
  Vg Pattern:      ^lvmvg$
  Vol Group:       lvmvg
Status:
  State:  Ready
Events:   <none>

And the Error syncing PVC log comes from the controller csi-resizer module. The old log has been cleaned up, but I found a new similar log:

Common labels: {"app":"openebs-lvm-controller","container":"csi-resizer","instance":"openebs-lvm-controller-546768b57c-tlnbf","job":"fluent-bit","namespace":"kube-system","node":"test-node-001","stream":"stderr"}
Line limit: "1000 (45 returned)"
Total bytes processed: "228  kB"

2024-06-20T22:23:55+08:00   {"log":"I0620 14:23:55.005706       1 leaderelection.go:276] successfully renewed lease kube-system/external-resizer-local-csi-openebs-io","logtag":"F"}
2024-06-20T22:23:49+08:00   {"log":"I0620 14:23:49.999362       1 leaderelection.go:276] successfully renewed lease kube-system/external-resizer-local-csi-openebs-io","logtag":"F"}
2024-06-20T22:23:44+08:00   {"log":"I0620 14:23:44.994267       1 leaderelection.go:276] successfully renewed lease kube-system/external-resizer-local-csi-openebs-io","logtag":"F"}
2024-06-20T22:23:39+08:00   {"log":"I0620 14:23:39.989566       1 leaderelection.go:276] successfully renewed lease kube-system/external-resizer-local-csi-openebs-io","logtag":"F"}
2024-06-20T22:23:34+08:00   {"log":"I0620 14:23:34.984822       1 leaderelection.go:276] successfully renewed lease kube-system/external-resizer-local-csi-openebs-io","logtag":"F"}
2024-06-20T22:23:29+08:00   {"log":"I0620 14:23:29.979466       1 leaderelection.go:276] successfully renewed lease kube-system/external-resizer-local-csi-openebs-io","logtag":"F"}
2024-06-20T22:23:24+08:00   {"log":"I0620 14:23:24.975015       1 leaderelection.go:276] successfully renewed lease kube-system/external-resizer-local-csi-openebs-io","logtag":"F"}
2024-06-20T22:23:19+08:00   {"log":"I0620 14:23:19.969413       1 leaderelection.go:276] successfully renewed lease kube-system/external-resizer-local-csi-openebs-io","logtag":"F"}
2024-06-20T22:23:14+08:00   {"log":"I0620 14:23:14.963766       1 leaderelection.go:276] successfully renewed lease kube-system/external-resizer-local-csi-openebs-io","logtag":"F"}
2024-06-20T22:23:09+08:00   {"log":"I0620 14:23:09.957577       1 leaderelection.go:276] successfully renewed lease kube-system/external-resizer-local-csi-openebs-io","logtag":"F"}
2024-06-20T22:23:05+08:00   {"log":"I0620 14:23:05.200975       1 controller.go:343] No need to resize PVC \"ns-3m5m56ni/data-teable-uylicrfk-pg-postgresql-0\"","logtag":"F"}
2024-06-20T22:23:05+08:00   {"log":"I0620 14:23:05.200931       1 controller.go:295] Started PVC processing \"ns-3m5m56ni/data-teable-uylicrfk-pg-postgresql-0\"","logtag":"F"}
2024-06-20T22:23:04+08:00   {"log":"I0620 14:23:04.952653       1 leaderelection.go:276] successfully renewed lease kube-system/external-resizer-local-csi-openebs-io","logtag":"F"}
2024-06-20T22:23:04+08:00   {"log":"I0620 14:23:04.217272       1 event.go:298] Event(v1.ObjectReference{Kind:\"PersistentVolumeClaim\", Namespace:\"ns-3m5m56ni\", Name:\"data-teable-uylicrfk-pg-postgresql-0\", UID:\"b781dc96-49e1-4391-b09f-3da5974f2f52\", APIVersion:\"v1\", ResourceVersion:\"408970568\", FieldPath:\"\"}): type: 'Normal' reason: 'VolumeResizeSuccessful' Resize volume succeeded","logtag":"F"}
2024-06-20T22:23:04+08:00   {"log":"I0620 14:23:04.217271       1 controller.go:343] No need to resize PVC \"ns-3m5m56ni/data-teable-uylicrfk-pg-postgresql-0\"","logtag":"F"}
2024-06-20T22:23:04+08:00   {"log":"I0620 14:23:04.217262       1 controller.go:295] Started PVC processing \"ns-3m5m56ni/data-teable-uylicrfk-pg-postgresql-0\"","logtag":"F"}
2024-06-20T22:23:04+08:00   {"log":"I0620 14:23:04.217231       1 controller.go:546] Resize PVC \"ns-3m5m56ni/data-teable-uylicrfk-pg-postgresql-0\" finished","logtag":"F"}
2024-06-20T22:23:04+08:00   {"log":"I0620 14:23:04.211897       1 controller.go:489] Update capacity of PV \"pvc-b781dc96-49e1-4391-b09f-3da5974f2f52\" to 8Gi succeeded","logtag":"F"}
2024-06-20T22:23:04+08:00   {"log":"I0620 14:23:04.208677       1 controller.go:590] Resize volume succeeded for volume \"pvc-b781dc96-49e1-4391-b09f-3da5974f2f52\", start to update PV's capacity","logtag":"F"}
2024-06-20T22:23:04+08:00   {"log":"I0620 14:23:04.208671       1 controller.go:483] Resize volume succeeded for volume \"pvc-b781dc96-49e1-4391-b09f-3da5974f2f52\", start to update PV's capacity","logtag":"F"}
2024-06-20T22:23:04+08:00   {"log":"I0620 14:23:04.208665       1 connection.go:201] GRPC error: \u003cnil\u003e","logtag":"F"}
2024-06-20T22:23:04+08:00   {"log":"I0620 14:23:04.208658       1 connection.go:200] GRPC response: {\"capacity_bytes\":8589934592}","logtag":"F"}
2024-06-20T22:23:04+08:00   {"log":"I0620 14:23:04.203509       1 connection.go:194] GRPC request: {\"capacity_range\":{\"required_bytes\":8589934592},\"volume_capability\":{\"AccessType\":{\"Mount\":{\"fs_type\":\"ext4\"}},\"access_mode\":{\"mode\":1}},\"volume_id\":\"pvc-b781dc96-49e1-4391-b09f-3da5974f2f52\"}","logtag":"F"}
2024-06-20T22:23:04+08:00   {"log":"I0620 14:23:04.203507       1 connection.go:193] GRPC call: /csi.v1.Controller/ControllerExpandVolume","logtag":"F"}
2024-06-20T22:23:04+08:00   {"log":"I0620 14:23:04.203495       1 connection.go:201] GRPC error: \u003cnil\u003e","logtag":"F"}
2024-06-20T22:23:04+08:00   {"log":"I0620 14:23:04.203488       1 connection.go:200] GRPC response: {\"capabilities\":[{\"Type\":{\"Rpc\":{\"type\":1}}},{\"Type\":{\"Rpc\":{\"type\":9}}},{\"Type\":{\"Rpc\":{\"type\":5}}},{\"Type\":{\"Rpc\":{\"type\":4}}}]}","logtag":"F"}
2024-06-20T22:23:04+08:00   {"log":"I0620 14:23:04.203149       1 event.go:298] Event(v1.ObjectReference{Kind:\"PersistentVolumeClaim\", Namespace:\"ns-3m5m56ni\", Name:\"data-teable-uylicrfk-pg-postgresql-0\", UID:\"b781dc96-49e1-4391-b09f-3da5974f2f52\", APIVersion:\"v1\", ResourceVersion:\"408970568\", FieldPath:\"\"}): type: 'Normal' reason: 'Resizing' External resizer is resizing volume pvc-b781dc96-49e1-4391-b09f-3da5974f2f52","logtag":"F"}
2024-06-20T22:23:04+08:00   {"log":"I0620 14:23:04.203094       1 connection.go:194] GRPC request: {}","logtag":"F"}
2024-06-20T22:23:04+08:00   {"log":"I0620 14:23:04.203087       1 connection.go:193] GRPC call: /csi.v1.Controller/ControllerGetCapabilities","logtag":"F"}
2024-06-20T22:23:04+08:00   {"log":"I0620 14:23:04.200034       1 event.go:298] Event(v1.ObjectReference{Kind:\"PersistentVolumeClaim\", Namespace:\"ns-3m5m56ni\", Name:\"data-teable-uylicrfk-pg-postgresql-0\", UID:\"b781dc96-49e1-4391-b09f-3da5974f2f52\", APIVersion:\"v1\", ResourceVersion:\"408970566\", FieldPath:\"\"}): type: 'Warning' reason: 'VolumeResizeFailed' Mark PVC \"ns-3m5m56ni/data-teable-uylicrfk-pg-postgresql-0\" as file system resize required failed: can't patch status of  PVC ns-3m5m56ni/data-teable-uylicrfk-pg-postgresql-0 with Operation cannot be fulfilled on persistentvolumeclaims \"data-teable-uylicrfk-pg-postgresql-0\": the object has been modified; please apply your changes to the latest version and try again","logtag":"F"}
2024-06-20T22:23:04+08:00   {"log":"I0620 14:23:04.199989       1 controller.go:295] Started PVC processing \"ns-3m5m56ni/data-teable-uylicrfk-pg-postgresql-0\"","logtag":"F"}
2024-06-20T22:23:04+08:00   {"log":"E0620 14:23:04.199959       1 controller.go:286] Error syncing PVC: Mark PVC \"ns-3m5m56ni/data-teable-uylicrfk-pg-postgresql-0\" as file system resize required failed: can't patch status of  PVC ns-3m5m56ni/data-teable-uylicrfk-pg-postgresql-0 with Operation cannot be fulfilled on persistentvolumeclaims \"data-teable-uylicrfk-pg-postgresql-0\": the object has been modified; please apply your changes to the latest version and try again","logtag":"F"}
2024-06-20T22:23:04+08:00   {"log":"I0620 14:23:04.197199       1 controller.go:489] Update capacity of PV \"pvc-b781dc96-49e1-4391-b09f-3da5974f2f52\" to 8Gi succeeded","logtag":"F"}
2024-06-20T22:23:04+08:00   {"log":"I0620 14:23:04.192885       1 controller.go:590] Resize volume succeeded for volume \"pvc-b781dc96-49e1-4391-b09f-3da5974f2f52\", start to update PV's capacity","logtag":"F"}
2024-06-20T22:23:04+08:00   {"log":"I0620 14:23:04.192878       1 controller.go:483] Resize volume succeeded for volume \"pvc-b781dc96-49e1-4391-b09f-3da5974f2f52\", start to update PV's capacity","logtag":"F"}
2024-06-20T22:23:04+08:00   {"log":"I0620 14:23:04.192871       1 connection.go:201] GRPC error: \u003cnil\u003e","logtag":"F"}
2024-06-20T22:23:04+08:00   {"log":"I0620 14:23:04.192861       1 connection.go:200] GRPC response: {\"capacity_bytes\":8589934592,\"node_expansion_required\":true}","logtag":"F"}
2024-06-20T22:23:04+08:00   {"log":"I0620 14:23:04.178124       1 connection.go:194] GRPC request: {\"capacity_range\":{\"required_bytes\":8589934592},\"volume_capability\":{\"AccessType\":{\"Mount\":{\"fs_type\":\"ext4\"}},\"access_mode\":{\"mode\":1}},\"volume_id\":\"pvc-b781dc96-49e1-4391-b09f-3da5974f2f52\"}","logtag":"F"}
2024-06-20T22:23:04+08:00   {"log":"I0620 14:23:04.178121       1 connection.go:193] GRPC call: /csi.v1.Controller/ControllerExpandVolume","logtag":"F"}
2024-06-20T22:23:04+08:00   {"log":"I0620 14:23:04.178106       1 connection.go:201] GRPC error: \u003cnil\u003e","logtag":"F"}
2024-06-20T22:23:04+08:00   {"log":"I0620 14:23:04.178098       1 connection.go:200] GRPC response: {\"capabilities\":[{\"Type\":{\"Rpc\":{\"type\":1}}},{\"Type\":{\"Rpc\":{\"type\":9}}},{\"Type\":{\"Rpc\":{\"type\":5}}},{\"Type\":{\"Rpc\":{\"type\":4}}}]}","logtag":"F"}
2024-06-20T22:23:04+08:00   {"log":"I0620 14:23:04.177597       1 event.go:298] Event(v1.ObjectReference{Kind:\"PersistentVolumeClaim\", Namespace:\"ns-3m5m56ni\", Name:\"data-teable-uylicrfk-pg-postgresql-0\", UID:\"b781dc96-49e1-4391-b09f-3da5974f2f52\", APIVersion:\"v1\", ResourceVersion:\"408970566\", FieldPath:\"\"}): type: 'Normal' reason: 'Resizing' External resizer is resizing volume pvc-b781dc96-49e1-4391-b09f-3da5974f2f52","logtag":"F"}
2024-06-20T22:23:04+08:00   {"log":"I0620 14:23:04.177454       1 connection.go:194] GRPC request: {}","logtag":"F"}
2024-06-20T22:23:04+08:00   {"log":"I0620 14:23:04.177445       1 connection.go:193] GRPC call: /csi.v1.Controller/ControllerGetCapabilities","logtag":"F"}
2024-06-20T22:23:04+08:00   {"log":"I0620 14:23:04.165365       1 controller.go:295] Started PVC processing \"ns-3m5m56ni/data-teable-uylicrfk-pg-postgresql-0\"","logtag":"F"}

The same problem appears here:

root@test-node-004:~# kubectl describe pvc -n ns-3m5m56ni data-teable-uylicrfk-pg-postgresql-0
Name:          data-teable-uylicrfk-pg-postgresql-0
Namespace:     ns-3m5m56ni
StorageClass:  openebs-lvmpv
Status:        Bound
Volume:        pvc-b781dc96-49e1-4391-b09f-3da5974f2f52
Labels:        app.kubernetes.io/instance=teable-uylicrfk-pg
               app.kubernetes.io/managed-by=kubeblocks
               app.kubernetes.io/name=postgresql
               apps.kubeblocks.io/component-name=postgresql
               apps.kubeblocks.io/vct-name=data
               kubeblocks.io/volume-type=data
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: local.csi.openebs.io
               volume.kubernetes.io/selected-node: test-node-004
               volume.kubernetes.io/storage-provisioner: local.csi.openebs.io
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      8Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Used By:       teable-uylicrfk-pg-postgresql-0
Events:        <none>
root@test-node-004:~#
root@test-node-004:~# kubectl describe lvmvolume -n openebs pvc-bcf9096f-b5d7-4a2d-8425-0b3ea2fce0a3
Name:         pvc-bcf9096f-b5d7-4a2d-8425-0b3ea2fce0a3
Namespace:    openebs
Labels:       kubernetes.io/nodename=test-node-004
Annotations:  <none>
API Version:  local.openebs.io/v1alpha1
Kind:         LVMVolume
Metadata:
  Creation Timestamp:  2024-06-18T06:38:28Z
  Finalizers:
    lvm.openebs.io/finalizer
  Generation:        3
  Resource Version:  398423326
  UID:               1540acdc-5c42-449a-82f7-dcc7831024e9
Spec:
  Capacity:        1073741824
  Owner Node ID:   test-node-004
  Shared:          yes
  Thin Provision:  no
  Vg Pattern:      ^lvmvg$
  Vol Group:       lvmvg
Status:
  State:  Ready
Events:   <none>
root@test-node-004:~# lvdisplay /dev/lvmvg/pvc-bcf9096f-b5d7-4a2d-8425-0b3ea2fce0a3
  --- Logical volume ---
  LV Path                /dev/lvmvg/pvc-bcf9096f-b5d7-4a2d-8425-0b3ea2fce0a3
  LV Name                pvc-bcf9096f-b5d7-4a2d-8425-0b3ea2fce0a3
  VG Name                lvmvg
  LV UUID                MhTTYf-XHAw-RhGc-WXfF-Uyeu-rNMS-KmMqd6
  LV Write Access        read/write
  LV Creation host, time openebs-lvm-node-c66l4, 2024-06-18 14:38:29 +0800
  LV Status              available
  # open                 1
  LV Size                1.00 GiB
  Current LE             256
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     8192
  Block device           253:352

dsharma-dc commented 2 weeks ago

A few things are observed here:

  1. The LVM LV didn't get expanded because the CSI node plugin never received the NodeExpandVolume call. The LV is expanded as part of this gRPC, and the filesystem is natively expanded by LVM (except btrfs, which we expand from the plugin). I'm assuming the application pod was running and attached to the PVC here.
  2. The NodeExpandVolume call was not sent to the CSI node plugin because csi-resizer failed to mark the PVC as requiring node expansion, as seen in the error logs from csi-resizer.
  3. It is not yet clear why csi-resizer failed to patch the PVC status correctly.
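Taken together, the observations describe a pipeline. The following prints a simplified trace of the expansion path with the failure point from this issue marked (illustrative text only, assembled from the discussion; not driver output):

```shell
# Echo the volume-expansion pipeline as described above, marking where it
# stopped in this issue. Purely illustrative; echoes static text.
for step in \
  "1. csi-resizer: ControllerExpandVolume -> controller plugin (succeeded)" \
  "2. csi-resizer: patch PVC status to request node expansion (FAILED: conflict)" \
  "3. kubelet: NodeExpandVolume -> node plugin, LV + filesystem grow (never sent)"
do
  echo "$step"
done
```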

It would be good if you could share the versions of all the components involved, i.e. Kubernetes, the OpenEBS plugin, host OS details, etc. Also, if it is easily reproducible every time, please reproduce it and provide the csi-controller logs (openebs-lvm-plugin and csi-resizer containers) and the csi-node logs (openebs-lvm-plugin container) from that timestamp.

NOTE: In the last comment, the pvc and lvmvolume look unrelated; they have different IDs.

bxy4543 commented 1 week ago

Thanks, I'll share it later.

dsharma-dc commented 1 week ago

It would also be interesting to know how the edits are being made to the PVC, and whether a retry succeeded. The error doesn't indicate any issue with this provisioner so far; the failure happened in kubernetes API calls during resize because the PVC edit conflicted with another concurrent operation on the PVC.

bxy4543 commented 1 week ago

Yes, but I wonder why, when the csi controller call failed and the retry succeeded, the LVMVolume was still inconsistent with the physical LV.

dsharma-dc commented 1 week ago

but the LVMVolume was inconsistent with the physical LV

This was because the physical LV is expanded when the NodeExpandVolume gRPC is sent by the csi-resizer sidecar, which in this case never arrived due to the pvc patching error in the sidecar.
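
The resulting inconsistency is easy to illustrate. The following sketch (not part of lvm-localpv) just compares the two numbers involved: LVMVolume.spec.capacity is bumped by ControllerExpandVolume, while the real LV only grows when NodeExpandVolume runs on the node, so if that gRPC never arrives the values diverge. The sizes mirror this report, where spec.capacity says 16 GiB but lvdisplay still shows 1.00 GiB:

```shell
#!/bin/bash
# Detect drift between the LVMVolume CR and the actual LV.
spec_capacity=17179869184            # LVMVolume.spec.capacity, in bytes
lv_size=$((1 * 1024 * 1024 * 1024))  # actual LV size in bytes (1 GiB, from lvdisplay)

if [ "$spec_capacity" -ne "$lv_size" ]; then
  echo "drift: spec=$spec_capacity lv=$lv_size"
fi
```

In a live cluster the second value would come from something like `lvs --units b --noheadings -o lv_size <lv-path>` on the owning node.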

bxy4543 commented 1 week ago

How can this situation be avoided? Why doesn't the pvc patching error re-trigger the gRPC?

dsharma-dc commented 1 week ago

How can this situation be avoided? Why doesn't the pvc patching error re-trigger the gRPC?

I'd guess kubernetes expects us to retry manually, as the message says "please apply your changes to the latest version and try again". We can check with the k8s folks on Slack w.r.t. this behaviour. All the details I requested earlier would be required, or if you'd like, you can go ahead and post the query.

bxy4543 commented 1 week ago

"please apply your changes to the latest version and try again" does not mean the retry has to be manual; the conflict is entirely due to a stale PVC resource version. I can trigger it 100% of the time simply by continuously adding labels to the pvc while expanding it:

cat rechange-pvc.sh
#!/bin/bash

NAMESPACE="ns-hwhbg4vf"
PVC_NAME="data-dfsd-mongodb-0"
LABEL_KEY="test"

while true; do
  CURRENT_VALUE=$(kubectl get pvc "$PVC_NAME" -n "$NAMESPACE" -o jsonpath="{.metadata.labels.$LABEL_KEY}")

  if [[ -z "$CURRENT_VALUE" ]]; then
    NEXT_VALUE=1
  elif [[ "$CURRENT_VALUE" =~ ^[0-9]+$ ]]; then
    NEXT_VALUE=$((CURRENT_VALUE + 1))
  else
    echo "label $LABEL_KEY: current value is not an integer: $CURRENT_VALUE"
    exit 1
  fi

  echo "Updating label $LABEL_KEY to $NEXT_VALUE"
  kubectl label pvc "$PVC_NAME" -n "$NAMESPACE" "$LABEL_KEY=$NEXT_VALUE" --overwrite

  sleep 0.5
done
lvdisplay /dev/mapper/lvmvg-pvc--c745ae87--6b56--4684--9cfb--9e21ca9517e5
  --- Logical volume ---
  LV Path                /dev/lvmvg/pvc-c745ae87-6b56-4684-9cfb-9e21ca9517e5
  LV Name                pvc-c745ae87-6b56-4684-9cfb-9e21ca9517e5
  VG Name                lvmvg
  LV UUID                40ZI9j-4f2C-M1AW-CVrs-ZcSk-wR53-NwMUCF
  LV Write Access        read/write
  LV Creation host, time openebs-lvm-node-nddtz, 2024-06-27 02:49:15 +0000
  LV Status              available
  # open                 1
  LV Size                3.00 GiB
  Current LE             768
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:13

root@dev-56:~# kubectl get pvc -n ns-hwhbg4vf data-dfsd-mongodb-0 -o yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: local.csi.openebs.io
    volume.kubernetes.io/selected-node: dev-56
    volume.kubernetes.io/storage-provisioner: local.csi.openebs.io
  creationTimestamp: "2024-06-27T10:10:31Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    app.kubernetes.io/instance: dfsd
    app.kubernetes.io/managed-by: kubeblocks
    app.kubernetes.io/name: mongodb
    apps.kubeblocks.io/component-name: mongodb
    apps.kubeblocks.io/vct-name: data
    kubeblocks.io/volume-type: data
    test: "98"
  name: data-dfsd-mongodb-0
  namespace: ns-hwhbg4vf
  ownerReferences:
  - apiVersion: apps.kubeblocks.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: Component
    name: dfsd-mongodb
    uid: 4cc9d636-2084-4f81-871b-0ee30567fa52
  resourceVersion: "99144306"
  uid: ce0a06d8-53e5-46a9-b529-a79f0fa101b5
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 4Gi
  storageClassName: openebs-lvmpv
  volumeMode: Filesystem
  volumeName: pvc-c745ae87-6b56-4684-9cfb-9e21ca9517e5
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 4Gi
  phase: Bound

pvc-c745ae87-6b56-4684-9cfb-9e21ca9517e5: the PVC reports 4Gi, but the physical LV size is 3Gi.
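
The mismatch in the last two listings can be checked mechanically. A minimal sketch (a hypothetical helper, not part of any tool here) that normalises the two size notations seen above to bytes, handling only the Gi/GiB suffixes that appear in this thread:

```shell
#!/bin/bash
# Convert a Kubernetes quantity ("4Gi") or an lvdisplay size ("3.00 GiB")
# to bytes so the two can be compared directly.
to_bytes() {
  local n=$1
  n=${n% GiB}   # "3.00 GiB" -> "3.00"
  n=${n%Gi}     # "4Gi"      -> "4"
  awk -v n="$n" 'BEGIN { printf "%.0f", n * 1024 * 1024 * 1024 }'
}

pvc_bytes=$(to_bytes "4Gi")        # from pvc status.capacity.storage
lv_bytes=$(to_bytes "3.00 GiB")    # from lvdisplay "LV Size"
if [ "$pvc_bytes" -ne "$lv_bytes" ]; then
  echo "mismatch: pvc=$pvc_bytes bytes, lv=$lv_bytes bytes"
fi
```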

bxy4543 commented 1 week ago

I think ControllerExpandVolume needs to record a state on the LVMVolume resource. Merely setting LVMVolume.spec.capacity to signal that ControllerExpandVolume succeeded is not advisable, because even if NodeExpandVolume fails, ControllerExpandVolume will not be retried.
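
A sketch of what such a state could look like on the LVMVolume status. These field names are purely illustrative and do not exist in the current local.openebs.io/v1alpha1 CRD:

```yaml
# Hypothetical fields -- not part of the current LVMVolume API.
status:
  state: Resizing                 # set when ControllerExpandVolume bumps spec.capacity
  confirmedSize: "1073741824"     # updated only after the node-side expansion succeeds
```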

dsharma-dc commented 1 week ago

The one key difference between this LVM provisioner and others, e.g. openebs/Mayastor, is that there ControllerExpandVolume actually expands the volume, but not in the case of LVM. This is because with LVM we can't split the volume expand operation into two steps: 1. expand the underlying volume, 2. expand the filesystem; LVM natively expands the filesystem too. That said, I see some scope for improvement in the LVM expansion workflow, and will also investigate the ControllerExpandVolume return path as suggested above. Will keep it under track.

I haven't been able to reproduce the error yet using the script you provided, though. Steps tried:

  1. created the pvc
  2. deployed the application
  3. ran the label-add script
  4. ran kubectl edit pvc to resize
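
For a fully scriptable repro, step 4 can also be done non-interactively with a patch instead of an edit, using the PVC names from the reproduction above:

```shell
# Non-interactive equivalent of `kubectl edit pvc` for the resize step.
kubectl patch pvc data-dfsd-mongodb-0 -n ns-hwhbg4vf \
  --type merge -p '{"spec":{"resources":{"requests":{"storage":"4Gi"}}}}'
```
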
bxy4543 commented 1 week ago

Got it, looking forward to this being improved. I can reliably reproduce it in my environment:

uname -a
Linux dev-55 5.15.0-113-generic #123-Ubuntu SMP Mon Jun 10 08:16:17 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
kubectl get no -owide
NAME               STATUS   ROLES           AGE    VERSION    INTERNAL-IP    EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION       CONTAINER-RUNTIME
dev-55             Ready    control-plane   152d   v1.27.13   192.168.0.55   <none>        Ubuntu 22.04.3 LTS   5.15.0-113-generic   containerd://1.6.23
dev-56             Ready    <none>          152d   v1.27.13   192.168.0.56   <none>        Ubuntu 22.04.3 LTS   5.15.0-113-generic   containerd://1.6.23
dev-58             Ready    <none>          152d   v1.27.13   192.168.0.58   <none>        Ubuntu 22.04.3 LTS   5.15.0-113-generic   containerd://1.6.23

Versions in use:

registry.k8s.io/sig-storage/csi-resizer:v1.8.0
registry.k8s.io/sig-storage/csi-snapshotter:v6.2.2
registry.k8s.io/sig-storage/snapshot-controller:v6.2.2
registry.k8s.io/sig-storage/csi-provisioner:v3.5.0
registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.8.0
openebs/lvm-driver:ci