Open · bernardgut opened 7 months ago
@niladrih do we need to add a toleration for DiskPressure to the cleanup Pod?
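For illustration, this is the kind of toleration I have in mind; it's a minimal pod-spec sketch, and whether the provisioner actually exposes a knob to set this on the cleanup Pod is exactly the open question:

```yaml
# Sketch of a pod-spec fragment: tolerate the taint the kubelet applies when a
# node reports DiskPressure, so a cleanup Pod could still be scheduled there.
tolerations:
  - key: node.kubernetes.io/disk-pressure
    operator: Exists
    effect: NoSchedule
```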
I'm getting the same error on openebs/provisioner-localpv:4.1.1:
E1027 15:04:33.648888 1 controller.go:1007] error syncing volume "pvc-43c22848-1f2c-4471-9201-77ff5179c25c": failed to delete volume pvc-43c22848-1f2c-4471-9201-77ff5179c25c: failed to delete volume pvc-43c22848-1f2c-4471-9201-77ff5179c25c: clean up volume pvc-43c22848-1f2c-4471-9201-77ff5179c25c failed: create process timeout after 120 seconds
I1027 15:04:33.648953 1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolume", Namespace:"", Name:"pvc-43c22848-1f2c-4471-9201-77ff5179c25c", UID:"a1fdb419-d2c6-4a05-90cd-7c437d439bab", APIVersion:"v1", ResourceVersion:"159262224", FieldPath:""}): type: 'Warning' reason: 'VolumeFailedDelete' failed to delete volume pvc-43c22848-1f2c-4471-9201-77ff5179c25c: failed to delete volume pvc-43c22848-1f2c-4471-9201-77ff5179c25c: clean up volume pvc-43c22848-1f2c-4471-9201-77ff5179c25c failed: create process timeout after 120 seconds
2024-10-27T15:04:33.653Z ERROR app/provisioner.go:174 {"eventcode": "local.pv.delete.failure", "msg": "Failed to delete Local PV", "rname": "pvc-0c25df70-d565-4172-ae84-c79432cac3f5", "reason": "failed to delete host path", "storagetype": "local-hostpath"}
github.com/openebs/dynamic-localpv-provisioner/cmd/provisioner-localpv/app.(*Provisioner).Delete
/go/src/github.com/openebs/dynamic-localpv-provisioner/cmd/provisioner-localpv/app/provisioner.go:174
sigs.k8s.io/sig-storage-lib-external-provisioner/v9/controller.(*ProvisionController).deleteVolumeOperation
/go/pkg/mod/sigs.k8s.io/sig-storage-lib-external-provisioner/v9@v9.0.3/controller/controller.go:1511
sigs.k8s.io/sig-storage-lib-external-provisioner/v9/controller.(*ProvisionController).syncVolume
/go/pkg/mod/sigs.k8s.io/sig-storage-lib-external-provisioner/v9@v9.0.3/controller/controller.go:1115
sigs.k8s.io/sig-storage-lib-external-provisioner/v9/controller.(*ProvisionController).syncVolumeHandler
/go/pkg/mod/sigs.k8s.io/sig-storage-lib-external-provisioner/v9@v9.0.3/controller/controller.go:1045
sigs.k8s.io/sig-storage-lib-external-provisioner/v9/controller.(*ProvisionController).processNextVolumeWorkItem.func1
/go/pkg/mod/sigs.k8s.io/sig-storage-lib-external-provisioner/v9@v9.0.3/controller/controller.go:987
sigs.k8s.io/sig-storage-lib-external-provisioner/v9/controller.(*ProvisionController).processNextVolumeWorkItem
/go/pkg/mod/sigs.k8s.io/sig-storage-lib-external-provisioner/v9@v9.0.3/controller/controller.go:1004
sigs.k8s.io/sig-storage-lib-external-provisioner/v9/controller.(*ProvisionController).runVolumeWorker
/go/pkg/mod/sigs.k8s.io/sig-storage-lib-external-provisioner/v9@v9.0.3/controller/controller.go:905
sigs.k8s.io/sig-storage-lib-external-provisioner/v9/controller.(*ProvisionController).Run.func1.3
/go/pkg/mod/sigs.k8s.io/sig-storage-lib-external-provisioner/v9@v9.0.3/controller/controller.go:857
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
/go/pkg/mod/k8s.io/apimachinery@v0.25.16/pkg/util/wait/wait.go:157
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
/go/pkg/mod/k8s.io/apimachinery@v0.25.16/pkg/util/wait/wait.go:158
k8s.io/apimachinery/pkg/util/wait.JitterUntil
/go/pkg/mod/k8s.io/apimachinery@v0.25.16/pkg/util/wait/wait.go:135
k8s.io/apimachinery/pkg/util/wait.Until
/go/pkg/mod/k8s.io/apimachinery@v0.25.16/pkg/util/wait/wait.go:92
E1027 15:04:33.653151 1 controller.go:1519] delete "pvc-0c25df70-d565-4172-ae84-c79432cac3f5": volume deletion failed: failed to delete volume pvc-0c25df70-d565-4172-ae84-c79432cac3f5: failed to delete volume pvc-0c25df70-d565-4172-ae84-c79432cac3f5: clean up volume pvc-0c25df70-d565-4172-ae84-c79432cac3f5 failed: create process timeout after 120 seconds
I1027 15:04:33.653273 1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolume", Namespace:"", Name:"pvc-0c25df70-d565-4172-ae84-c79432cac3f5", UID:"1a09afdb-1288-4428-ac7f-c00dd6f0800d", APIVersion:"v1", ResourceVersion:"161476499", FieldPath:""}): type: 'Warning' reason: 'VolumeFailedDelete' failed to delete volume pvc-0c25df70-d565-4172-ae84-c79432cac3f5: failed to delete volume pvc-0c25df70-d565-4172-ae84-c79432cac3f5: clean up volume pvc-0c25df70-d565-4172-ae84-c79432cac3f5 failed: create process timeout after 120 seconds
W1027 15:04:33.653187 1 controller.go:992] Retrying syncing volume "pvc-0c25df70-d565-4172-ae84-c79432cac3f5" because failures 0 < threshold 15
E1027 15:04:33.653752 1 controller.go:1007] error syncing volume "pvc-0c25df70-d565-4172-ae84-c79432cac3f5": failed to delete volume pvc-0c25df70-d565-4172-ae84-c79432cac3f5: failed to delete volume pvc-0c25df70-d565-4172-ae84-c79432cac3f5: clean up volume pvc-0c25df70-d565-4172-ae84-c79432cac3f5 failed: create process timeout after 120 seconds
Describe the bug: After you purposefully create a large pvc (80% of the node's ephemeral storage) to test the behavior of the localpv-provisioner, the provisioner successfully creates it, but fails to delete the pv after the pvc is removed, leaving the node with diskPressure=true and preventing further scheduling of pods on the node. Manually deleting the pv in Kubernetes leaves the data on disk, so the issue persists. This is on Talos 1.7.0 using openebs-localpv-provisioner (Helm) with the default Talos deployment instructions in the docs.

Expected behaviour: The provisioner successfully deletes the pv after the pvc is deleted, and/or successfully deletes the data after the pv is manually deleted; diskPressure=true is then cleared and the node resumes normal operation.
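For reference, the pvc used to trigger this looks roughly like the sketch below; the name, requested size, and StorageClass are illustrative (openebs-hostpath is the chart's usual default), not copied verbatim from my cluster:

```yaml
# Illustrative only: a large PVC against the localpv hostpath StorageClass,
# sized to land around 80% of the node's ~22 GB ephemeral storage.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: big-test-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: openebs-hostpath
  resources:
    requests:
      storage: 17Gi
```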
Steps to reproduce the bug:
- Mount /var/openebs/local into the kubelet as per the docs (a sketch of the Talos config follows this list)
- Create a pvc sized at roughly 80% of the node's ephemeral storage, then delete it
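For completeness, this is roughly what that kubelet mount looks like as a Talos machine config patch; it's a sketch assuming the extraMounts approach described in the Talos/OpenEBS docs, so paths and options may differ on your setup:

```yaml
# Sketch of a Talos machine config patch: bind-mount the hostpath base directory
# into the kubelet so the localpv-provisioner can create and clean up volumes.
machine:
  kubelet:
    extraMounts:
      - destination: /var/openebs/local
        type: bind
        source: /var/openebs/local
        options:
          - bind
          - rshared
          - rw
```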
The output of the following commands will help us better understand what's going on (the logs shown above are from the localpv-provisioner container after the deletion; it keeps looping over the same errors):

kubectl get pods -n <openebs_namespace> --show-labels
kubectl logs <upgrade_job_pod> -n <openebs_namespace>
talosctl -n v1 disks
NODE       DEV        MODEL           SERIAL   TYPE   UUID   WWID   MODALIAS      NAME   SIZE    BUS_PATH                                                                    SUBSYSTEM          READ_ONLY   SYSTEM_DISK
10.2.0.8   /dev/sda   QEMU HARDDISK   -        SSD    -      -      scsi:t-0x00   -      22 GB   /pci0000:00/0000:00:05.0/0000:01:01.0/virtio1/host2/target2:0:0/2:0:0:0/    /sys/class/block               *