rancher / local-path-provisioner

Dynamically provisioning persistent local storage with Kubernetes
Apache License 2.0

unable to delete the helper pod "helper-pod-delete-pvc-XYZ" #438

Open applike-ss opened 4 months ago

applike-ss commented 4 months ago

Hello,

I am experiencing the following error quite often:

time="2024-07-30T08:25:22Z" level=error msg="unable to delete the helper pod: pods \"helper-pod-delete-pvc-f02f13f9-7ffd-45bc-ba90-629cf304c9a9\" not found" 

In my use case, I am creating many (50+) PVCs at almost the same time. Is there maybe a race condition? After the pod using the PV is terminated, the PVs stay in the Released state instead of being removed.

I am currently using v0.0.27, as I use a different registry that doesn't contain v0.0.28 yet, and Docker Hub pull limits are just too low for everyday production use.

Let me know if you need any more information to track this down.

EDIT: Additional information: for HA reasons I used 3 replicas instead of the chart's default value. However, I see in the code that leader election was/is not enabled (it is set to disabled in the code).

jan-g commented 3 months ago

Can you check the node name and labels to see if #413 applies?

applike-ss commented 3 months ago

On my end the labels match the node names, so I assume your issue is not the same as mine.

It seems more related to quickly creating and removing lots of volumes. In my case I was creating lots of pods with ephemeral volumes using the CSI driver.

derekbit commented 3 months ago

@applike-ss How did you run into the issue? Do you have steps to reproduce?

applike-ss commented 3 months ago

I have set up my GitLab with a pipeline that spawns a lot of jobs (say 50) at the same time. The GitLab runners run inside my Kubernetes cluster and use the local path provisioner for testing. Jobs may request new nodes on demand if the cluster cannot fulfill the request immediately. After some tests I discovered that some of the PVCs are not deleted after the job/pod terminates.

github-actions[bot] commented 1 month ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

applike-ss commented 4 weeks ago

/unstale