ukreddy-erwin opened this issue 2 years ago
Hello,
I'm trying to understand what's going on here. Can you confirm whether you set the kubernetes.io/pvc-protection finalizer on the PVC, and which controller is expected to remove it? Thanks!
Yes, the finalizer is on the PVC:
kubectl get pvc
NAME                         STATUS        VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
db-persistent-storage-db-0   Bound         pvc-51256bfd-4e32-4a4f-a24b-c0f47f9e1d63   100Gi      RWO            ssd            152m
prometheus-pvc               Terminating   pvc-9453236c-ffc3-4161-a205-e057c3e1ba77   20Gi       RWO            hdd            152m
register-pvc                 Terminating   pvc-ddfef2b9-9723-4651-916b-2cb75baf0f22   20Gi       RWO            ssd            152m
-bash-4.2$ kubectl edit pvc prometheus-pvc
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/aws-ebs
    volume.kubernetes.io/selected-node: ip-10-0-130-106.us-west-2.compute.internal
  creationTimestamp: "2022-06-23T10:22:44Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2022-06-23T12:29:32Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    app: prometheus
  name: prometheus-pvc
  namespace: default
  resourceVersion: "29930"
  uid: 9453236c-ffc3-4161-a205-e057c3e1ba77
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  storageClassName: hdd
  volumeMode: Filesystem
  volumeName: pvc-9453236c-ffc3-4161-a205-e057c3e1ba77
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 20Gi
  phase: Bound
I'm also having the same issue. We create the PersistentVolume with a Helm chart and the PersistentVolumeClaim with Terraform. Creation succeeds, but when I try to destroy the PVC it fails with the same error mentioned in this issue.
06:38:12 TestK8sJenkins 2022-07-07T04:38:11Z logger.go:66: module.k8s_jenkins.kubernetes_persistent_volume_claim.persistence[0]: Still destroying... [id=inttest-k8s-jenkins-qxlte5/jenkins-home, 19m30s elapsed]
06:38:21 TestK8sJenkins 2022-07-07T04:38:21Z logger.go:66: module.k8s_jenkins.kubernetes_persistent_volume_claim.persistence[0]: Still destroying... [id=inttest-k8s-jenkins-qxlte5/jenkins-home, 19m40s elapsed]
06:38:31 TestK8sJenkins 2022-07-07T04:38:31Z logger.go:66: module.k8s_jenkins.kubernetes_persistent_volume_claim.persistence[0]: Still destroying... [id=inttest-k8s-jenkins-qxlte5/jenkins-home, 19m50s elapsed]
06:38:41 TestK8sJenkins 2022-07-07T04:38:41Z logger.go:66:
06:38:41 TestK8sJenkins 2022-07-07T04:38:41Z logger.go:66: Error: Persistent volume claim jenkins-home still exists with finalizers: [kubernetes.io/pvc-protection]
06:38:41 TestK8sJenkins 2022-07-07T04:38:41Z logger.go:66:
06:38:41 TestK8sJenkins 2022-07-07T04:38:41Z logger.go:66:
06:38:41 TestK8sJenkins 2022-07-07T04:38:41Z logger.go:66:
06:38:41 TestK8sJenkins 2022-07-07T04:38:41Z logger.go:66: Error: context deadline exceeded
06:38:41 TestK8sJenkins 2022-07-07T04:38:41Z logger.go:66:
06:38:41 TestK8sJenkins 2022-07-07T04:38:41Z logger.go:66:
06:38:41 TestK8sJenkins 2022-07-07T04:38:41Z retry.go:99: Returning due to fatal error: FatalError{Underlying: error while running command: exit status 1;
06:38:41 Error: Persistent volume claim jenkins-home still exists with finalizers: [kubernetes.io/pvc-protection]
06:38:41
06:38:41
06:38:41
06:38:41 Error: context deadline exceeded
06:38:41
06:38:41 }
Name: jenkins-home
Namespace: jenkins
StorageClass: efs-persistence
Status: Bound
Volume: persistence
Labels: <none>
Annotations: pv.kubernetes.io/bind-completed: yes
Finalizers: [kubernetes.io/pvc-protection]
Capacity: 5Gi
Access Modes: RWX
VolumeMode: Filesystem
Mounted By: jenkins-7d87596c5d-p9xt8
Events: <none>
Any chance you've figured this out? I'd think it would be commonplace but I don't even know where to look at this point.
I've been having this problem for some time and finally realised what was wrong. Now, this was on my system, so the solution might not work for you...
For me, the problem was that Terraform had no way of knowing that there's a dependency between the efs-csi driver deployment/daemonset and the PVC and PV. This meant that Terraform could end up removing the efs-csi driver before taking down the PVC and PV.
My solution was to add an explicit depends_on to my kubernetes_persistent_volume and kubernetes_persistent_volume_claim resources.
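For anyone hitting the same thing, a minimal sketch of that approach, assuming the efs-csi driver is installed from the same Terraform configuration via a helm_release; the resource names, chart details, and EFS filesystem ID below are placeholders rather than my exact setup:

# Hypothetical sketch: resource names, chart details, and the EFS filesystem ID are placeholders.
resource "helm_release" "efs_csi_driver" {
  name       = "aws-efs-csi-driver"
  repository = "https://kubernetes-sigs.github.io/aws-efs-csi-driver/"
  chart      = "aws-efs-csi-driver"
  namespace  = "kube-system"
}

resource "kubernetes_persistent_volume" "persistence" {
  metadata {
    name = "persistence"
  }
  spec {
    capacity = {
      storage = "5Gi"
    }
    access_modes                     = ["ReadWriteMany"]
    persistent_volume_reclaim_policy = "Retain"
    storage_class_name               = "efs-persistence"
    persistent_volume_source {
      csi {
        driver        = "efs.csi.aws.com"
        volume_handle = "fs-12345678" # placeholder EFS filesystem ID
      }
    }
  }

  # Destroy order: Terraform removes this PV before the driver release it depends on.
  depends_on = [helm_release.efs_csi_driver]
}

resource "kubernetes_persistent_volume_claim" "persistence" {
  metadata {
    name      = "jenkins-home"
    namespace = "jenkins"
  }
  spec {
    access_modes       = ["ReadWriteMany"]
    storage_class_name = "efs-persistence"
    resources {
      requests = {
        storage = "5Gi"
      }
    }
    volume_name = kubernetes_persistent_volume.persistence.metadata[0].name
  }

  # Same here: the PVC is destroyed before the driver that handles its volume.
  depends_on = [helm_release.efs_csi_driver]
}

With the explicit depends_on, terraform destroy deletes the PVC and PV before it removes the driver release, which avoids the out-of-order teardown described above.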
@MadsRC Thank you so much for posting this. This is exactly what I needed to do to fix the same issue. So simple I didn't think of using "depends_on". Thanks!
Currently experiencing this in a scenario where, during terraform destroy, the behavior is:
still exists with finalizers: [kubernetes.io/pvc-protection]
Is there a race condition on updating the "used by" index?
To be clear, when the PV is not created by TF, it does not seem like the explicit depends_on relationship that @MadsRC reports makes sense: there isn't a resource to depend on, and there isn't a PV destroy operation that could land out of order.
Maybe the dependency graph would be considered more complete by using the kubernetes_persistent_volume_v1 data source (we currently don't), but that should not change the number of destroy operations or their relative order.
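For concreteness, wiring the externally created PV into the configuration through that data source would look roughly like the sketch below; the PV and claim names are placeholders borrowed from earlier in this thread, and whether this changes anything is exactly what's in question:

# Hypothetical sketch only: a data source reads the PV, it does not manage it.
data "kubernetes_persistent_volume_v1" "existing" {
  metadata {
    name = "pvc-9453236c-ffc3-4161-a205-e057c3e1ba77" # placeholder PV name
  }
}

resource "kubernetes_persistent_volume_claim_v1" "claim" {
  metadata {
    name      = "prometheus-pvc"
    namespace = "default"
  }
  spec {
    access_modes       = ["ReadWriteOnce"]
    storage_class_name = "hdd"
    resources {
      requests = {
        storage = "20Gi"
      }
    }
    volume_name = data.kubernetes_persistent_volume_v1.existing.metadata[0].name
  }
}

As noted, the data source only adds a read to the graph; it introduces no destroy operation whose ordering could matter, so it's unclear it would help with the stuck finalizer.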
Has there been any update on this issue? We're facing the same problem: TF hangs and fails due to PVC protection and out-of-order deletion. Our current options are:
1. Delete the PVC manually and restart the pipeline.
2. Attempt to patch the PVC to remove the finalizer.
Has anybody solved this via TF?
We have done the Kubernetes deployment using the Terraform kubernetes provider, while creating the EKS cluster itself. When we tried to destroy after that (we haven't used the product yet, just testing the destroy), we got the error below with terraform destroy.
Please suggest how to fix this.