I think the issue is that the uri is empty for this bdev:
{
  "aliases": "talos-df7-egr/f517fe5e-3f81-4484-b597-634483fe8d2d",
  "blk_size": 512,
  "claimed": false,
  "claimed_by": "Orphaned",
  "name": "f517fe5e-3f81-4484-b597-634483fe8d2d",
  "num_blocks": 2097152,
  "product_name": "Logical Volume",
  "share_uri": "bdev:///f517fe5e-3f81-4484-b597-634483fe8d2d",
  "uri": "",
  "uuid": "f517fe5e-3f81-4484-b597-634483fe8d2d"
}
Using grpc to io-engine should be the last resort.
If you use kubectl-mayastor get volumes, does it return any volumes?
Yes, it still shows the volumes:
kubectl mayastor get volumes
ID REPLICAS TARGET-NODE ACCESSIBILITY STATUS SIZE
327b4e55-589c-42e2-baf9-d1bdadf63366 1 <none> <none> Online 1073741824
Great, and is the pvc and pv gone?
Yes, they are both gone. If it helps here is the YAML for the volume:
---
spec:
  labels:
    local: "true"
  num_replicas: 1
  size: 1073741824
  status: Created
  uuid: 327b4e55-589c-42e2-baf9-d1bdadf63366
  topology:
    node_topology:
      explicit:
        allowed_nodes:
        - talos-proxmox-0
        - talos-proxmox-1
        - talos-tu8-r34
        - talos-b2m-7a0
        - talos-df7-egr
        - talos-j92-0u1
        - talos-srn-grj
        - talos-ixc-r5k
        preferred_nodes:
        - talos-df7-egr
        - talos-ixc-r5k
        - talos-j92-0u1
        - talos-proxmox-0
        - talos-proxmox-1
        - talos-srn-grj
        - talos-tu8-r34
        - talos-b2m-7a0
    pool_topology:
      labelled:
        exclusion: {}
        inclusion:
          openebs.io/created-by: msp-operator
  policy:
    self_heal: true
state:
  size: 1073741824
  status: Online
  uuid: 327b4e55-589c-42e2-baf9-d1bdadf63366
  replica_topology:
    f517fe5e-3f81-4484-b597-634483fe8d2d:
      node: talos-df7-egr
      pool: talos-df7-egr
      state: Online
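For anyone else hitting this, something like the following should reproduce the checks above; the kubectl-mayastor flags are assumed from the plugin help and may differ by version:

# dump the mayastor volume spec/state as YAML
kubectl mayastor get volume 327b4e55-589c-42e2-baf9-d1bdadf63366 -o yaml
# confirm no PV/PVC still references the volume uuid
kubectl get pv,pvc --all-namespaces | grep 327b4e55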
How strange, with retention set to delete then it should have deleted it, right @abhilashshetty04? Could we get logs from the csi-controller?
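Something like this should grab them (the deployment and container names are assumed from a default helm install, so adjust them to your release):

kubectl -n mayastor logs deployment/mayastor-csi-controller -c csi-controller > csi-controller-log.txt
kubectl -n mayastor logs deployment/mayastor-csi-controller -c csi-provisioner > csi-provisioner-log.txt
kubectl -n mayastor logs deployment/mayastor-csi-controller -c csi-attacher > csi-attacher-log.txt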
Here are the logs. One thing to note: I did try to delete the PV before the PVC and had to ctrl-c the kubectl delete pv command and delete the PVC first. I'm wondering if I created a race condition, as indicated in the csi-attacher-log.txt.
csi-attacher-log.txt csi-provisioner-log.txt csi-controller-log.txt
I1210 18:40:58.639219 1 csi_handler.go:276] Detaching "csi-9128a61d8587a1d12b1e6a9091d79ecf6164e5e28f9a90271ee5f46cb8c51eb1"
I1210 18:40:58.720361 1 csi_handler.go:583] Detached "csi-9128a61d8587a1d12b1e6a9091d79ecf6164e5e28f9a90271ee5f46cb8c51eb1"
I1210 18:40:58.749118 1 csi_handler.go:276] Detaching "csi-9128a61d8587a1d12b1e6a9091d79ecf6164e5e28f9a90271ee5f46cb8c51eb1"
I1210 18:40:58.755756 1 csi_handler.go:583] Detached "csi-9128a61d8587a1d12b1e6a9091d79ecf6164e5e28f9a90271ee5f46cb8c51eb1"
I1210 18:40:58.769362 1 csi_handler.go:283] Failed to save detach error to "csi-9128a61d8587a1d12b1e6a9091d79ecf6164e5e28f9a90271ee5f46cb8c51eb1": volumeattachments.storage.k8s.io "csi-9128a61d8587a1d12b1e6a9091d79ecf6164e5e28f9a90271ee5f46cb8c51eb1" not found
I1210 18:40:58.769780 1 csi_handler.go:228] Error processing "csi-9128a61d8587a1d12b1e6a9091d79ecf6164e5e28f9a90271ee5f46cb8c51eb1": failed to detach: could not mark as detached: volumeattachments.storage.k8s.io "csi-9128a61d8587a1d12b1e6a9091d79ecf6164e5e28f9a90271ee5f46cb8c51eb1" not found
I1210 18:41:17.331177 1 csi_handler.go:708] Removed finalizer from PV "pvc-327b4e55-589c-42e2-baf9-d1bdadf63366"
Oh right, if you try to delete the PV manually I don't think we support that as things stand, @abhilashshetty04?
To recover from this you'd have to delete the mayastor volume using REST, something like this: curl -X 'DELETE' 'http://node:30011/v0/volumes/327b4e55-589c-42e2-baf9-d1bdadf63366' -H 'accept: */*'
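If the uuid isn't known, listing the volumes through the same REST endpoint first should work, along these lines (same host/port assumptions as above):

curl 'http://node:30011/v0/volumes' -H 'accept: */*'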
@tiagolobocastro That command deleted the volume and bdev, thank you!
How strange, with retention set to delete then it should have deleted it, right @abhilashshetty04? Could we get logs from the csi-controller?
Yes, it should have been deleted by the CSI driver.
How I deleted the volume:
I tried several of the mayastor pods, but none of them have curl or apt, so I needed to try other ways:
user@mbp2023 ~ % kubectl get svc -n mayastor
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
mayastor-agent-core ClusterIP 10.128.39.22 <none> 50051/TCP,50052/TCP 6h13m
mayastor-api-rest ClusterIP 10.128.209.239 <none> 8080/TCP,8081/TCP 6h13m
mayastor-etcd ClusterIP 10.128.176.26 <none> 2379/TCP,2380/TCP 6h13m
mayastor-etcd-headless ClusterIP None <none> 2379/TCP,2380/TCP 6h13m
mayastor-loki ClusterIP 10.128.186.128 <none> 3100/TCP 6h13m
mayastor-loki-headless ClusterIP None <none> 3100/TCP 6h13m
mayastor-metrics-exporter-pool ClusterIP 10.128.39.212 <none> 9502/TCP 6h13m
I noticed the mayastor-api-rest line, so I tried this:
user@mbp2023 ~ % kubectl port-forward deployment/mayastor-api-rest -n mayastor 8081:8081
Forwarding from 127.0.0.1:8081 -> 8081
Forwarding from [::1]:8081 -> 8081
Handling connection for 8081
Handling connection for 8081
And this:
user@mbp2023 ~ % curl -X 'DELETE' 'http://127.0.0.1:8081/v0/volumes/327b4e55-589c-42e2-baf9-d1bdadf63366' -H 'accept: */*'
{"details":"Volume '327b4e55-589c-42e2-baf9-d1bdadf63366' not found","message":"SvcError :: VolumeNotFound","kind":"NotFound"}
user@mbp2023 ~ % curl -X 'DELETE' 'http://127.0.0.1:8081/v0/volumes/0a59e089-5db5-4221-9b53-ecdd854a99ec' -H 'accept: */*'
user@mbp2023 ~ %
Bang! It’s done! Thank you @tiagolobocastro
I had exactly the same issue. Could the garbage-collection job be fixed to clean up these orphaned volumes, or even better, could the entries not be deleted when a PV deletion is requested until the deletion really happens? When a PV is deleted while an existing PVC is still bound, the deletion gets stuck (or should) until the PVC is also deleted.
Anyhow, thank you for all of the above. It helped me detect the orphaned volumes and gave me a way to clean up and free the blocked resources.
I guess we need to try and repro this first. So let me double check what the flow is: delete the PV first (which hangs), ctrl-c it, delete the PVC, and then delete the PV again?
Right.
Automatic GC: https://github.com/openebs/mayastor-control-plane/pull/724. As a workaround on the current release, please restart the csi-controller pod.
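For example, something like this (the deployment name is assumed from a default helm install, so adjust to your release; deleting the pod directly also works):

kubectl -n mayastor rollout restart deployment mayastor-csi-controller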
Edit:
I tried to delete a PV before deleting the PVC and the kubectl delete pv command hung. I stopped it using ctrl-c and deleted the PVC first and then the PV. This resulted in an orphaned volume and bdev. I then tried to use the io-engine-client to manually delete the bdev but it failed.

Expected behavior
The block device is destroyed. For some reason this block device was not destroyed by Kubernetes even though the StorageClass had reclaimPolicy: Delete.

OS info (please complete the following information):