bernardgut opened 2 months ago
BTW, this is not a trivial problem to solve, as you have to delete the claimed storage before you delete the CSI pods.
The only way to do this "trivially" that I can see off the top of my head would be to have a script that does the following when you run `omnictl delete cluster ...`
The same goes for `talosctl reset`.
... whatever you have planned, I think it needs to run the finalizers on the PVCs that are assigned to the node before deleting the node:
`kubectl delete node ...`
something along those lines, IMO.
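A rough sketch of that cleanup order, assuming `jq` is available (the node name and helper name here are made up for illustration; this is not a tested implementation):

```shell
# Hypothetical helper: list the namespace/claim pairs for PVCs used by pods
# scheduled on a given node. Reads the pod list JSON (as produced by
# `kubectl get pods --all-namespaces -o json`) from stdin.
pvcs_on_node() {
  jq -r --arg node "$1" '
    .items[]
    | select(.spec.nodeName == $node)
    | .metadata.namespace as $ns
    | .spec.volumes[]?
    | select(.persistentVolumeClaim != null)
    | "\($ns)/\(.persistentVolumeClaim.claimName)"' | sort -u
}

# Sketch of the order of operations (NODE is an assumed name):
#   kubectl get pods --all-namespaces -o json \
#     | pvcs_on_node "$NODE" \
#     | while IFS=/ read -r ns pvc; do
#         kubectl delete pvc -n "$ns" "$pvc" --wait  # CSI finalizers free the storage
#       done
#   kubectl delete node "$NODE"                      # only after storage is released
```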
I wouldn't say that's a bug. It looks like a feature request to me, as it needs some discussion before we can implement it.
Another approach you could use here is a sidecar container for Omni that watches Omni state, checks which clusters no longer exist, and cleans up the dangling volumes in the CSI provisioner.
What you describe is too specific; Talos doesn't have the means to do that.
Another, more generic approach would be a hook that runs inside your cluster and is called by Omni; Omni would wait until the hook completes and then destroy the cluster. But the hook implementation would be on your side again.
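For what such a hook could look like on the user side, here is a minimal sketch (the function name and the wiring into Omni are assumptions; only the `kubectl` calls are standard):

```shell
# Hypothetical cluster-deletion hook: release every PVC so the CSI driver's
# finalizers reclaim the backing storage, then wait until no PVs remain.
cleanup_before_destroy() {
  kubectl delete pvc --all --all-namespaces --wait
  # PVs may be released asynchronously by their finalizers; poll until gone.
  while [ "$(kubectl get pv --no-headers 2>/dev/null | wc -l)" -gt 0 ]; do
    sleep 5
  done
}
```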
I like the "onClusterDeletion" hook approach. I am not an expert, but it makes sense to me.
That said, I think the other approach you mentioned in another thread, "Mark some drives to be 'owned' by omni/talos and wipe them entirely when running `omnictl delete`", was better. That would partially remove the need for hook mechanics (for the general use case). It is also easier to set up and makes a lot of sense: you have a declarative machine configuration in which you say "this is the machine on which this OS will run, this is its network config, this is the kernel config, these are the extensions, and these are the block devices that will be used along with their config (if any); everything declared here is immutable".
This makes a whole lot of sense to me.
Is there an existing issue for this?
Current Behavior
Run `omnictl sync ...`, then `omnictl delete cluster ...`. If you then `node-shell` into the machine and check what is on the disks, you will see that the previous attempt's PV data is still there and your storage space is being wasted.

Expected Behavior
I am not sure whether this is a feature or a bug, so I opened a ticket anyway. If it is a feature, please add a `--run-storage-finalizers` flag to `omnictl delete cluster` so that when it deletes the cluster's PVs, it also frees the space from the storage layer. We should avoid dangling PV data remaining on the underlying storage forever.

Steps To Reproduce
See Description above
Anything else?
Talos 1.7.1, Omni 0.34.0