signadot / community

Issue tracker for Signadot

[bug] Signadot CRDs left in the cluster after sandbox TTL has passed or was deleted #48

Open gamino-brex opened 9 months ago

gamino-brex commented 9 months ago

Describe the bug Months-old Sandbox and RouteConfig resources from deleted sandboxes (either explicitly deleted or expired past their TTL) were not cleaned up. The resources can be found with kubectl, but the corresponding sandboxes do not show up in the Dashboard or CLI.
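To see the kind of stale objects described here, one can list the Signadot custom resources directly. This is a sketch: the exact CRD kinds and API group may vary by agent version, so discovering them with `kubectl api-resources` first is the safe starting point.

```shell
# Discover which Signadot CRDs are installed in the cluster
# (the exact group/kind names may differ by agent version).
kubectl api-resources | grep -i signadot

# List the custom resources across all namespaces with their ages,
# assuming the kinds are named "sandboxes" and "routeconfigs".
kubectl get sandboxes,routeconfigs --all-namespaces
```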

To Reproduce

Expected behavior If a sandbox has expired or been deleted, no related orphaned resources should remain in our K8s cluster.

Observed behavior Stale resources in our cluster

Additional context More details in the shared Slack channel

scott-cotton commented 9 months ago

I agree we should not be leaving around Signadot k8s resources.

The only known circumstances in which we can leave k8s resources behind are:

  1. Running cluster delete from the dashboard (this gives a warning about leaving resources around); and
  2. Changing the cluster agent token to a token for a different cluster, as identified in our dashboard/CLI.

In general, we have guards in place that prevent sandbox deletion on our SaaS unless the corresponding cluster has confirmed the deletion.

So, it would be good to get more information here. Do you think that these orphaned k8s resources could be the result of one of the two circumstances above? If not, could you provide some more info, such as how often this occurs and perhaps the metadata of some orphaned Signadot k8s resources?
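For the metadata request above, a quick way to dump the names and creation timestamps of the suspect objects might look like this (assuming the CRD kind is `sandboxes`; adjust to whatever `kubectl api-resources` reports):

```shell
# Print namespace, name, and creation timestamp of each Sandbox
# custom resource -- the sort of metadata requested above.
kubectl get sandboxes --all-namespaces \
  -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,CREATED:.metadata.creationTimestamp'

# Full metadata (including finalizers) for one suspect object:
kubectl get sandbox <name> -n <namespace> -o yaml
```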

If, on the other hand, the two circumstances above do seem to explain your orphaned Signadot k8s resources, would you prefer that we delete them anyway?

(Cleaning up manually would involve removing finalizers with kubectl prior to deletion, or alternatively, creating a sandbox whose name matches the .Spec.Name of the orphaned SDS resources and deleting it from our SaaS.)
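The finalizer-removal route mentioned above might look like the following sketch (resource kind and names are placeholders; note that clearing finalizers bypasses whatever cleanup the operator would normally perform, so use with care):

```shell
# Clear the finalizers so the object is no longer blocked from removal
# (this skips any cleanup logic the finalizer would have triggered).
kubectl patch sandbox <name> -n <namespace> \
  --type=merge -p '{"metadata":{"finalizers":[]}}'

# Then delete the now-unblocked object.
kubectl delete sandbox <name> -n <namespace>
```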