vertica / vertica-kubernetes

Operator, container and Helm chart to deploy Vertica in Kubernetes
Apache License 2.0

Add support for stop sandbox #955

Closed roypaulin closed 6 days ago

roypaulin commented 2 weeks ago

This is the first step toward supporting stop sandbox for realdb. Normally, if a pod is down, the operator tries to restart it. For the realdb use case, however, we want to be able to stop a sandbox and prevent the operator from restarting it until we decide otherwise. This PR adds the logic to run stop_db on a sandbox when spec.sandboxes[].shutdown is true. A new reconciler has been added to the vdb controller to wake up the sandbox controller. As long as that field is true, the operator ignores the sandbox's pods. As soon as it is set to false, the operator restarts the sandbox.
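The behavior described above can be read as a small decision table. The following is only an illustrative sketch in Python, not the operator's actual code: the `Sandbox` record, the `Action` enum, and `next_action` are hypothetical names, and the pod counts stand in for whatever pod facts the real reconciler consults.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    STOP_DB = "stop_db"   # shut the sandbox down
    RESTART = "restart"   # bring the sandbox's down pods back up
    IGNORE = "ignore"     # sandbox is stopped on purpose; leave its pods alone
    NONE = "none"         # nothing to do


@dataclass
class Sandbox:
    # Hypothetical stand-in for one spec.sandboxes[] entry plus observed pod state.
    name: str
    shutdown: bool    # mirrors spec.sandboxes[].shutdown
    up_pods: int      # pods currently running
    total_pods: int   # pods that belong to the sandbox


def next_action(sb: Sandbox) -> Action:
    """Pick what to do with a sandbox, following the rules in the description."""
    if sb.shutdown:
        # While shutdown is true, stop the sandbox if anything is still up,
        # then ignore its pods instead of restarting them.
        return Action.STOP_DB if sb.up_pods > 0 else Action.IGNORE
    if sb.up_pods < sb.total_pods:
        # shutdown is false and some pods are down: normal restart path.
        return Action.RESTART
    return Action.NONE
```

For example, `next_action(Sandbox("sb1", True, 0, 3))` returns `Action.IGNORE`, which is the "do not restart until we decide otherwise" case.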

Scaling down the StatefulSet (sts) and adding webhook rules will be done in a follow-up ticket.

roypaulin commented 1 week ago

Besides this, do we support unsandboxing a sandbox that is stopped? Should we reject it in the webhook, or should we internally restart the sandbox first and then unsandbox it?

No, we are not going to support it. We should restrict what the operator can do when shutdown is true. We will enforce it with a set of webhook rules.
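One such rule might look like the sketch below. This is a guess at the shape of a check, not the webhook that will ship: `validate_sandbox_update` is a hypothetical function, sandboxes are modeled as plain dicts, and it assumes unsandboxing shows up as an entry disappearing from spec.sandboxes between the old and new spec.

```python
def validate_sandbox_update(old_sandboxes, new_sandboxes):
    """Return a list of error strings; an empty list means the update is allowed.

    Each sandbox is a dict with a "name" key and an optional "shutdown" key,
    standing in for one spec.sandboxes[] entry (hypothetical shape).
    """
    new_names = {sb["name"] for sb in new_sandboxes}
    errors = []
    for sb in old_sandboxes:
        # Assumption: a sandbox missing from the new spec means "unsandbox it".
        removed = sb["name"] not in new_names
        if removed and sb.get("shutdown", False):
            errors.append(
                f"cannot unsandbox {sb['name']!r} while shutdown is true"
            )
    return errors
```

The check looks at the old spec's shutdown value, so a stopped sandbox has to be set back to shutdown false in one update before it can be removed in a later one.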

roypaulin commented 1 week ago

@cchen-vertica Given that status.sandboxes[].shutdown is confusing to you, I will remove it. After stop_db on the sandbox, I will update spec.subclusters[].shutdown to true. I will add a separate reconciler (stopsubcluster_reconciler) that iterates over the subclusters and, if spec shutdown is true, stops the subcluster (if at least one node is up or the subcluster is not in a sandbox with shutdown == true) and updates status.subclusters[].shutdown to true. When spec.sandboxes[].shutdown is set back to false, the operator will run start_db on the sandbox and, in restartsandbox_reconciler (which I will rename postrestartsubcluster_reconciler), update both the spec and status shutdown fields in the subclusters. Do you find this better?
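To make the proposed stopsubcluster_reconciler loop concrete, here is a rough Python sketch that follows the wording of the comment literally. The `Subcluster` record, `reconcile_stop_subclusters`, and the `stop_subcluster` callback are all hypothetical; the real reconciler would work against the VerticaDB spec and status rather than these stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Set


@dataclass
class Subcluster:
    # Hypothetical stand-in for a subcluster's spec, status, and observed state.
    name: str
    spec_shutdown: bool             # mirrors spec.subclusters[].shutdown
    up_nodes: int                   # nodes currently up
    sandbox: Optional[str] = None   # sandbox the subcluster belongs to, if any
    status_shutdown: bool = False   # mirrors status.subclusters[].shutdown


def reconcile_stop_subclusters(
    subclusters: Iterable[Subcluster],
    shutdown_sandboxes: Set[str],
    stop_subcluster: Callable[[Subcluster], None],
) -> List[str]:
    """Stop every subcluster whose spec asks for it; return the names stopped.

    shutdown_sandboxes holds the names of sandboxes whose shutdown field is true.
    """
    stopped = []
    for sc in subclusters:
        if not sc.spec_shutdown:
            continue
        in_stopped_sandbox = sc.sandbox in shutdown_sandboxes
        # Condition as written in the comment: at least one node is up,
        # or the subcluster is not in a sandbox with shutdown == true.
        if sc.up_nodes > 0 or not in_stopped_sandbox:
            stop_subcluster(sc)
            sc.status_shutdown = True
            stopped.append(sc.name)
    return stopped
```

A subcluster that is already fully down inside a stopped sandbox is skipped, since stop_db on the sandbox has already taken care of it.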

roypaulin commented 1 week ago

I am keeping spec.sandboxes[].shutdown. It provides a user-friendly, easy way to trigger a sandbox shutdown. Moreover, the primary use case is realdb, and it is easier to update a single field than to look up the subclusters.