Closed roypaulin closed 6 days ago
Besides this, do we support unsandboxing a sandbox that is stopped? Should we reject it in the webhook, or should we restart the sandbox internally first and then unsandbox it?
No, we are not going to support it. We should restrict what the operator can do when shutdown is true. We will enforce it with a set of webhook rules.
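A rough sketch of what one such webhook rule could look like, assuming that unsandboxing shows up as a sandbox entry disappearing from spec.sandboxes between the old and new object. The Sandbox type and validateUnsandbox function are hypothetical names for illustration, not the operator's actual types:

```go
package main

import (
	"errors"
	"fmt"
)

// Sandbox is a hypothetical, trimmed-down stand-in for one entry of spec.sandboxes[].
type Sandbox struct {
	Name     string
	Shutdown bool // spec.sandboxes[].shutdown
}

var errUnsandboxWhileShutdown = errors.New("cannot unsandbox while shutdown is true")

// validateUnsandbox rejects an update that removes a sandbox whose shutdown
// flag was true in the old spec. Assumption: removal of the entry is how
// unsandboxing is expressed.
func validateUnsandbox(oldSandboxes, newSandboxes []Sandbox) error {
	kept := make(map[string]bool, len(newSandboxes))
	for _, sb := range newSandboxes {
		kept[sb.Name] = true
	}
	for _, sb := range oldSandboxes {
		if sb.Shutdown && !kept[sb.Name] {
			return fmt.Errorf("sandbox %q: %w", sb.Name, errUnsandboxWhileShutdown)
		}
	}
	return nil
}

func main() {
	oldSpec := []Sandbox{{Name: "sb1", Shutdown: true}}
	fmt.Println(validateUnsandbox(oldSpec, nil))     // rejected: sb1 is shut down
	fmt.Println(validateUnsandbox(oldSpec, oldSpec)) // accepted: nothing removed
}
```

With a rule like this, the user would have to set shutdown back to false before unsandboxing, which matches the "we are not going to support it" answer above.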
@cchen-vertica Given that status.sandboxes[].shutdown is confusing to you, I will remove it. After stop_db on the sandbox, I will update spec.subclusters[].shutdown to true. I will add a separate reconciler (stopsubcluster_reconciler) that will iterate over the subclusters and, if spec shutdown is true, will stop the subcluster (if at least one node is up or the subcluster is not in a sandbox with shutdown == true) and will update status.subclusters[].shutdown to true.
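To make the proposed stopsubcluster_reconciler loop concrete, here is a minimal sketch. It reads the stop condition as "at least one node is up and the subcluster is not in a sandbox that is itself shut down", which is only one interpretation of the description above. The Subcluster type, the sandboxShutdown map, and both function names are hypothetical, and mirroring spec into status for every subcluster is an assumption:

```go
package main

import "fmt"

// Subcluster is a hypothetical, trimmed-down stand-in for one entry of spec.subclusters[].
type Subcluster struct {
	Name     string
	Shutdown bool   // spec.subclusters[].shutdown
	Sandbox  string // empty when the subcluster is not in a sandbox
	UpNodes  int    // number of nodes currently up
}

// shouldStop reports whether the reconciler would need to stop the subcluster.
// Assumed reading: spec shutdown is true, some node is still up, and the
// subcluster is not covered by a sandbox-level shutdown.
func shouldStop(sc Subcluster, sandboxShutdown map[string]bool) bool {
	if !sc.Shutdown {
		return false
	}
	inStoppedSandbox := sc.Sandbox != "" && sandboxShutdown[sc.Sandbox]
	return sc.UpNodes > 0 && !inStoppedSandbox
}

// reconcileStop walks all subclusters, collects the ones to stop, and returns
// the status.subclusters[].shutdown values (here simply mirrored from spec).
func reconcileStop(scs []Subcluster, sandboxShutdown map[string]bool) ([]string, map[string]bool) {
	var toStop []string
	status := make(map[string]bool, len(scs))
	for _, sc := range scs {
		if shouldStop(sc, sandboxShutdown) {
			toStop = append(toStop, sc.Name)
		}
		status[sc.Name] = sc.Shutdown
	}
	return toStop, status
}

func main() {
	scs := []Subcluster{
		{Name: "sc1", Shutdown: true, UpNodes: 2},
		{Name: "sc2", Shutdown: true, Sandbox: "sb1", UpNodes: 1},
		{Name: "sc3", UpNodes: 3},
	}
	toStop, status := reconcileStop(scs, map[string]bool{"sb1": true})
	fmt.Println(toStop, status)
}
```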
When spec.sandboxes[].shutdown is set back to false, the operator will startdb the sandbox and, in restartsandbox_reconciler (which I will rename to postrestartsubcluster_reconciler), update both spec and status shutdown in the subclusters.
Do you find this better?
I am keeping spec.sandboxes[].shutdown. It provides a user-friendly and easy way to trigger sandbox shutdown. Moreover, the primary use case is realdb, and it is easier to update a single field so we don't have to look up the subclusters.
This is the first step to support stop sandbox for realdb. Normally, if a pod is down, the operator will try to restart it. However, for the realdb use case we want to be able to stop a sandbox and prevent the operator from restarting it until we decide otherwise. This adds the logic to stop_db a sandbox when spec.sandboxes[].shutdown is true. A new reconciler has been added to the vdb controller to wake up the sandbox controller. As long as that field is true, the operator will ignore the sandbox's pods. As soon as it is set to false, the operator will restart the sandbox. Scaling down the sts and adding webhook rules will be done in a follow-up ticket.
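The behavior described above can be summarized as a small decision function. This is only an illustrative sketch: SandboxState, Action, and nextAction are hypothetical names, and the single Running flag is a simplification of however the operator actually tracks whether the sandbox's pods are up.

```go
package main

import "fmt"

// Action is what the operator would do for a sandbox on one reconcile pass.
type Action string

const (
	ActionStopDB  Action = "stop_db" // shutdown requested and sandbox still up
	ActionIgnore  Action = "ignore"  // shutdown requested and sandbox already down
	ActionRestart Action = "restart" // shutdown cleared and sandbox down
	ActionNone    Action = "none"    // shutdown cleared and sandbox up
)

// SandboxState is a hypothetical summary of one sandbox.
type SandboxState struct {
	SpecShutdown bool // spec.sandboxes[].shutdown
	Running      bool // simplification: true if any pod of the sandbox is up
}

// nextAction maps the desired shutdown flag and the observed state to an action.
func nextAction(s SandboxState) Action {
	switch {
	case s.SpecShutdown && s.Running:
		return ActionStopDB
	case s.SpecShutdown:
		return ActionIgnore
	case !s.Running:
		return ActionRestart
	default:
		return ActionNone
	}
}

func main() {
	fmt.Println(nextAction(SandboxState{SpecShutdown: true, Running: true}))
	fmt.Println(nextAction(SandboxState{SpecShutdown: true, Running: false}))
	fmt.Println(nextAction(SandboxState{SpecShutdown: false, Running: false}))
}
```

The key point is the second case: while shutdown is true, pods that are down are left alone instead of triggering the usual restart path.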