In commit bc74ab4, I have applied one of the most important suggestions discussed above, which is to remove the `BackupTablet` strategy in favor of `BackupKeyspace` and `BackupCluster`. The strategies can be used as follows:
```yaml
# BackupKeyspace
strategies:
  - name: BackupKeyspace
    cluster: "example"
    keyspace: "customer"
```

```yaml
# BackupCluster
strategies:
  - name: BackupCluster
    cluster: "example"
```
Meanwhile, the `BackupShard` strategy does not change. When run, we can see the following command-line arguments in the job's pod, which get executed upon creation of the container:
```
# BackupKeyspace
Args:
  /bin/sh
  -c
  /vt/bin/vtctldclient --server=example-vtctld-625ee430:15999 BackupShard customer/-80 && /vt/bin/vtctldclient --server=example-vtctld-625ee430:15999 BackupShard customer/80-
```

```
# BackupCluster
Args:
  /bin/sh
  -c
  /vt/bin/vtctldclient --server=example-vtctld-625ee430:15999 BackupShard commerce/- && /vt/bin/vtctldclient --server=example-vtctld-625ee430:15999 BackupShard customer/-80 && /vt/bin/vtctldclient --server=example-vtctld-625ee430:15999 BackupShard customer/80-
```
cc @maxenglander @mattlord
Another thought: it might be nice to give users a way to assign annotations, and one or more affinity selection options, to the backup runner pods. That way they can influence scheduling and eviction. For example, users might not want backup runner pods running on the same nodes as vttablet pods, and they might not want the backup runner pods to get evicted by an unrelated pod after they've been running for a long time.
In e6946fb I have added affinity and annotations to the `VitessBackupScheduleTemplate`, allowing the user to configure the affinity and annotations they want for the pods that take backups.
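A minimal sketch of what that could look like, assuming each entry under `schedules` maps to a `VitessBackupScheduleTemplate` and accepts standard Kubernetes `annotations` and `affinity` fields (the exact field layout, annotation key, and label selector below are my assumptions, not the confirmed API):

```yaml
backup:
  schedules:
    - name: "customer"
      annotations:
        example.com/team: "data-infra"   # placeholder annotation key/value
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    planetscale.com/component: vttablet  # assumed vttablet label
                topologyKey: kubernetes.io/hostname
```

With a soft anti-affinity like this, the scheduler would prefer to keep backup runner pods off the nodes hosting vttablet pods, which addresses the scheduling concern above.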
Description
This Pull Request adds a new CRD called `VitessBackupSchedule`. Its main goal is to automate and schedule backups of Vitess, taking backups of the Vitess cluster at regular intervals based on a given cron `schedule` and `Strategy`. This new CRD is managed by the `VitessCluster`: like most other components of the vitess-operator, the `VitessCluster` controller is responsible for the whole lifecycle (creation, update, deletion) of the `VitessBackupSchedule` objects in the cluster. Inside the `VitessCluster` it is possible to define several `VitessBackupSchedule`s as a list, allowing for multiple concurrent backup schedules.

Among other things, the `VitessBackupSchedule` object is responsible for creating Kubernetes Jobs at the desired time, based on the user-defined `schedule`. It also keeps track of older jobs and deletes them if they are too old, according to user-defined parameters (`successfulJobsHistoryLimit` & `failedJobsHistoryLimit`). The jobs created by the `VitessBackupSchedule` object use the `vtctld` Docker image and execute a shell command that is generated based on the user-defined `strategies`. The end user can define as many backup strategies per schedule as they want; each of them mimics what `vtctldclient` is able to do. The `Backup` and `BackupShard` commands are available, and a map of extra flags enables the user to pass as many flags as they want to `vtctldclient`.

A new end-to-end test is added to our BuildKite pipeline as part of this Pull Request to test the proper behavior of this new CRD.
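To make the shape of the new API concrete, here is a minimal sketch of how a schedule might be declared inside a `VitessCluster`. The field layout is assumed from the names mentioned above (`schedule`, `strategies`, `successfulJobsHistoryLimit`, `failedJobsHistoryLimit`) and may not match the merged spec exactly; `allow-primary` is just an example of an existing vtctldclient backup flag:

```yaml
apiVersion: planetscale.com/v2
kind: VitessCluster
metadata:
  name: example
spec:
  # ... images, cells, keyspaces elided ...
  backup:
    schedules:                        # several schedules can be listed
      - name: "customer"
        schedule: "*/5 * * * *"       # standard cron syntax
        successfulJobsHistoryLimit: 2 # how many finished jobs to keep
        failedJobsHistoryLimit: 3
        strategies:
          - name: BackupShard         # mirrors vtctldclient BackupShard
            cluster: "example"
            keyspace: "customer"
            shard: "-80"
            extraFlags:               # extra flags passed to vtctldclient
              allow-primary: "true"
```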
Related PRs
Demonstration
For this demonstration I have set up a Vitess cluster by following the steps in the getting started guide, up until the very last step where we must apply the `306_down_shard_0.yaml` file. My cluster is then composed of 2 keyspaces: `customer` with 2 shards, and `commerce` unsharded. I then modify the `306_down_shard_0.yaml` file to contain the new backup schedule. We want to create two schedules, one for each keyspace. The keyspace `customer` will have two backup strategies: one for each shard. Once the cluster is stable and all tablets are serving and ready, I re-apply my yaml file with the backup configuration, as seen in the snippet right below.
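The backup section of that file looks roughly like this; this is a reconstruction for illustration only, with field names following the strategy examples above, and it may differ from the exact snippet used in the demo:

```yaml
backup:
  schedules:
    - name: "commerce"
      schedule: "*/2 * * * *"        # assumed interval
      strategies:
        - name: BackupShard
          cluster: "example"
          keyspace: "commerce"
          shard: "-"                 # commerce is unsharded
    - name: "customer"
      schedule: "*/2 * * * *"
      strategies:                    # one strategy per shard
        - name: BackupShard
          cluster: "example"
          keyspace: "customer"
          shard: "-80"
        - name: BackupShard
          cluster: "example"
          keyspace: "customer"
          shard: "80-"
```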
Immediately, I can check that the new `VitessBackupSchedule` objects have been created.
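For instance, with kubectl (assuming the CRD's plural resource name is `vitessbackupschedules`, which this PR text does not confirm):

```sh
# List the schedule objects created by the VitessCluster controller.
kubectl get vitessbackupschedules
```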
Now I want to check the pods where the jobs created by `VitessBackupSchedule` are running. After about 2 minutes, we can see four pods, two for each schedule. The pods are marked as `Completed` as they finished their job.

Now let's check our backups:
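Two ways to verify, sketched here under the assumption that the pre-existing `VitessBackup` CRD registers completed backups, and reusing the vtctld service name from the job args above:

```sh
# List the VitessBackup objects registered by the operator.
kubectl get vitessbackups

# Or query vtctld directly for the backups of each shard
# (run from somewhere that can reach the vtctld service).
vtctldclient --server=example-vtctld-625ee430:15999 GetBackups commerce/-
vtctldclient --server=example-vtctld-625ee430:15999 GetBackups customer/-80
vtctldclient --server=example-vtctld-625ee430:15999 GetBackups customer/80-
```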