stashed / stash

🛅 Backup your Kubernetes Stateful Applications
https://stash.run

Proposal: Implement Restic TPR Resource for Kubernetes #1

Closed (tamalsaha closed this issue 7 years ago)

tamalsaha commented 7 years ago

Motivation:

If you are running production workloads in Kubernetes, you want to take backups of your disks. Traditional tools like Bacula are too complex to set up and maintain manually in a dynamic compute environment like Kubernetes. So, we plan to implement a TPR controller based on restic to address these issues.

Goals:

Non-goals:

Why Restic

Design:

Taking Backups:

To take a backup, four pieces of information are required:

This is how a TPR controller can implement backups:

I think the label based option has some advantages:

Once the TPR controller finds an RC etc. that has backup enabled, it will add a sidecar container with restic. Currently there is no way to add a sidecar container to a running pod, so the pods will have to be restarted the first time. This is not ideal, but it is the best we can do at this time.

If the RC and TPR association is later removed, the TPR controller will also remove the sidecar container.
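
The add/remove behaviour described above could look roughly like the sketch below. It treats the workload's pod template as a plain dict and only shows the idempotent patching step; the container name `restic-sidecar` and the image `example/restic:latest` are placeholders for illustration, not names from this proposal.

```python
import copy

SIDECAR_NAME = "restic-sidecar"  # hypothetical name, not from the proposal


def add_sidecar(pod_template, image="example/restic:latest"):
    """Return a copy of pod_template that contains the sidecar exactly once."""
    patched = copy.deepcopy(pod_template)
    containers = patched.setdefault("spec", {}).setdefault("containers", [])
    # Only append when the sidecar is missing, so repeated syncs are no-ops.
    if not any(c.get("name") == SIDECAR_NAME for c in containers):
        containers.append({"name": SIDECAR_NAME, "image": image})
    return patched


def remove_sidecar(pod_template):
    """Return a copy of pod_template with the sidecar removed, if present."""
    patched = copy.deepcopy(pod_template)
    spec = patched.get("spec", {})
    spec["containers"] = [
        c for c in spec.get("containers", []) if c.get("name") != SIDECAR_NAME
    ]
    return patched
```

Keeping both functions idempotent means the controller can re-run them on every sync without triggering extra pod restarts.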

Scheduling backups:

I think we can just use cron to run backups on a schedule inside the docker container. This sounds simple enough.
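
As a rough illustration of the cron idea, the sketch below renders one crontab line that runs `restic backup` on a schedule. The function name and arguments are made up for this example; it assumes the repository credentials reach restic some other way (for example through its `RESTIC_PASSWORD` environment variable).

```python
import shlex


def crontab_line(schedule, repo, path):
    """Build a crontab entry that backs up `path` into the restic repo `repo`."""
    fields = schedule.split()
    if len(fields) != 5:
        raise ValueError("expected a 5-field cron schedule, got %r" % schedule)
    # Quote every argument so paths with spaces survive the shell cron uses.
    command = " ".join(shlex.quote(a) for a in ["restic", "-r", repo, "backup", path])
    return "%s %s" % (" ".join(fields), command)
```

The sidecar's entrypoint could write such a line to the crontab whenever the schedule changes.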

Entrypoint:

Since the restic process will run on a schedule, some other process needs to run as the entrypoint. This will be a kloader-type process that watches the restic TPR and translates it into a restic-compatible config, e.g.,

Essentially, we can create a new restik (with k) cli with 3 commands:

Supported Kubernetes Objects:

We should support any Kubernetes object type that has a PodTemplate in its schema. These are:

Since new sidecar containers can't be added once a Pod is created, Pod types will not be supported.

Backup Nodes

Users might be interested in taking backups of host paths. This can be done by deploying a DaemonSet with a do-nothing busybox container. The restic TPR controller can use that as a vessel for running restic sidecar containers.
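
A minimal sketch of such a "vessel" DaemonSet, built as a plain manifest dict. The function name, the `app` label, the `/host` mount path and the `apps/v1` API version are assumptions for illustration (the API group for DaemonSets was different when this proposal was written).

```python
def host_backup_daemonset(name, host_path, mount_path="/host"):
    """Build a DaemonSet manifest with one idle busybox container per node."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",  # assumption: current API group
        "kind": "DaemonSet",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "pause",
                        "image": "busybox",
                        # Do nothing forever; the pod only exists to host a sidecar.
                        "command": ["sh", "-c", "while true; do sleep 3600; done"],
                        "volumeMounts": [
                            {"name": "host", "mountPath": mount_path, "readOnly": True}
                        ],
                    }],
                    # Expose the node directory that should be backed up.
                    "volumes": [{"name": "host", "hostPath": {"path": host_path}}],
                },
            },
        },
    }
```

The restic sidecar added by the controller would then mount the same `host` volume to read the node's files.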

Restarting pods

As mentioned before, the first time sidecar containers are added, pods will be restarted by the controller. Who performs the restart will be decided on a case-by-case basis. For example, Kubernetes itself will restart pods behind a Deployment. In such cases, the TPR controller will let Kubernetes do that.

Implementers should be aware of the controller-ref concept to properly identify the pods managed by an RC, etc.
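
To make the controller-ref point concrete: each pod lists its owners under `metadata.ownerReferences`, and at most one of them has `controller: true`. The sketch below filters pod objects (as plain dicts, e.g. decoded from the API's JSON) by that reference; the helper names are made up for this example.

```python
def controlled_by(pod, owner_uid):
    """True if the pod's controlling owner reference has the given UID."""
    refs = pod.get("metadata", {}).get("ownerReferences") or []
    # Only the reference flagged as controller counts; other owners are ignored.
    return any(ref.get("controller") and ref.get("uid") == owner_uid for ref in refs)


def pods_managed_by(pods, owner_uid):
    """Return the pods whose controller is the object with owner_uid."""
    return [p for p in pods if controlled_by(p, owner_uid)]
```

Matching on the owner UID rather than on labels avoids restarting pods that merely happen to share a label selector with the RC.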

This situation can be improved once Kubernetes adds support for extensible admission controllers. The restic TPR controller can become an admission controller and modify the PodTemplate accordingly. This will avoid the first-restart issue. It will also allow us to support Pods directly. See tracking bug #5.
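
One way such an admission hook might express its change is as an RFC 6902 JSON Patch, which is the format Kubernetes mutating admission webhooks later adopted. The sketch below only builds the patch; the function name and the sidecar dict are illustrative, and the surrounding webhook plumbing is left out.

```python
import json


def sidecar_patch(kind, sidecar):
    """Build a JSON Patch that appends `sidecar` to the object's container list."""
    # Bare Pods carry the pod spec at /spec; objects with a PodTemplate
    # embed it at /spec/template/spec.
    spec_path = "/spec" if kind == "Pod" else "/spec/template/spec"
    # A trailing "-" in a JSON Pointer means "append to the array".
    return [{"op": "add", "path": spec_path + "/containers/-", "value": sidecar}]


def encode_patch(patch):
    """Serialize the patch as JSON text for the admission response."""
    return json.dumps(patch, sort_keys=True)
```

Because the patch is applied before the object is stored, the pods start with the sidecar already in place and no restart is needed.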

Repo initialization

For the initial implementation, users will be expected to initialize the repository. We can revisit this later. See tracking bug #6.

HTTP API

An HTTP API server can be implemented to expose read-only data for repositories. This can be used to build web dashboards. This is outside the scope of the initial implementation. Tracking bug #7.

Recovery process

The recovery process will be left to users for now. Later we can think about automation.

CLI

A CLI can be created to simplify repo initialization and the recovery process for a Kubernetes deployment. This is outside the scope of the initial implementation. Also, note that we are building a CLI called osm for managing various cloud bucket operations. Tracking bug #8.

sanmai commented 7 years ago

Does restic have restic.conf?

tamalsaha commented 7 years ago

@sanmai, I meant restic.conf as a "placeholder". We need some way to take the config from Kubernetes and forward it to the restic CLI.
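
As a rough sketch of that forwarding step, the config could be turned straight into command-line arguments instead of a config file. The spec keys used here (`repository`, `paths`, `tags`) are hypothetical and not a settled TPR schema; `--repo` and `--tag` are existing restic options.

```python
def restic_backup_args(spec):
    """Translate a backup spec dict into a `restic backup` argument list."""
    args = ["restic", "--repo", spec["repository"], "backup"]
    # Each tag becomes its own --tag option.
    for tag in spec.get("tags", []):
        args += ["--tag", tag]
    # The directories to back up go last as positional arguments.
    args += list(spec["paths"])
    return args
```

The resulting list can be handed to a process runner as is, which avoids shell quoting problems.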

tamalsaha commented 7 years ago

There are a number of open issues regarding the TPR spec:

tamalsaha commented 7 years ago

retention_policy:
    strategy: KEEP_HOURLY
    snapshot_count: 5
    retain_hostname: myhost
    retain_tag: ["abc", "xyz"]

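
A possible reading of this policy, sketched as a translation into `restic forget` arguments. The mapping is an assumption: strategy names such as `KEEP_HOURLY` are taken to mirror restic's `--keep-*` options (the names other than `KEEP_HOURLY` are hypothetical), `retain_hostname` is mapped to `--host` (called `--hostname` in older restic releases), and `retain_tag` to `--keep-tag`.

```python
# Assumed mapping from strategy names to restic forget options.
KEEP_FLAGS = {
    "KEEP_LAST": "--keep-last",
    "KEEP_HOURLY": "--keep-hourly",
    "KEEP_DAILY": "--keep-daily",
    "KEEP_WEEKLY": "--keep-weekly",
    "KEEP_MONTHLY": "--keep-monthly",
    "KEEP_YEARLY": "--keep-yearly",
}


def restic_forget_args(policy):
    """Translate a retention_policy dict into a `restic forget` argument list."""
    flag = KEEP_FLAGS[policy["strategy"]]  # KeyError on an unknown strategy
    args = ["restic", "forget", flag, str(policy["snapshot_count"])]
    if policy.get("retain_hostname"):
        args += ["--host", policy["retain_hostname"]]
    for tag in policy.get("retain_tag", []):
        args += ["--keep-tag", tag]
    return args
```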
tamalsaha commented 7 years ago

The update operation will work as described below:

tamalsaha commented 7 years ago

@sadlil @saumanbiswas we need to discuss the secret format. The secret will have the following things:

We should also decide whether to mount the secret or set env vars (which seems to be the format restic likes), versus reading it via the Kubernetes API in the sidecar container. One caveat on env vars: a process started with Go's os/exec sees the parent's environment only when Cmd.Env is left nil; if Env is set explicitly, the variables have to be passed along. Mounting or using env vars will mean that we can only use secrets from the same namespace. In that case we can call the field repositorySecret instead of repositorySecretName.
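
For the mount option, a sketch of how the sidecar might read a mounted secret and hand it to restic as environment variables. It assumes the secret keys are named after the variables restic reads (such as `RESTIC_REPOSITORY` and `RESTIC_PASSWORD`); the helper names and that key convention are illustrative, not a decided format.

```python
import os


def load_mounted_secret(mount_dir):
    """Read a mounted secret directory into a dict of key -> file content."""
    values = {}
    for key in sorted(os.listdir(mount_dir)):
        path = os.path.join(mount_dir, key)
        # Skip dot-prefixed bookkeeping entries and anything that is not a file.
        if key.startswith(".") or not os.path.isfile(path):
            continue
        with open(path) as fh:
            values[key] = fh.read()
    return values


def restic_env(mount_dir, base_env=None):
    """Return an environment for the restic child process with the secret merged in."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(load_mounted_secret(mount_dir))
    return env
```

The returned dict can be passed as the `env` argument when spawning restic, so the secret never has to be part of the container spec itself.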

tamalsaha commented 7 years ago

It's implemented now!