k8up-io / k8up

Kubernetes and OpenShift Backup Operator
https://k8up.io/
Apache License 2.0

Reusable ResticRepository spec #580

Open ccremer opened 2 years ago

ccremer commented 2 years ago

Summary

As a K8up user
I want to define the Restic repository configuration in its own CRD
So that I can reuse and reference the configuration in various backups and schedules

Context

As part of the next K8up roadmap, we want to externalize the Restic repository settings from Schedules and be able to refer to them. This allows users to reuse existing Restic repository configurations.

Out of Scope

No response

Further links

No response

Acceptance Criteria

Given a `k8up.io/v2/Schedule` spec
When I refer to a `k8up.io/v2/ResticRepository` spec
Then K8up can spawn backups using the configuration provided in the `ResticRepository` spec.

Given a `k8up.io/v2/ResticRepository` spec
When I need to customize the Restic repository maintenance settings
Then I can specify Prune and Check schedules in the `k8up.io/v2/ResticRepository` spec.

Given an empty `k8up.io/v2/ResticRepository` spec
When I only configure the Restic repository and its backend settings
Then K8up automatically configures weekly schedules for Restic Prune and Check jobs.

Given a `k8up.io/v2/Schedule` spec
When I want to use a globally configured Restic repository
Then I can also refer to a `k8up.io/v2/ClusterResticRepository` that supports the same spec as `ResticRepository`.
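
The criteria above could translate into manifests roughly like the following. This is only a sketch to illustrate the intent; `repositoryRef`, the `prune`/`check` fields, and all other field names are assumptions, not a decided API:

```yaml
# Hypothetical ResticRepository holding backend and maintenance settings.
apiVersion: k8up.io/v2
kind: ResticRepository
metadata:
  name: my-repo
  namespace: my-app
spec:
  backend:
    s3:
      endpoint: https://s3.example.com
      bucket: backups
      accessKeyIDSecretRef:
        name: backend-credentials
        key: username
      secretAccessKeySecretRef:
        name: backend-credentials
        key: password
  # Optional maintenance settings; if omitted, K8up would
  # default to weekly Prune and Check schedules.
  prune:
    schedule: '0 3 * * 0'
  check:
    schedule: '0 4 * * 0'
---
# A Schedule referencing the repository instead of embedding backend settings.
apiVersion: k8up.io/v2
kind: Schedule
metadata:
  name: my-schedule
  namespace: my-app
spec:
  repositoryRef:
    name: my-repo
  backup:
    schedule: '0 1 * * *'
```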

Implementation Ideas

No response

tobru commented 2 years ago

We should also add some use cases around the scoping of this object. A ResticRepository lives in a namespace and can be referenced by specifying the namespace it lives in. And we should also have a ClusterResticRepository so that we can define cluster-wide repositories.
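
The cluster-wide variant could look like a namespaced `ResticRepository` minus `metadata.namespace`, for example (field names again only illustrative, mirroring the hypothetical spec discussed above):

```yaml
# Hypothetical cluster-scoped repository: same spec, no namespace.
apiVersion: k8up.io/v2
kind: ClusterResticRepository
metadata:
  name: shared-repo
spec:
  backend:
    s3:
      endpoint: https://s3.example.com
      bucket: cluster-backups
      accessKeyIDSecretRef:
        name: backend-credentials
        key: username
      secretAccessKeySecretRef:
        name: backend-credentials
        key: password
```

Note that the Secret references here are exactly where the scoping question gets tricky, since Secrets themselves remain namespaced.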

Kidswiss commented 2 years ago

@tobru I'm not sure if having a namespaced ResticRepository that can be referenced from other namespaces makes sense. A tenant that only has namespace a could potentially reference the repo of a tenant in namespace b. How would we be able to control that?

EDIT: words hard

ccremer commented 2 years ago

I agree with @Kidswiss here. Cross-namespace references can be dangerous, or even impossible: if the repository references Secrets, then these Secrets cannot be mounted from the other namespace anyway.

@Kidswiss AFAIK, if the service account that is executed as a CronJob cannot access the other namespace, this would simply be prohibited by default RBAC rules. Still, I'm pretty sure that Secrets cannot be mounted as Volumes cross-namespace.

ccremer commented 2 years ago

If a cluster-scoped ResticRepository references Secrets, the Secrets have to be in a certain namespace, or at least in the one where the spec is being referenced from. If the Secret is in another namespace, then RBAC rules for the SA executing the backup are required to read the contents of that Secret. Still, the Secret cannot be mounted (in case its contents are files).

Kidswiss commented 2 years ago

@ccremer An option would be that the operator injects the secrets. It's basically the same as what we currently do with the global repo configs.

ccremer commented 2 years ago

Ok, so I think I have an idea or two.

In my mind, `k8up trigger backup --namespace my-app --from-schedule` creates a Job with a Pod spec that in turn runs `k8up restic ...` with a service account.

Idea 1) We could say that whoever is able to create a ClusterResticRepository accepts the fact that the Secret referenced in the spec gets copied to every namespace where that cluster spec is referenced from. The cloned Secrets can then be used for native mounting. This might be a less obvious side effect that the cluster schedule maintainer didn't anticipate.

Idea 2) We leverage RBAC so that explicit read access needs to be granted to the ServiceAccount that wants to use a cluster schedule. That way we enable/delegate access control to the cluster schedule maintainer. The contents of a Secret in another namespace can then only be retrieved via the API in code (which is trivial). The current K8up image has a Kubernetes client built in, so it could just retrieve the Secret on-demand. That way, we don't need to mount Secrets or ConfigMaps (we could create files in /tmp if necessary). If a cluster schedule maintainer wants to open up the Secret access, they could just create a ClusterRoleBinding and add it to the default RoleBindings.
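
The RBAC side of idea 2 could be a plain Role/RoleBinding pair in the namespace holding the repository Secret, scoped to just that Secret. This is standard Kubernetes RBAC; the namespace, Secret, and ServiceAccount names are made up for illustration:

```yaml
# Role in the (hypothetical) namespace holding the repository Secret,
# granting read access to just that one Secret.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: read-repo-secret
  namespace: cluster-backup-infra
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    resourceNames: ["backend-credentials"]
    verbs: ["get"]
---
# Binding that grants the backup ServiceAccount from the
# consuming namespace access to that Secret.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-repo-secret
  namespace: cluster-backup-infra
subjects:
  - kind: ServiceAccount
    name: backup-executor   # the SA running the backup Job; name is illustrative
    namespace: my-app
roleRef:
  kind: Role
  name: read-repo-secret
  apiGroup: rbac.authorization.k8s.io
```

A ClusterRole plus per-namespace RoleBindings would work the same way if the maintainer wants to open the Secret up more broadly.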

Personally, I prefer option 2: there is less non-transparent magic involved, while it still gives flexibility to the users.