APP.4.4.A5 - Githubissues

sluetze commented 9 months ago

A cluster MUST have a backup. The backup MUST include:
• Persistent volumes
• Configuration files for Kubernetes and the other programs of the control plane
• The current state of the Kubernetes cluster, including extensions
• Databases of the configuration (namely etcd in this case)
• All infrastructure applications required to operate the cluster and the services within it
• The data storage of the code and image registries
rules:
- rule which checks if velero or APIs of another backuptool are installed (for application backup?)
- rule which checks if quay is installed and then if there is a backup scheduled
- rule which checks for etcdbackup (see: https://docs.openshift.com/container-platform/4.14/backup_and_restore/control_plane_backup_and_restore/backing-up-etcd.html)
Snapshots for the operation of the applications SHOULD also be considered. Snapshots MUST NOT be considered a substitute for backups.
rules:
- rule which checks if snapshots are available (and usable)

rules (grep finds no existing rules for this so far):
- rule which checks for the last etcd backup (if possible)
- rule which checks if velero or APIs of another backup tool are installed (for application backup?)
- rule which checks if snapshots are available (and usable)

sluetze commented 7 months ago

I created cee59e48091374e3edc6d39fac104ba79650d41d, which allows automatic checks on etcd backups using the etcdbackup OpenShift feature.

I do not want to propose this upstream currently for the following reasons:

- etcdbackup is behind a FeatureGate and is TechPreviewOnly. By using these APIs, a customer would be locked to their current OpenShift version.
- It is not supported at the moment and not production ready.
- The remediation requires multiple permissions (create a PVC in the openshift-etcd namespace, create backups at cluster scope).

For now I will start with a manual rule for etcd backup.

sluetze commented 7 months ago

Snapshots for the operation of the applications SHOULD also be considered. Snapshots MUST NOT be considered a substitute for backups.

Some thoughts on an automatic rule for this:

- We could check for an existing VolumeSnapshotClass. This would ensure that a "target" for snapshots is configured. But I can create this object without it having any effect, for example when my CSI driver does not support snapshots. So far I have not found any data field which shows the capabilities of a CSI driver.
- We could check whether VolumeSnapshotContents exist. This would ensure that a) the driver supports snapshots and b) it is configured to be usable. But the rule would then require not only that snapshots are configured but also that they are in use, so it would raise a false positive (a failing result) until someone actually USES that feature.
- We could only check that the CRDs exist. But this would PASS by default and neglect the need to configure snapshotting and to use a snapshot-compatible driver.
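To make the trade-off between the first two options concrete, here is a minimal sketch in Python. It is not how compliance-operator rules are written; it only models the decision on objects already fetched from the API as plain dicts. The function name and the three-way PASS / MANUAL / FAIL mapping are assumptions for illustration, and `status.readyToUse` is the field on VolumeSnapshotContent used here as a signal that a snapshot actually worked.

```python
from typing import Any


def snapshot_check(
    snapshot_classes: list[dict[str, Any]],
    snapshot_contents: list[dict[str, Any]],
) -> str:
    """Hypothetical evaluation of the snapshot requirement.

    PASS   - at least one VolumeSnapshotContent is ready, so the driver
             supports snapshots and the feature is in use.
    MANUAL - a VolumeSnapshotClass exists, but nothing proves it works.
    FAIL   - no snapshot target is configured at all.
    """
    ready = [
        c for c in snapshot_contents
        if (c.get("status") or {}).get("readyToUse") is True
    ]
    if ready:
        return "PASS"
    if snapshot_classes:
        return "MANUAL"
    return "FAIL"
```

With this mapping, a cluster that has a VolumeSnapshotClass but has never taken a snapshot is flagged for manual review instead of failing outright.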

Opinions?

@nrrso @benruland @ermeratos

sluetze commented 7 months ago

A cluster MUST have a backup. The backup MUST include:
• Persistent volumes
• Configuration files for Kubernetes and the other programs of the control plane
• The current state of the Kubernetes cluster, including extensions
• Databases of the configuration (namely etcd in this case)
• All infrastructure applications required to operate the cluster and the services within it
• The data storage of the code and image registries

using OADP

Check if DataProtectionApplication.oadp.openshift.io/v1alpha1 has [.spec.configuration.velero.defaultPlugins](https://pkg.go.dev/github.com/openshift/oadp-operator/api/v1alpha1#DefaultPlugin) set to at least openshift,csi. We should make this configurable for cases such as a ROSA cluster or anything else that requires other or additional plugins.

Furthermore, we would need to check for a backup location.
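A rough sketch of that check in Python, operating on a DataProtectionApplication already loaded as a dict. The function name and the `REQUIRED_PLUGINS` default are made up for illustration, and reading the backup location from `spec.backupLocations` is my assumption about where it lives in the DPA spec.

```python
from typing import Any

# Assumed default; meant to be overridable, e.g. for ROSA clusters.
REQUIRED_PLUGINS = frozenset({"openshift", "csi"})


def dpa_is_compliant(
    dpa: dict[str, Any],
    required_plugins: frozenset[str] = REQUIRED_PLUGINS,
) -> bool:
    """Return True if the DPA enables all required plugins and defines
    at least one backup location (hypothetical check, not the real rule)."""
    spec = dpa.get("spec") or {}
    velero = (spec.get("configuration") or {}).get("velero") or {}
    plugins = set(velero.get("defaultPlugins") or [])
    has_location = bool(spec.get("backupLocations"))
    return required_plugins <= plugins and has_location
```

Passing a different `required_plugins` set is how the configurability mentioned above could look.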

using kasten

We could check whether there is at least one policies.config.kio.kasten.io which is valid (.status.validation == Success), since this would mean that there is a valid backup policy.
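Sketched in Python on policy objects already fetched as dicts (the function name is hypothetical; the only field used is the `.status.validation` value mentioned above):

```python
from typing import Any


def has_valid_kasten_policy(policies: list[dict[str, Any]]) -> bool:
    """Return True if at least one Kasten policy reports a successful validation."""
    return any(
        (p.get("status") or {}).get("validation") == "Success"
        for p in policies
    )
```

Note that this only proves a policy is valid, not that it covers everything the requirement lists.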

Problems

Both approaches have several shortcomings.

- compliance-operator would need permission to access the APIs. This must be at cluster scope, since we can't guarantee that users will stick with the default namespace names.
- We can't use namespaced rolebindings, since this would require compliance-operator to be installed afterwards and would also require the operator to have access to these namespaces to create the rolebinding. This results in a lot of permissions for the compliance-operator.
- We would need two different rules, and subsequently more rules for additional backup solutions. This would also mean that multiple rules fail every time.
- Neither approach allows an automatic remediation.

Agnostic approach

The agnostic approach would be to just check whether the CRDs are available. This would not ensure a compliant configuration, but it would fail by default and push the administrator to at least install a solution (and hopefully configure it). It can easily be extended by adding values to a variable. The compliance operator already has the permissions to list CRDs, so this won't be an issue.
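As a sketch of the idea (Python, working on a list of installed CRD names; the function name and the `KNOWN_BACKUP_CRDS` variable are hypothetical, and its two entries are just the OADP and Kasten CRDs discussed above):

```python
from collections.abc import Iterable

# Stand-in for the configurable variable; extend it for other backup solutions.
KNOWN_BACKUP_CRDS = frozenset({
    "dataprotectionapplications.oadp.openshift.io",
    "policies.config.kio.kasten.io",
})


def backup_solution_installed(
    installed_crds: Iterable[str],
    known_crds: frozenset[str] = KNOWN_BACKUP_CRDS,
) -> bool:
    """Return True if any known backup CRD is installed on the cluster.

    This only shows that a backup solution is present,
    not that it is configured correctly.
    """
    return not known_crds.isdisjoint(installed_crds)
```

A cluster without any of the listed CRDs fails, which is the "fail by default" behaviour described above.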

sluetze commented 7 months ago

rule which checks if quay is installed and then if there is a backup scheduled

After reading the backup docs, I do not find that useful or even checkable, since the backup is mostly command-line juggling.

Furthermore, the compliance-operator cannot really stretch to components which are not on OCP. Quay (or another registry) might be in one OCP cluster but not in all of them. So this check would mostly be either not useful or a false positive.

sluetze commented 3 months ago

merged upstream with https://github.com/ComplianceAsCode/content/pull/11717

sig-bsi-grundschutz / content

APP.4.4.A5 #31