scylladb / scylla-operator

The Kubernetes Operator for ScyllaDB
https://operator.docs.scylladb.com/
Apache License 2.0

Support OpenShift #424

Open ezbz opened 3 years ago

ezbz commented 3 years ago
- [ ] https://github.com/scylladb/scylla-operator/issues/1935
- [ ] https://github.com/scylladb/scylla-operator-release/issues/189
- [ ] https://github.com/scylladb/scylla-operator/issues/713

cgruver commented 3 years ago

I have an interim solution here, but it will need some work to conform to the higher security requirements of OpenShift.

  1. The operator needs to run with anyuid privileges because it wants to bind to port 443.
  2. The cluster member Role needs the update verb on scyllaclusters/finalizers.
  3. The cluster service account needs to run with the privileged SCC so that it can use the SYS_NICE capability.
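
Items 2 and 3 can be sketched as RBAC objects. This is a minimal illustration, not a copy of the manifests in the linked repository: the object names, the namespace, and the service account name are hypothetical. The finalizers rule is probably needed because OpenShift enables the OwnerReferencesPermissionEnforcement admission plugin, which requires update on the owner's finalizers subresource before an owner reference with blockOwnerDeletion can be set.

```yaml
# Sketch only. Names marked "hypothetical" are assumptions, not taken from the operator manifests.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: scylla-member-extras   # hypothetical
  namespace: scylla            # hypothetical
rules:
  # Item 2: allow updating the finalizers subresource of ScyllaCluster objects.
  - apiGroups: ["scylla.scylladb.com"]
    resources: ["scyllaclusters/finalizers"]
    verbs: ["update"]
  # Item 3: allow the bound subject to use the privileged SCC.
  - apiGroups: ["security.openshift.io"]
    resources: ["securitycontextconstraints"]
    resourceNames: ["privileged"]
    verbs: ["use"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: scylla-member-extras   # hypothetical
  namespace: scylla            # hypothetical
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: scylla-member-extras
subjects:
  - kind: ServiceAccount
    name: scylla-member        # hypothetical service account name
    namespace: scylla
```

The same pattern with resourceNames set to anyuid and the operator's service account as the subject would cover item 1.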

You can find a working example here: https://github.com/lab-monkeys/home-library-tutorial under the Scylla folder.

It still needs more work, so I'll update with progress.

cgruver commented 3 years ago

I have ScyllaDB clusters running in OpenShift 4.6.

The Pod security still needs some work so that the Pods don't have to run as privileged containers, but it's a start.

The working example is here:

https://github.com/lab-monkeys/home-library-tutorial/tree/main/Scylla

oc apply -f cert-manager.yaml
oc apply -f operator.yaml
oc apply -f cluster-cql.yaml
oc apply -f cluster-dynamo.yaml

I'm running on a bare-metal cluster, so eventually I should be able to tune it for all of the performance tweaks that ScyllaDB needs.

tnozicka commented 3 years ago

> The operator needs to run with anyuid privileges because it wants to bind to port 443.

We shouldn't listen on 443. The Service should handle the redirection, and we should listen on an unprivileged port. Feel free to file a separate issue for that; I think we could fix it sooner than we can support OCP.
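
A minimal sketch of that idea, assuming a hypothetical Service name, namespace, selector label, and container port 8443 (none of these come from the operator's manifests): the Service keeps exposing 443 to clients and forwards to an unprivileged port inside the pod, so the container no longer needs to bind below 1024.

```yaml
# Sketch only. All names and the 8443 port are hypothetical.
apiVersion: v1
kind: Service
metadata:
  name: scylla-operator-webhook   # hypothetical
  namespace: scylla-operator      # hypothetical
spec:
  selector:
    app.kubernetes.io/name: scylla-operator   # hypothetical label
  ports:
    - name: webhook
      protocol: TCP
      port: 443          # port that clients of the Service connect to
      targetPort: 8443   # unprivileged port the container would listen on
```

With a mapping like this, the anyuid requirement from the earlier comment would presumably go away, provided nothing else in the operator pod needs a fixed UID.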

cortopy commented 2 years ago

I don't use OpenShift, but I have a hardened cluster with strict security policies. For those in the same situation, this was enough for me. Two lines are key (marked Fix 1 and Fix 2):

apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: scylla
  labels:
    {{- include "scylla-extras.labels" . | nindent 4 }}
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: "docker/default,runtime/default"
    apparmor.security.beta.kubernetes.io/allowedProfileNames: "runtime/default"
    seccomp.security.alpha.kubernetes.io/defaultProfileName: "runtime/default"
    apparmor.security.beta.kubernetes.io/defaultProfileName: "runtime/default"
spec:
  allowPrivilegeEscalation: false
  allowedCapabilities:
    - SYS_NICE # Fix 1
  volumes:
    - "configMap"
    - "emptyDir"
    - "projected"
    - "secret"
    - "downwardAPI"
    - "persistentVolumeClaim"
  hostNetwork: false
  hostIPC: false
  hostPID: false
  runAsUser:
    rule: "RunAsAny" # Fix 2
  seLinux:
    rule: "RunAsAny"
  supplementalGroups:
    rule: "MustRunAs"
    ranges:
      - min: 1
        max: 65535
  fsGroup:
    rule: "MustRunAs"
    ranges:
      - min: 1

The SYS_NICE capability (Fix 1) is understandable.

However, I have to allow Scylla pods to run as root (Fix 2), which feels like overkill and a source of potential vulnerabilities.
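
A PodSecurityPolicy only applies to pods whose service account (or creating user) is authorized to use it, so a policy like the one above is normally paired with RBAC. The sketch below assumes a hypothetical namespace and service account name; only the policy name scylla comes from the manifest above. Note that PodSecurityPolicy was removed in Kubernetes 1.25, so this applies to older clusters only.

```yaml
# Sketch only. Namespace and service account name are hypothetical.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: scylla-psp     # hypothetical
  namespace: scylla    # hypothetical
rules:
  - apiGroups: ["policy"]
    resources: ["podsecuritypolicies"]
    resourceNames: ["scylla"]   # the PodSecurityPolicy defined above
    verbs: ["use"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: scylla-psp     # hypothetical
  namespace: scylla    # hypothetical
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: scylla-psp
subjects:
  - kind: ServiceAccount
    name: scylla-member   # hypothetical service account used by the Scylla pods
    namespace: scylla
```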

tnozicka commented 2 years ago

I think the root privileges are for tuning devices and setting sysctls, both of which are done on Scylla startup.

gautam-borkar commented 2 years ago

I tried the steps mentioned above and assigned the privileged role to the service account, but I am getting this error: [image]

Running the command gives the output below:

kubectl -n scylla get pods -l "app.kubernetes.io/name=scylla-incident-mgmt"
No resources found in scylla namespace.

tnozicka commented 1 year ago

> Running the command gives the output below:

The label selector seems wrong. It should be "app.kubernetes.io/name=scylla", not "app.kubernetes.io/name=scylla-incident-mgmt".

Actionable info would be, say, the output of oc get events --sort-by=.metadata.creationTimestamp, the pod's YAML if it gets created, or the operator logs if there are failures.

But in this case, as it's a feature request, this is primarily waiting for cycles from one of the team members to go and try it once we decide to support the platform. I assume we'd be almost certain to hit all of those issues as well.

mykaul commented 1 year ago

Depends on https://github.com/scylladb/scylla-operator/issues/713