rabbitmq / cluster-operator

RabbitMQ Cluster Kubernetes Operator
https://www.rabbitmq.com/kubernetes/operator/operator-overview.html
Mozilla Public License 2.0

Failed to create the Pod #488

Closed: karthikkb closed this issue 3 years ago

karthikkb commented 3 years ago

I followed the documentation at https://www.rabbitmq.com/kubernetes/operator/install-operator.html#openshift to install RabbitMQ on an OpenShift v4.3 cluster. I completed all five steps in the documentation, but the operator is still not running. Below is the output:

$ oc get all -n rabbitmq-system
NAME                                        READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/rabbitmq-cluster-operator   0/1     0            0           7m39s

NAME                                                   DESIRED   CURRENT   READY   AGE
replicaset.apps/rabbitmq-cluster-operator-79888fd8c8   1         0         0       7m39s

The output of "kubectl describe replicasets.apps" shows the following error:

Events:
  Type     Reason        Age                 From                   Message
  ----     ------        ----                ----                   -------
  Warning  FailedCreate  74s (x107 over 9h)  replicaset-controller  Error creating: pods "rabbitmq-cluster-operator-79888fd8c8-" is forbidden: unable to validate against any security context constraint: []

Any idea why this is failing?

ChunyiLyu commented 3 years ago

@karthikkb Unfortunately, there is no message inside the []. It's hard to debug without any details. Since it's been a while, would you mind trying again to see whether we get a more detailed error message? In addition, could you inspect the Security Context Constraints object you created with oc describe scc <constraint-name> and post the output here?

tsia commented 3 years ago

I am facing the same issue, also with no further details in [].

# oc get all -n rabbitmq-system
NAME                                        READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/rabbitmq-cluster-operator   0/1     0            0           25m

NAME                                                   DESIRED   CURRENT   READY   AGE
replicaset.apps/rabbitmq-cluster-operator-5bf48dbd67   1         0         0       7m22s
# oc describe replicasets.apps
...
Events:
  Type     Reason        Age                     From                   Message
  ----     ------        ----                    ----                   -------
  Warning  FailedCreate  2m30s (x18 over 7m58s)  replicaset-controller  Error creating: pods "rabbitmq-cluster-operator-5bf48dbd67-" is forbidden: unable to validate against any security context constraint: []
# oc describe scc rabbitmq-cluster
Name:                                           rabbitmq-cluster
Priority:                                       <none>
Access:
  Users:                                        <none>
  Groups:                                       <none>
Settings:
  Allow Privileged:                             false
  Allow Privilege Escalation:                   true
  Default Add Capabilities:                     <none>
  Required Drop Capabilities:                   ALL
  Allowed Capabilities:                         FOWNER,CHOWN
  Allowed Seccomp Profiles:                     <none>
  Allowed Volume Types:                         configMap,downwardAPI,emptyDir,persistentVolumeClaim,projected,secret
  Allowed Flexvolumes:                          <all>
  Allowed Unsafe Sysctls:                       <none>
  Forbidden Sysctls:                            <none>
  Allow Host Network:                           false
  Allow Host Ports:                             false
  Allow Host PID:                               false
  Allow Host IPC:                               false
  Read Only Root Filesystem:                    false
  Run As User Strategy: MustRunAsRange
    UID:                                        <none>
    UID Range Min:                              <none>
    UID Range Max:                              <none>
  SELinux Context Strategy: MustRunAs
    User:                                       <none>
    Role:                                       <none>
    Type:                                       <none>
    Level:                                      <none>
  FSGroup Strategy: MustRunAs
    Ranges:                                     <none>
  Supplemental Groups Strategy: RunAsAny
    Ranges:                                     <none>
ChunyiLyu commented 3 years ago

Hi @tsia, thanks for providing details on this. I haven't tested in an OpenShift environment yet. However, looking at the security context constraint you provided, I think a capability is missing from the Allowed Capabilities section: it needs CHOWN, FOWNER, and DAC_OVERRIDE.

Could you please update the security context constraint and let me know whether that fixes your problem? I will update our docs if this turns out to be the issue. Thanks!
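For reference, the suggested change would look roughly like this inside the SCC. This is a partial sketch showing only the capability-related fields. It is not a complete manifest, because a real SCC also needs its strategy fields (runAsUser, seLinuxContext, and so on). The assumption is that you merge it into the existing rabbitmq-cluster SCC, for example with oc edit scc rabbitmq-cluster, and leave everything else as posted above.

```yaml
# Partial SecurityContextConstraints fragment (capability fields only).
kind: SecurityContextConstraints
apiVersion: security.openshift.io/v1
metadata:
  name: rabbitmq-cluster
# Drop every capability by default...
requiredDropCapabilities:
  - "ALL"
# ...then allow back only the ones suggested in this thread.
allowedCapabilities:
  - "FOWNER"
  - "CHOWN"
  - "DAC_OVERRIDE"
```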

tsia commented 3 years ago

Hi @ChunyiLyu, I updated the SCC:

# oc describe scc rabbitmq-cluster
Name:                                           rabbitmq-cluster
Settings:
  Allowed Capabilities:                         FOWNER,CHOWN,DAC_OVERRIDE
...

but I still get the same error:

# oc get events --watch -n rabbitmq-system
LAST SEEN   TYPE      REASON              OBJECT                                            MESSAGE
4s          Warning   FailedCreate        replicaset/rabbitmq-cluster-operator-5bf48dbd67   Error creating: pods "rabbitmq-cluster-operator-5bf48dbd67-" is forbidden: unable to validate against any security context constraint: []
tsia commented 3 years ago

I did some more research. When I set seLinuxContext to RunAsAny, the pod is created successfully.

ChunyiLyu commented 3 years ago

@tsia could you provide the full updated security context constraint for reference? In addition, which service account did you associate the security context constraint with?

This is a bit strange. In our example we allow a range of user IDs by setting annotations on the namespace, and we were able to verify that this works in an OpenShift environment. Do you know of any recent changes that could impact this?
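The question above asks which service account the SCC is associated with. On OpenShift 4, one way to make that association is to grant a service account the RBAC "use" verb on the SCC. This sketch assumes the operator runs under a service account named rabbitmq-cluster-operator in rabbitmq-system. Check the real name in the operator Deployment's spec.template.spec.serviceAccountName. The Role and RoleBinding names are hypothetical.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: rabbitmq-cluster-scc-use        # hypothetical name
  namespace: rabbitmq-system
rules:
  # SCC admission checks whether the pod's service account may "use" an SCC.
  - apiGroups: ["security.openshift.io"]
    resources: ["securitycontextconstraints"]
    resourceNames: ["rabbitmq-cluster"]
    verbs: ["use"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: rabbitmq-cluster-scc-use        # hypothetical name
  namespace: rabbitmq-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: rabbitmq-cluster-scc-use
subjects:
  # Assumed service account name; verify against the operator Deployment.
  - kind: ServiceAccount
    name: rabbitmq-cluster-operator
    namespace: rabbitmq-system
```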

tsia commented 3 years ago

@ChunyiLyu the SCC currently looks like this:

kind: SecurityContextConstraints
apiVersion: security.openshift.io/v1
metadata:
  name: rabbitmq-cluster
allowPrivilegedContainer: false
runAsUser:
  type: MustRunAsRange
seLinuxContext:
  type: RunAsAny
fsGroup:
  type: MustRunAs
supplementalGroups:
  type: RunAsAny
requiredDropCapabilities:
  - "ALL"
allowedCapabilities:
  - "FOWNER"
  - "CHOWN"
volumes:
  - "configMap"
  - "secret"
  - "persistentVolumeClaim"
  - "downwardAPI"
  - "emptyDir"
  - "projected"

Just to be sure, I double-checked the annotations on the rabbitmq-system namespace, because I remembered having them wrong before, but they are correct:

openshift.io/sa.scc.supplemental-groups: 1000/1
openshift.io/sa.scc.uid-range: 1000/1

I'm not aware of any changes, but I'm also pretty new to OpenShift.

ChunyiLyu commented 3 years ago

@tsia thanks for contributing and helping me debug this 😄

I have a suspicion regarding seLinuxContext. From the OpenShift docs: "An SELinuxContext strategy of MustRunAs with no level set. Admission looks for the openshift.io/sa.scc.mcs annotation to populate the level."

In the SCC we document, we set seLinuxContext to MustRunAs, but we don't set the openshift.io/sa.scc.mcs annotation on the namespace. When the namespace itself has no such annotation, the default would come from the project, as described in the OpenShift docs.

I think that explains why it worked in our OpenShift testing environment but not in yours. We probably have a different default for seLinuxContext in our project.

tsia commented 3 years ago

What OpenShift version were you using? Maybe some defaults changed between 3 and 4?

Do you know what I should set in the annotation openshift.io/sa.scc.mcs?

tsia commented 3 years ago

I just set openshift.io/sa.scc.mcs: 's0:c26,c5' on the rabbitmq-system namespace (I copied the value from the namespace where the RabbitMQ cluster will be running), reset seLinuxContext to MustRunAs, and now it seems to work.

It seems that running oc new-project rabbitmq-system would have created the namespace with the correct sa.scc.mcs annotation.
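Here is a sketch of the rabbitmq-system namespace with the annotations discussed in this thread. The uid-range and supplemental-groups values are the ones quoted above. The mcs value is the one tsia copied from another namespace on their own cluster, so treat it as cluster-specific and take yours from an existing project, for example with oc describe namespace <existing-project>. As noted above, creating the namespace with oc new-project is expected to populate these annotations automatically.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: rabbitmq-system
  annotations:
    # Format is <start>/<size>: "1000/1" permits exactly UID 1000.
    openshift.io/sa.scc.uid-range: "1000/1"
    openshift.io/sa.scc.supplemental-groups: "1000/1"
    # Used to fill in the level for SCCs with seLinuxContext MustRunAs
    # and no level set. Value from tsia's cluster; use your own.
    openshift.io/sa.scc.mcs: "s0:c26,c5"
```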

ChunyiLyu commented 3 years ago

@tsia lovely 👍 Thanks again for looking into this and sharing the knowledge! It's hard for us to catch things like this because we don't have easy access to an OpenShift environment. I'm going to update our docs.

maykelg commented 3 years ago

OKD is the (freely available) upstream Kubernetes distribution embedded in Red Hat OpenShift. It may help with reproducing issues like this one.