openshift / cluster-etcd-operator

Operator to manage the lifecycle of the etcd members of an OpenShift cluster
Apache License 2.0
96 stars 130 forks source link

Security: no need to set privileged: true option #1183

Closed lance5890 closed 4 months ago

lance5890 commented 10 months ago

no need to set privileged:true option

related to https://github.com/openshift/cluster-etcd-operator/issues/1181

openshift-ci[bot] commented 10 months ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: lance5890 Once this PR has been reviewed and has the lgtm label, please assign hasbro17 for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files: - **[OWNERS](https://github.com/openshift/cluster-etcd-operator/blob/master/OWNERS)** Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
openshift-ci[bot] commented 10 months ago

Hi @lance5890. Thanks for your PR.

I'm waiting for a openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
tjungblu commented 10 months ago

/ok-to-test

/hold

lance5890 commented 10 months ago

/retest

lance5890 commented 10 months ago

/retest

lance5890 commented 10 months ago

@tjungblu I think the failed e2e case has nothing to do with with PR

  1. By default pods that run as root will have write access to the file system exposed by hostPath

  2. I have also checked the crio etcd container config in "/var/lib/containers/storage/overlay-containers/xxxx/userdata/config.json", it show as follows without privileged: true option:

maybe we just need read and write option to the /var/lib/etcd dir , compared the etcd container with privileged: true , the main diffrences are in the capabilities option

             "user": {
                        "uid": 0,
                        "gid": 0,
                        "additionalGids": [
                                0
                        ]
                },
               "capabilities": {
                        "bounding": [
                                "CAP_CHOWN",
                                "CAP_DAC_OVERRIDE",
                                "CAP_FSETID",
                                "CAP_FOWNER",
                                "CAP_SETGID",
                                "CAP_SETUID",
                                "CAP_SETPCAP",
                                "CAP_NET_BIND_SERVICE",
                                "CAP_KILL"
                        ],
                        "effective": [
                                "CAP_CHOWN",
                                "CAP_DAC_OVERRIDE",
                                "CAP_FSETID",
                                "CAP_FOWNER",
                                "CAP_SETGID",
                                "CAP_SETUID",
                                "CAP_SETPCAP",
                                "CAP_NET_BIND_SERVICE",
                                "CAP_KILL"
                        ],
                        "permitted": [
                                "CAP_CHOWN",
                                "CAP_DAC_OVERRIDE",
                                "CAP_FSETID",
                                "CAP_FOWNER",
                                "CAP_SETGID",
                                "CAP_SETUID",
                                "CAP_SETPCAP",
                                "CAP_NET_BIND_SERVICE",
                                "CAP_KILL"
                        ]
                },

....
                {
                        "destination": "/var/lib/etcd/",
                        "type": "bind",
                        "source": "/var/lib/etcd",
                        "options": [
                                "rw",
                                "rbind",
                                "rprivate",
                                "bind"
                        ]
                }
tjungblu commented 10 months ago

sorry @lance5890 I would need a bit more time to dig this up. I'm sure there's something else that would inject the priv/root=0 user into the pod somehow. It might also be the static pod machinery in library-go.

I doubt we could access the hostNetwork nor write to the hostPath without the root privileges.

lance5890 commented 10 months ago

/retest

lance5890 commented 10 months ago

/retest

openshift-ci[bot] commented 10 months ago

@lance5890: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-gcp-qe-no-capabilities 02f0c50efbab6e09b93a8c015b59e7fab0fff5ee link true /test e2e-gcp-qe-no-capabilities

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository. I understand the commands that are listed [here](https://go.k8s.io/bot-commands).
lance5890 commented 9 months ago

prow/e2e-gcp-qe-no-capabilities is being made optional

https://github.com/openshift/release/pull/47252 https://github.com/openshift/release/pull/47890

lance5890 commented 9 months ago

I feel there is something to do with the selinux :container_var_lib_t of /var/lib/etcd dir , I will dig this up later

ls -lZ /var/lib

system_u:object_r:container_var_lib_t:s0        20 Dec  4 08:00 etcd
tjungblu commented 9 months ago

@lance5890 while working on certificates right now, I stumbled upon this annotation: https://github.com/openshift/cluster-etcd-operator/blob/master/bindata/bootkube/manifests/00_openshift-etcd-ns.yaml#L11

Maybe that's causing it to effectively run privileged.

lance5890 commented 9 months ago

@lance5890 while working on certificates right now, I stumbled upon this annotation: https://github.com/openshift/cluster-etcd-operator/blob/master/bindata/bootkube/manifests/00_openshift-etcd-ns.yaml#L11

Maybe that's causing it to effectively run privileged.

I have a general idea about this issue, please correct me if I'm wrong

  1. yes ,the openshift.io/run-level: "0" label will skip the scc setting for the pod in ccos-etcd, but this will not cause it to effectively run privileged.
  2. if we do not set etcd run as privileged, why should the etcd still can read/write for the /var/lib/etcd dir

image

  1. we can also set the selinuxOptions,type to spt_t, this will work, as :
    securityContext:
    seLinuxOptions:
      type: "spc_t"

/cc @haircommander

openshift-bot commented 6 months ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-merge-robot commented 6 months ago

PR needs rebase.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
openshift-bot commented 5 months ago

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten /remove-lifecycle stale

lance5890 commented 4 months ago

/close