kubernetes / kubeadm

Aggregator for issues filed against kubeadm
Apache License 2.0

Document kubeadm usage with SELinux #279

Closed luxas closed 4 years ago

luxas commented 7 years ago

Is this a BUG REPORT or FEATURE REQUEST?

Choose one: BUG REPORT or FEATURE REQUEST

COMMUNITY REQUEST

Versions

All

We need e2e tests that ensure kubeadm works with SELinux on CentOS/Fedora (https://github.com/kubernetes/kubeadm/issues/215) and CoreOS (https://github.com/kubernetes/kubeadm/issues/269)

We might be able to add a job for it on kubernetes-anywhere @pipejakob ?

IIUC kubeadm is broken with SELinux enabled right now. The problem is that, AFAIK, nobody on the kubeadm team is very experienced with SELinux (or at least nobody has had time to look into it yet).

AFAIK, the problem is often when mounting hostPath volumes...

To get closer to production readiness, we should fix this and add a testing suite for it. We should also work with CNI network providers to make sure they adopt the right SELinux policies as well.

Anyone want to take ownership here? I'm not very experienced with SELinux, so I'm probably gonna focus on other things.

@dgoodwin @aaronlevy @coeki @rhatdan @philips @bboreham @mikedanese @pipejakob

luxas commented 7 years ago

@pipejakob I added the kind/postmortem label as it's in the same theme, we broke SELinux users again without noticing it...

rhatdan commented 7 years ago

I don't work with kubeadm, but I would be very willing to help whoever takes this on.

luxas commented 7 years ago

@rhatdan Great! What I'm looking for is people who are familiar with SELinux and willing to help. I might be able to coordinate the work though.

A rough todo list would look like:

@rhatdan Let's first try and get it working in v1.7, can be done in #215

roberthbailey commented 7 years ago

@timothysc

coeki commented 7 years ago

I will take this for now, since I raised it. I'll have some updates soon. @rhatdan, please advise me ;)

timothysc commented 7 years ago

@luxas , @jasonbrooks - does this still exist in fedora?

I think folks have patched policies on other channels.

/cc @eparis

jasonbrooks commented 7 years ago

@timothysc I haven't tried w/ 1.7 yet, but w/ 1.6, CentOS worked w/ selinux but Fedora 25 didn't. I'll test w/ 1.7

jasonbrooks commented 7 years ago

for reference, I just ran kubeadm 1.7 on f26 in permissive mode, and these are the denials I got:

[root@fedora-1 ~]# ausearch -m avc -ts recent
----
time->Tue Jul 11 13:03:50 2017
type=AVC msg=audit(1499792630.959:321): avc:  denied  { read } for  pid=2885 comm="kube-apiserver" name="apiserver.crt" dev="dm-0" ino=16820634 scontext=system_u:system_r:container_t:s0:c171,c581 tcontext=unconfined_u:object_r:cert_t:s0 tclass=file permissive=1
----
time->Tue Jul 11 13:03:50 2017
type=AVC msg=audit(1499792630.959:322): avc:  denied  { open } for  pid=2885 comm="kube-apiserver" path="/etc/kubernetes/pki/apiserver.crt" dev="dm-0" ino=16820634 scontext=system_u:system_r:container_t:s0:c171,c581 tcontext=unconfined_u:object_r:cert_t:s0 tclass=file permissive=1
----
time->Tue Jul 11 13:04:18 2017
type=AVC msg=audit(1499792658.917:331): avc:  denied  { read } for  pid=2945 comm="kube-controller" name="sa.key" dev="dm-0" ino=16820637 scontext=system_u:system_r:container_t:s0:c755,c834 tcontext=unconfined_u:object_r:cert_t:s0 tclass=file permissive=1
----
time->Tue Jul 11 13:04:18 2017
type=AVC msg=audit(1499792658.917:332): avc:  denied  { open } for  pid=2945 comm="kube-controller" path="/etc/kubernetes/pki/sa.key" dev="dm-0" ino=16820637 scontext=system_u:system_r:container_t:s0:c755,c834 tcontext=unconfined_u:object_r:cert_t:s0 tclass=file permissive=1

On CentOS 7, the same test produced no denials.
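
To compare denials across hosts, it can help to reduce each AVC record to the few fields that matter here: the denied permission, the process name, and the source and target types. Below is a minimal Python sketch of that. `parse_avc` and `AVC_RE` are hypothetical names used for illustration, not part of ausearch or any audit library, and the regular expression only covers the record layout shown in the output above.

```python
import re

# Hypothetical helper for illustration only; it is not part of any audit tooling.
# It matches the "type=AVC ... avc: denied { ... }" layout seen in this thread.
AVC_RE = re.compile(
    r'avc:\s+denied\s+\{\s*(?P<perms>[^}]*?)\s*\}\s+for\s+.*?'
    r'comm="(?P<comm>[^"]+)".*?'
    r'scontext=(?P<scontext>\S+)\s+tcontext=(?P<tcontext>\S+)\s+tclass=(?P<tclass>\S+)'
)


def parse_avc(line):
    """Return the key fields of one AVC denial record, or None if it is not one."""
    match = AVC_RE.search(line)
    if match is None:
        return None
    # An SELinux context has the form user:role:type:level, so the type is field 3.
    return {
        "perms": match.group("perms").split(),
        "comm": match.group("comm"),
        "source_type": match.group("scontext").split(":")[2],
        "target_type": match.group("tcontext").split(":")[2],
        "tclass": match.group("tclass"),
    }
```

For the first record above, this gives `container_t` as the source type and `cert_t` as the target type, which is the pair the rest of the thread revolves around.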

rhatdan commented 7 years ago

You are volume mounting in content from the host into a container. If you want an SELinux confined process inside the container to be able to read the content, it has to have an SELinux label that the container is allowed to read.

Mounting the object with :Z or :z would fix the issue. Note either of these would allow the container to write these objects. If you want to allow the container to read without writing then you could change the content on the host to something like container_share_t.

luxas commented 7 years ago

https://github.com/kubernetes/kubernetes/pull/48607 will also help here as it starts making mounting everything but etcd read-only...

timothysc commented 7 years ago

@luxas @jasonbrooks - someone want to tinker with adjusting the manifests ( https://kubernetes.io/docs/tasks/configure-pod-container/security-context/ ) ?

luxas commented 7 years ago

To me it's unclear which policies kubeadm should add to:

that is working on CentOS, Fedora and CoreOS

jasonbrooks commented 7 years ago

@rhatdan It looks like :Z is only used if the pod provides an selinux label. In my initial tests, container_runtime_t seems to work -- would that be an appropriate label? And then, I'm assuming in a system w/o selinux, this would just be ignored?

rhatdan commented 7 years ago

Yes, it will be ignored by non-SELinux systems. Running an app as container_runtime_t basically provides no SELinux confinement, since it is supposed to be the label of container runtimes like docker and CRI-O. If you are running the kubelet as this, that is probably fairly accurate.

jasonbrooks commented 7 years ago

Right now, we're running the etcd container as spc_t -- would it be better to run that one as container_runtime_t too?

jasonbrooks commented 7 years ago

It looks like this does it:

diff --git a/cmd/kubeadm/app/master/manifests.go b/cmd/kubeadm/app/master/manifests.go
index 55fe560c46..228f935cdd 100644
--- a/cmd/kubeadm/app/master/manifests.go
+++ b/cmd/kubeadm/app/master/manifests.go
@@ -96,6 +96,7 @@ func WriteStaticPodManifests(cfg *kubeadmapi.MasterConfiguration) error {
                        LivenessProbe: componentProbe(int(cfg.API.BindPort), "/healthz", api.URISchemeHTTPS),
                        Resources:     componentResources("250m"),
                        Env:           getProxyEnvVars(),
+                        SecurityContext: &api.SecurityContext{SELinuxOptions: &api.SELinuxOptions{Type: "container_runtime_t",}},
                }, volumes...),
                kubeControllerManager: componentPod(api.Container{
                        Name:          kubeControllerManager,
@@ -105,6 +106,7 @@ func WriteStaticPodManifests(cfg *kubeadmapi.MasterConfiguration) error {
                        LivenessProbe: componentProbe(10252, "/healthz", api.URISchemeHTTP),
                        Resources:     componentResources("200m"),
                        Env:           getProxyEnvVars(),
+                        SecurityContext: &api.SecurityContext{SELinuxOptions: &api.SELinuxOptions{Type: "container_runtime_t",}},
                }, volumes...),
                kubeScheduler: componentPod(api.Container{
                        Name:          kubeScheduler,

Would this be something to submit as PRs to the 1.7 branch and to master, or just to master? The source moved around a bit in master, the patch above is to the 1.7 branch.
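
For readers who want to try the same idea on a generated manifest rather than by patching kubeadm, here is a rough Python sketch that sets the SELinux type on every container of a pod manifest held as a dict. It mirrors what the Go patch above does through `SecurityContext.SELinuxOptions.Type`. The helper name `with_selinux_type` and the trimmed `example_pod` (including its image name) are made up for illustration. Which type is appropriate is still being debated in this thread.

```python
import copy
import json


def with_selinux_type(pod, selinux_type):
    """Return a copy of a pod manifest with the SELinux type set on each container."""
    patched = copy.deepcopy(pod)
    for container in patched["spec"]["containers"]:
        ctx = container.setdefault("securityContext", {})
        # Same field the Go patch sets: securityContext.seLinuxOptions.type
        ctx.setdefault("seLinuxOptions", {})["type"] = selinux_type
    return patched


# Hypothetical, heavily trimmed static pod manifest; the image name is a placeholder.
example_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "kube-apiserver", "namespace": "kube-system"},
    "spec": {
        "containers": [
            {"name": "kube-apiserver", "image": "example.invalid/kube-apiserver"}
        ]
    },
}

if __name__ == "__main__":
    print(json.dumps(with_selinux_type(example_pod, "container_runtime_t"), indent=2))
```

The function copies the manifest instead of mutating it, so the unpatched and patched versions can be diffed side by side.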

rhatdan commented 7 years ago

I would actually prefer that it run as spc_t, or as a confined domain (container_t). etcd should easily be able to be confined by SELinux.

jasonbrooks commented 7 years ago

I think spc_t should work. I tried w/ container_t and that didn't work. audit2allow says it needs:

allow container_t cert_t:file { open read };
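
The rule above is simply the read and open denials for the same source type, target type and class folded into one line. A small Python sketch of that folding step follows. `allow_rule` is a hypothetical helper that only reproduces the output format; the real audit2allow does considerably more. Note that a rule like this would apply to every container_t process and every cert_t file, not only to the kubeadm certificates.

```python
def allow_rule(source_type, target_type, tclass, perms):
    """Format one policy allow rule from a set of denied permissions.

    Hypothetical helper for illustration; it only mimics the line format
    printed by audit2allow in the comment above.
    """
    # Deduplicate and sort so repeated denials collapse into a stable rule.
    perm_list = " ".join(sorted(set(perms)))
    return f"allow {source_type} {target_type}:{tclass} {{ {perm_list} }};"


if __name__ == "__main__":
    print(allow_rule("container_t", "cert_t", "file", ["read", "open", "read"]))
```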

rhatdan commented 7 years ago

Could we relabel the certs directory with container_file_t or container_share_t? Then it would work.

jasonbrooks commented 7 years ago

kubeadm creates an /etc/kubernetes/pki dir when you run kubeadm init, but when you kubeadm reset, it only empties that dir. If we created the pki dir when the rpm is installed, we could do the labeling at that point, by modding the spec file.

jasonbrooks commented 7 years ago

For etcd, the container would need allow container_t container_var_lib_t:file { create lock open read unlink write }; for /var/lib/etcd on the host.

jasonbrooks commented 7 years ago

I'm trying to figure out if it's legitimate to chcon directories in the rpm spec file -- I see many instances of it in github (https://github.com/search?l=&p=1&q=chcon+extension%3Aspec) but I can't tell whether that's considered good packaging practice or not. We could either change kubeadm to run the components as spc_t, unconfined, or we could leave kubeadm alone and chcon the pki dir.

rhatdan commented 7 years ago

I am not as familiar with the Kubernetes architecture as I should be. But are we talking about different containers or the same container: the kubeadm container versus the etcd container? The management container, which can launch other containers as "privileged", should be running as spc_t, since confining it buys us nothing. A service that just listens on the network and hands out data, on the other hand, could be run with more confinement.

jasonbrooks commented 7 years ago

kubeadm is distributed as a deb or rpm package, and it depends on the kubelet and the cni packages (and on docker). You start the kubelet, and then you run kubeadm, which creates manifests for etcd, apiserver, controller manager, scheduler and proxy, and those all run as containers. That's the main way to run kubeadm, as described here: https://kubernetes.io/docs/setup/independent/install-kubeadm/

I have experimented with running kubeadm as a system container, as well: http://www.projectatomic.io/blog/2017/05/testing-system-containerized-kubeadm/

rhatdan commented 7 years ago

OK, that is kind of what I thought. If we split all of these services into different system containers or orchestrated containers, some can probably run confined and some need to run with full privs. kubeadm, as a tool for an administrator, should be run with full privs: spc_t if it runs inside of a container; if it runs outside, it would run with the administrator's label.

If all of these services are running in the same container, then they would have to probably run as privileged.

jasonbrooks commented 7 years ago

They're all running in separate containers. They can run as container_t, but apiserver and controller manager need to open and read cert_t, and etcd needs access to container_var_lib_t.

We can create the /etc/kubernetes/pki and /var/lib/etcd dirs and set their contexts to container_share_t in the spec file for the kubeadm rpm, or we can make the apiserver and controller manager containers run as spc_t (like the etcd container does now), and have it just work, but w/o confinement, or maybe make some sort of custom policy or something like that.

What do you think, @rhatdan?

coeki commented 7 years ago

As @jasonbrooks describes we have few options here. But it's not the main thing.

The main thing is, where do we store the secrets... I thought the consensus was to store the CA and stuff in kubernetes secrets, so then only spc_t is needed for etcd

@luxas @timothysc @jbeda

timothysc commented 6 years ago

/lifecycle frozen

timothysc commented 6 years ago

/cc @detiber

cynepco3hahue commented 6 years ago

Hi, I still have a problem with SELinux when I run kubeadm init:

audit2allow -a -w
type=AVC msg=audit(1522929610.297:136): avc:  denied  { write } for  pid=2817 comm="etcd" name="etcd" dev="dm-0" ino=67228425 scontext=system_u:system_r:svirt_lxc_net_t:s0:c430,c632 tcontext=system_u:object_r:var_lib_t:s0 tclass=dir

Versions:

rhatdan commented 6 years ago

Looks like a directory in /var/lib/etcd is volume mounted into a container without a correct SELinux label on it. Mounting this with the equivalent of :Z will fix that, or chcon -R -v -t svirt_sandbox_file_t /var/lib/etcd

And then it should work.

timothysc commented 6 years ago

/assign @detiber

detiber commented 6 years ago

I suspect we can handle this by setting a security context on the static pod definitions where needed (and only conditionally based on whether selinux is enabled on the host).

rhatdan commented 6 years ago

I believe the container runtimes will ignore the security context if SELinux is not enabled on the host.
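
If the security context were to be set conditionally, as suggested above, something has to decide whether SELinux is active on the host. Here is a minimal Python sketch of one way to check, assuming selinuxfs is mounted at its usual location, /sys/fs/selinux, where the `enforce` file holds 1 or 0. `selinux_mode` is a hypothetical helper and is not how kubeadm does it.

```python
from pathlib import Path


def selinux_mode(selinuxfs="/sys/fs/selinux"):
    """Report 'enforcing', 'permissive' or 'disabled' for the local host.

    Hypothetical helper for illustration. Assumes selinuxfs is mounted at the
    given path; if the 'enforce' file cannot be read, SELinux is treated as
    disabled.
    """
    try:
        value = (Path(selinuxfs) / "enforce").read_text().strip()
    except OSError:
        return "disabled"
    return "enforcing" if value == "1" else "permissive"


if __name__ == "__main__":
    print(selinux_mode())
```

Taking the mount point as a parameter keeps the check testable against a temporary directory.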

rosti commented 6 years ago

ref #1026 #1082

While trying to reproduce the issues with latest CentOS I noticed that the API server cannot load its certificate if SELinux is set in enforcing mode.

Here is my comment: https://github.com/kubernetes/kubeadm/issues/1082#issuecomment-416991032

rhatdan commented 6 years ago

AVC Messages?

ausearch -m avc -ts recent

rosti commented 6 years ago

@rhatdan

----
time->Fri Aug 31 11:47:18 2018
type=PROCTITLE msg=audit(1535705238.732:281): proctitle=6B7562652D617069736572766572002D2D617574686F72697A6174696F6E2D6D6F64653D4E6F64652C52424143002D2D6164766572746973652D616464726573733D3139322E3136382E3231372E313333002D2D616C6C6F772D70726976696C656765643D74727565002D2D636C69656E742D63612D66696C653D2F6574632F
type=SYSCALL msg=audit(1535705238.732:281): arch=c000003e syscall=257 success=no exit=-13 a0=ffffffffffffff9c a1=c420427a10 a2=80000 a3=0 items=0 ppid=4525 pid=4541 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="kube-apiserver" exe="/usr/local/bin/kube-apiserver" subj=system_u:system_r:container_t:s0:c224,c932 key=(null)
type=AVC msg=audit(1535705238.732:281): avc:  denied  { read } for  pid=4541 comm="kube-apiserver" name="apiserver.crt" dev="dm-0" ino=604382 scontext=system_u:system_r:container_t:s0:c224,c932 tcontext=unconfined_u:object_r:cert_t:s0 tclass=file
----
time->Fri Aug 31 11:47:26 2018
type=PROCTITLE msg=audit(1535705246.653:285): proctitle=65746364002D2D6164766572746973652D636C69656E742D75726C733D68747470733A2F2F3132372E302E302E313A32333739002D2D636572742D66696C653D2F6574632F6B756265726E657465732F706B692F657463642F7365727665722E637274002D2D636C69656E742D636572742D617574683D74727565002D2D6461
type=SYSCALL msg=audit(1535705246.653:285): arch=c000003e syscall=257 success=no exit=-13 a0=ffffffffffffff9c a1=c420195d70 a2=80000 a3=0 items=0 ppid=4594 pid=4609 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="etcd" exe="/usr/local/bin/etcd" subj=system_u:system_r:container_t:s0:c315,c1002 key=(null)
type=AVC msg=audit(1535705246.653:285): avc:  denied  { read } for  pid=4609 comm="etcd" name="peer.crt" dev="dm-0" ino=102172270 scontext=system_u:system_r:container_t:s0:c315,c1002 tcontext=unconfined_u:object_r:cert_t:s0 tclass=file
----
time->Fri Aug 31 11:52:29 2018
type=PROCTITLE msg=audit(1535705549.708:291): proctitle=6B7562652D617069736572766572002D2D617574686F72697A6174696F6E2D6D6F64653D4E6F64652C52424143002D2D6164766572746973652D616464726573733D3139322E3136382E3231372E313333002D2D616C6C6F772D70726976696C656765643D74727565002D2D636C69656E742D63612D66696C653D2F6574632F
type=SYSCALL msg=audit(1535705549.708:291): arch=c000003e syscall=257 success=no exit=-13 a0=ffffffffffffff9c a1=c4205c5800 a2=80000 a3=0 items=0 ppid=4839 pid=4855 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="kube-apiserver" exe="/usr/local/bin/kube-apiserver" subj=system_u:system_r:container_t:s0:c224,c932 key=(null)
type=AVC msg=audit(1535705549.708:291): avc:  denied  { read } for  pid=4855 comm="kube-apiserver" name="apiserver.crt" dev="dm-0" ino=604382 scontext=system_u:system_r:container_t:s0:c224,c932 tcontext=unconfined_u:object_r:cert_t:s0 tclass=file
----
time->Fri Aug 31 11:52:36 2018
type=PROCTITLE msg=audit(1535705556.661:295): proctitle=65746364002D2D6164766572746973652D636C69656E742D75726C733D68747470733A2F2F3132372E302E302E313A32333739002D2D636572742D66696C653D2F6574632F6B756265726E657465732F706B692F657463642F7365727665722E637274002D2D636C69656E742D636572742D617574683D74727565002D2D6461
type=SYSCALL msg=audit(1535705556.661:295): arch=c000003e syscall=257 success=no exit=-13 a0=ffffffffffffff9c a1=c420195d70 a2=80000 a3=0 items=0 ppid=4907 pid=4922 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="etcd" exe="/usr/local/bin/etcd" subj=system_u:system_r:container_t:s0:c315,c1002 key=(null)
type=AVC msg=audit(1535705556.661:295): avc:  denied  { read } for  pid=4922 comm="etcd" name="peer.crt" dev="dm-0" ino=102172270 scontext=system_u:system_r:container_t:s0:c315,c1002 tcontext=unconfined_u:object_r:cert_t:s0 tclass=file
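
The PROCTITLE fields above are the process command lines, hex-encoded with NUL bytes between arguments. A short Python sketch for turning one back into an argument list is below. `decode_proctitle` is a name chosen here for illustration. The values in this log appear to be cut off, so the last argument of a full record may be incomplete.

```python
def decode_proctitle(hex_value):
    """Decode a hex-encoded audit PROCTITLE value into its argument list."""
    raw = bytes.fromhex(hex_value)
    # Arguments are separated by NUL bytes in the encoded command line.
    return [arg.decode("utf-8", errors="replace") for arg in raw.split(b"\x00")]


if __name__ == "__main__":
    # First two arguments of the kube-apiserver record shown above.
    sample = (
        "6B7562652D617069736572766572"
        "00"
        "2D2D617574686F72697A6174696F6E2D6D6F64653D4E6F64652C52424143"
    )
    print(decode_proctitle(sample))
```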

rhatdan commented 6 years ago

You are volume mounting some content in /etc/pki into the container?

rosti commented 6 years ago

Yes, in /etc/kubernetes/pki. That's how kubeadm works.

rhatdan commented 6 years ago

If no confined processes other than containers are reading these files, then you could mount it using :z, or chcon -t container_share_t -R /etc/kubernetes/pki

rosti commented 6 years ago

Thanks @rhatdan I'll try that next week.

randomvariable commented 6 years ago

We also have kubelet itself reading certs, which isn't containerised. Anything else needed for that @rhatdan?

rhatdan commented 6 years ago

What label does the kubelet run with?

ps -eZ | grep kubelet

randomvariable commented 6 years ago

system_u:system_r:unconfined_service_t:s0 31110 ? 00:00:01 kubelet

Ah, ok this makes sense now. Also linking https://bugzilla.redhat.com/show_bug.cgi?id=1546160 for other travellers.

Thanks.

rhatdan commented 6 years ago

Since it is running as an unconfined service, it would not have any issue reading the content if it was labeled container_file_t.

randomvariable commented 6 years ago

Thanks Dan.

So, https://github.com/kubernetes/kubernetes/pull/68448 gets kubeadm initialised nodes working on Fedora 28 with Docker+SELinux with containers confined to container_t at least, with one additional manual command.

I have a number of questions about what we should be doing though (which should be postponed til after 1.12):

rhatdan commented 6 years ago

Should we get changes made to container-selinux upstream or ship policies in our kubeadm package?

It would probably be best to get them into container-selinux, to at least have me review them.

    We have caveats around the fact that both the etcd data directory and certificate directory are configurable at run-time.

The PR uses opencontainers/selinux to just write the extended attributes on certs and the etcd data dir.

    Should this actually be applied with semanage fcontext + restorecon ?

Yes, that would be best, although if everyone agrees on this, we could get them into the upstream package. Should these files be shared read-only or read/write?
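
As a rough illustration of the semanage fcontext + restorecon route, the Python sketch below only builds the two command lines for a given directory and type. It does not run them. `relabel_commands` is a hypothetical helper, and the type to use (container_share_t versus container_file_t) is exactly the read-only versus read/write question left open here. Both directories are configurable at run-time, so the paths are parameters rather than constants.

```python
def relabel_commands(path, selinux_type):
    """Build the commands that record a file-context rule and apply it.

    Hypothetical helper for illustration; it returns argv lists and leaves
    running them (as root) to the caller.
    """
    # semanage takes a regular expression; "(/.*)?" covers the directory
    # itself and everything below it.
    pattern = f"{path.rstrip('/')}(/.*)?"
    return [
        ["semanage", "fcontext", "-a", "-t", selinux_type, pattern],
        ["restorecon", "-R", "-v", path],
    ]


if __name__ == "__main__":
    for cmd in relabel_commands("/etc/kubernetes/pki", "container_share_t"):
        print(" ".join(cmd))
```

Unlike a bare chcon, the semanage rule is recorded in the local policy, so a later restorecon or full relabel keeps the type instead of reverting it.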

Should we apply PodSecurityPolicies, particularly to the certificates directory, such that each of the core components then has private shares?

    Would require further split out of certificates into directories per component, at least for the private key.

I have no idea.


I had to manually apply container_file_t to /opt/cni/bin as the current practice of most CNI plugins is to mount it and then write their plugins into that host directory.

A container_t process should be able to read/execute the default label on these files/directories. Do you want to allow containers to write to this directory?

$ matchpathcon /opt/cni/bin
/opt/cni/bin    system_u:object_r:bin_t:s0
$ sesearch -A -s container_t -t bin_t -c file
allow domain base_ro_file_type:file { getattr ioctl lock open read };
allow svirt_sandbox_domain exec_type:file { entrypoint execute execute_no_trans getattr ioctl lock map open read };

randomvariable commented 6 years ago

Thanks. I'll have a look at what to do for container-selinux next week.

A container_t process should be able to read/execute the default label on these files/directories. Do you want to allow containers to write to this directory?

Unfortunately, yes. The way most CNI plugins work now in their default manifests, they use an init container to download their CNI plugin and store it in /opt/cni/bin for kubelet to then use, even though our CNI RPM may already have them so that they can match the CNI plugin version being used with the rest of their control planes.

On the flip side, I think I can narrow write access down to just etcd for the data directory, and kubelet, remaining unconfined, can do its certificate rotation as normal.

The fact that the CNI RPM puts stuff in /opt is problematic for Atomic Hosts anyway, so maybe we need to address CentOS / Fedora support more widely?

DanyC97 commented 5 years ago

@timothysc do you know if this work was / is scheduled for 1.15 cycle ?