Closed luxas closed 4 years ago
@pipejakob I added the kind/postmortem label as it's in the same theme; we broke SELinux users again without noticing it...
I don't work with kubeadm but would be very willing to help whoever takes this on.
@rhatdan Great! What I'm looking for is people who are familiar with SELinux and willing to help. I might be able to coordinate the work, though.
A rough todo list would look like:
@rhatdan Let's first try and get it working in v1.7, can be done in #215
@timothysc
I will take it for now, since I raised it. I'll have some updates soon. @rhatdan, please advise me ;)
@luxas , @jasonbrooks - does this still exist in fedora?
I think folks have patched policies on other channels.
/cc @eparis
@timothysc I haven't tried w/ 1.7 yet, but w/ 1.6, CentOS worked w/ selinux but Fedora 25 didn't. I'll test w/ 1.7
for reference, I just ran kubeadm 1.7 on f26 in permissive mode, and these are the denials I got:
[root@fedora-1 ~]# ausearch -m avc -ts recent
----
time->Tue Jul 11 13:03:50 2017
type=AVC msg=audit(1499792630.959:321): avc: denied { read } for pid=2885 comm="kube-apiserver" name="apiserver.crt" dev="dm-0" ino=16820634 scontext=system_u:system_r:container_t:s0:c171,c581 tcontext=unconfined_u:object_r:cert_t:s0 tclass=file permissive=1
----
time->Tue Jul 11 13:03:50 2017
type=AVC msg=audit(1499792630.959:322): avc: denied { open } for pid=2885 comm="kube-apiserver" path="/etc/kubernetes/pki/apiserver.crt" dev="dm-0" ino=16820634 scontext=system_u:system_r:container_t:s0:c171,c581 tcontext=unconfined_u:object_r:cert_t:s0 tclass=file permissive=1
----
time->Tue Jul 11 13:04:18 2017
type=AVC msg=audit(1499792658.917:331): avc: denied { read } for pid=2945 comm="kube-controller" name="sa.key" dev="dm-0" ino=16820637 scontext=system_u:system_r:container_t:s0:c755,c834 tcontext=unconfined_u:object_r:cert_t:s0 tclass=file permissive=1
----
time->Tue Jul 11 13:04:18 2017
type=AVC msg=audit(1499792658.917:332): avc: denied { open } for pid=2945 comm="kube-controller" path="/etc/kubernetes/pki/sa.key" dev="dm-0" ino=16820637 scontext=system_u:system_r:container_t:s0:c755,c834 tcontext=unconfined_u:object_r:cert_t:s0 tclass=file permissive=1
On CentOS 7, same thing, no denials.
You are volume mounting in content from the host into a container. If you want an SELinux confined process inside the container to be able to read the content, it has to have an SELinux label that the container is allowed to read.
Mounting the object with :Z or :z would fix the issue. Note that either of these would also allow the container to write these objects. If you want to allow the container to read without writing, then you could change the content on the host to something like container_share_t.
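Concretely, the two options described here might look like this on the host (a sketch; the pki path is the kubeadm default, and the docker command line is illustrative only):

```shell
# Option 1: let the runtime relabel the volume at mount time.
# This grants the container read AND write access:
#   docker run -v /etc/kubernetes/pki:/etc/kubernetes/pki:z ...

# Option 2: relabel the host directory for read-only sharing,
# so confined containers can read it without being able to write it.
chcon -R -t container_share_t /etc/kubernetes/pki

# Verify the label took effect:
ls -Zd /etc/kubernetes/pki
```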
https://github.com/kubernetes/kubernetes/pull/48607 will also help here as it starts making mounting everything but etcd read-only...
@luxas @jasonbrooks - someone want to tinker with adjusting the manifests ( https://kubernetes.io/docs/tasks/configure-pod-container/security-context/ ) ?
To me it's unclear which policies kubeadm should add that would work on CentOS, Fedora and CoreOS.
@rhatdan It looks like :Z is only used if the pod provides an SELinux label. In my initial tests, container_runtime_t seems to work -- would that be an appropriate label? And then, I'm assuming that on a system w/o SELinux, this would just be ignored?
Yes, it will be ignored by non-SELinux systems. Running an app as container_runtime_t basically provides no SELinux confinement, since it is supposed to be the label of container runtimes like docker and CRI-O. If you are running the kubelet as this, that is probably fairly accurate.
Right now, we're running the etcd container as spc_t -- would it be better to run that one as container_runtime_t too?
It looks like this does it:
diff --git a/cmd/kubeadm/app/master/manifests.go b/cmd/kubeadm/app/master/manifests.go
index 55fe560c46..228f935cdd 100644
--- a/cmd/kubeadm/app/master/manifests.go
+++ b/cmd/kubeadm/app/master/manifests.go
@@ -96,6 +96,7 @@ func WriteStaticPodManifests(cfg *kubeadmapi.MasterConfiguration) error {
LivenessProbe: componentProbe(int(cfg.API.BindPort), "/healthz", api.URISchemeHTTPS),
Resources: componentResources("250m"),
Env: getProxyEnvVars(),
+ SecurityContext: &api.SecurityContext{SELinuxOptions: &api.SELinuxOptions{Type: "container_runtime_t",}},
}, volumes...),
kubeControllerManager: componentPod(api.Container{
Name: kubeControllerManager,
@@ -105,6 +106,7 @@ func WriteStaticPodManifests(cfg *kubeadmapi.MasterConfiguration) error {
LivenessProbe: componentProbe(10252, "/healthz", api.URISchemeHTTP),
Resources: componentResources("200m"),
Env: getProxyEnvVars(),
+ SecurityContext: &api.SecurityContext{SELinuxOptions: &api.SELinuxOptions{Type: "container_runtime_t",}},
}, volumes...),
kubeScheduler: componentPod(api.Container{
Name: kubeScheduler,
Would this be something to submit as PRs to the 1.7 branch and to master, or just to master? The source moved around a bit in master, the patch above is to the 1.7 branch.
I would actually prefer that it run as spc_t, or as a confined domain (container_t). etcd should easily be able to be confined by SELinux.
I think spc_t should work. I tried w/ container_t and that didn't work. audit2allow says it needs:
allow container_t cert_t:file { open read };
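For anyone wanting to unblock themselves locally, this denial can be turned into a loadable local policy module with the standard audit2allow workflow (a sketch; the module name kubeadm_local is arbitrary, and the resulting module should be reviewed before loading):

```shell
# Build a local policy module from the recorded denials.
ausearch -m avc -ts recent -c kube-apiserver | audit2allow -M kubeadm_local

# Review kubeadm_local.te first; it should contain roughly:
#   allow container_t cert_t:file { open read };
# then load it:
semodule -i kubeadm_local.pp
```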
Could we relabel the certs directory with container_file_t or container_share_t? Then it would work.
kubeadm creates an /etc/kubernetes/pki dir when you run kubeadm init, but when you kubeadm reset, it only empties that dir. If we created the pki dir when the rpm is installed, we could do the labeling at that point, by modding the spec file.
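For illustration, the install-time labeling being discussed might look like this in a %post scriptlet (hypothetical -- the kubeadm package does not currently own /etc/kubernetes/pki):

```shell
# Hypothetical %post body for the kubeadm RPM: create the pki dir at
# install time and label it so confined containers can read, but not
# write, its contents. "|| :" keeps the scriptlet from failing the
# install on hosts without SELinux.
mkdir -p /etc/kubernetes/pki
chcon -R -t container_share_t /etc/kubernetes/pki || :
```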
For etcd, the container would need allow container_t container_var_lib_t:file { create lock open read unlink write }; for /var/lib/etcd on the host.
I'm trying to figure out if it's legitimate to chcon directories in the rpm spec file -- I see many instances of it on github (https://github.com/search?l=&p=1&q=chcon+extension%3Aspec) but I can't tell whether that's considered good packaging practice or not. We could either change kubeadm to run the components as spc_t, unconfined, or we could leave kubeadm alone and chcon the pki dir.
I am not as familiar with the Kubernetes architecture as I should be. But are we talking about different containers or the same container -- a kubeadm container versus an etcd container? The management container, which can launch other containers as "privileged", should be running as spc_t, since confining it buys us nothing. A service that just listens on the network and hands out data, on the other hand, could be run with more confinement.
kubeadm is distributed as a deb or rpm package, and it depends on the kubelet and the cni packages (and on docker). You start the kubelet, and then you run kubeadm, which creates manifests for etcd, apiserver, controller manager, scheduler and proxy, and those all run as containers. That's the main way to run kubeadm, as described here: https://kubernetes.io/docs/setup/independent/install-kubeadm/
I have experimented with running kubeadm as a system container, as well: http://www.projectatomic.io/blog/2017/05/testing-system-containerized-kubeadm/
OK, that is kind of what I thought. If we split all of these services into different system containers or orchestrated containers, some can probably run confined and some need to run with full privs. kubeadm, as a tool for an administrator, should be run with full privs: spc_t if it runs inside of a container; if it runs outside, it would run with the administrator's label.
If all of these services are running in the same container, then they would have to probably run as privileged.
They're all running in separate containers. They can run as container_t, but apiserver and controller manager need to open and read cert_t, and etcd needs access to container_var_lib_t.
We can create the /etc/kubernetes/pki and /var/lib/etcd dirs and set their contexts to container_share_t in the spec file for the kubeadm rpm, or we can make the apiserver and controller manager containers run as spc_t (like the etcd container does now), and have it just work, but w/o confinement, or maybe make some sort of custom policy or something like that.
What do you think, @rhatdan
As @jasonbrooks describes, we have a few options here. But that's not the main thing.
The main thing is, where do we store the secrets... I thought the consensus was to store the CA and stuff in kubernetes secrets, so then only spc_t is needed for etcd
@luxas @timothysc @jbeda
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Prevent issues from auto-closing with an /lifecycle frozen comment.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale
/lifecycle frozen
/cc @detiber
Hi, I still have a problem with SELinux when I run kubeadm init:
audit2allow -a -w
type=AVC msg=audit(1522929610.297:136): avc: denied { write } for pid=2817 comm="etcd" name="etcd" dev="dm-0" ino=67228425 scontext=system_u:system_r:svirt_lxc_net_t:s0:c430,c632 tcontext=system_u:object_r:var_lib_t:s0 tclass=dir
Versions:
kubeadm 1.9.3
CentOS 7.4
Looks like a directory, /var/lib/etcd, is volume mounted into a container without a correct SELinux label on it. Mounting this with the equivalent of :Z will fix that, or run chcon -R -t svirt_sandbox_file_t /var/lib/etcd, and then it should work.
/assign @detiber
I suspect we can handle this by setting a security context on the static pod definitions where needed (and only conditionally based on whether selinux is enabled on the host).
I believe the container runtimes will ignore the security context if the hosts are not enabled.
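A host-side sketch of that conditional check (the echo strings are illustrative only): selinuxenabled exits 0 only when SELinux is actually on, which is the same signal a library such as opencontainers/selinux exposes programmatically.

```shell
# Sketch: gate the SELinux-specific pod settings on whether the host
# actually has SELinux enabled. selinuxenabled exits 0 only in that case.
if selinuxenabled 2>/dev/null; then
    echo "SELinux enabled: set seLinuxOptions on the static pods"
else
    echo "SELinux disabled or unavailable: omit seLinuxOptions"
fi
```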
ref #1026 #1082
While trying to reproduce the issues with latest CentOS I noticed that the API server cannot load its certificate if SELinux is set in enforcing mode.
Here is my comment: https://github.com/kubernetes/kubeadm/issues/1082#issuecomment-416991032
AVC Messages?
ausearch -m avc -ts recent
@rhatdan
----
time->Fri Aug 31 11:47:18 2018
type=PROCTITLE msg=audit(1535705238.732:281): proctitle=6B7562652D617069736572766572002D2D617574686F72697A6174696F6E2D6D6F64653D4E6F64652C52424143002D2D6164766572746973652D616464726573733D3139322E3136382E3231372E313333002D2D616C6C6F772D70726976696C656765643D74727565002D2D636C69656E742D63612D66696C653D2F6574632F
type=SYSCALL msg=audit(1535705238.732:281): arch=c000003e syscall=257 success=no exit=-13 a0=ffffffffffffff9c a1=c420427a10 a2=80000 a3=0 items=0 ppid=4525 pid=4541 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="kube-apiserver" exe="/usr/local/bin/kube-apiserver" subj=system_u:system_r:container_t:s0:c224,c932 key=(null)
type=AVC msg=audit(1535705238.732:281): avc: denied { read } for pid=4541 comm="kube-apiserver" name="apiserver.crt" dev="dm-0" ino=604382 scontext=system_u:system_r:container_t:s0:c224,c932 tcontext=unconfined_u:object_r:cert_t:s0 tclass=file
----
time->Fri Aug 31 11:47:26 2018
type=PROCTITLE msg=audit(1535705246.653:285): proctitle=65746364002D2D6164766572746973652D636C69656E742D75726C733D68747470733A2F2F3132372E302E302E313A32333739002D2D636572742D66696C653D2F6574632F6B756265726E657465732F706B692F657463642F7365727665722E637274002D2D636C69656E742D636572742D617574683D74727565002D2D6461
type=SYSCALL msg=audit(1535705246.653:285): arch=c000003e syscall=257 success=no exit=-13 a0=ffffffffffffff9c a1=c420195d70 a2=80000 a3=0 items=0 ppid=4594 pid=4609 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="etcd" exe="/usr/local/bin/etcd" subj=system_u:system_r:container_t:s0:c315,c1002 key=(null)
type=AVC msg=audit(1535705246.653:285): avc: denied { read } for pid=4609 comm="etcd" name="peer.crt" dev="dm-0" ino=102172270 scontext=system_u:system_r:container_t:s0:c315,c1002 tcontext=unconfined_u:object_r:cert_t:s0 tclass=file
----
time->Fri Aug 31 11:52:29 2018
type=PROCTITLE msg=audit(1535705549.708:291): proctitle=6B7562652D617069736572766572002D2D617574686F72697A6174696F6E2D6D6F64653D4E6F64652C52424143002D2D6164766572746973652D616464726573733D3139322E3136382E3231372E313333002D2D616C6C6F772D70726976696C656765643D74727565002D2D636C69656E742D63612D66696C653D2F6574632F
type=SYSCALL msg=audit(1535705549.708:291): arch=c000003e syscall=257 success=no exit=-13 a0=ffffffffffffff9c a1=c4205c5800 a2=80000 a3=0 items=0 ppid=4839 pid=4855 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="kube-apiserver" exe="/usr/local/bin/kube-apiserver" subj=system_u:system_r:container_t:s0:c224,c932 key=(null)
type=AVC msg=audit(1535705549.708:291): avc: denied { read } for pid=4855 comm="kube-apiserver" name="apiserver.crt" dev="dm-0" ino=604382 scontext=system_u:system_r:container_t:s0:c224,c932 tcontext=unconfined_u:object_r:cert_t:s0 tclass=file
----
time->Fri Aug 31 11:52:36 2018
type=PROCTITLE msg=audit(1535705556.661:295): proctitle=65746364002D2D6164766572746973652D636C69656E742D75726C733D68747470733A2F2F3132372E302E302E313A32333739002D2D636572742D66696C653D2F6574632F6B756265726E657465732F706B692F657463642F7365727665722E637274002D2D636C69656E742D636572742D617574683D74727565002D2D6461
type=SYSCALL msg=audit(1535705556.661:295): arch=c000003e syscall=257 success=no exit=-13 a0=ffffffffffffff9c a1=c420195d70 a2=80000 a3=0 items=0 ppid=4907 pid=4922 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="etcd" exe="/usr/local/bin/etcd" subj=system_u:system_r:container_t:s0:c315,c1002 key=(null)
type=AVC msg=audit(1535705556.661:295): avc: denied { read } for pid=4922 comm="etcd" name="peer.crt" dev="dm-0" ino=102172270 scontext=system_u:system_r:container_t:s0:c315,c1002 tcontext=unconfined_u:object_r:cert_t:s0 tclass=file
You are volume mounting some content in /etc/pki into the container?
Yes, in /etc/kubernetes/pki. That's how kubeadm works.
If no confined processes other than containers are reading these files, then you could mount it using :z, or chcon -R -t container_share_t /etc/kubernetes/pki
Thanks @rhatdan I'll try that next week.
We also have kubelet itself reading certs, which isn't containerised. Anything else needed for that @rhatdan?
What label does the kubelet run with?
ps -eZ | grep kubelet
system_u:system_r:unconfined_service_t:s0 31110 ? 00:00:01 kubelet
Ah, ok this makes sense now. Also linking https://bugzilla.redhat.com/show_bug.cgi?id=1546160 for other travellers.
Thanks.
Since it is running as an unconfined service, it would not have any issue reading content if it was labeled container_file_t.
Thanks Dan.
So, https://github.com/kubernetes/kubernetes/pull/68448 gets kubeadm-initialised nodes working on Fedora 28 with Docker+SELinux with containers confined to container_t, at least with one additional manual command.
I have a number of questions about what we should be doing though (which should be postponed til after 1.12):
- The PR uses opencontainers/selinux to just write the extended attributes on the certs and the etcd data dir. Should this actually be applied with semanage fcontext + restorecon?
- We have caveats around the fact that both the etcd data directory and the certificate directory are configurable at run-time.
- I had to manually apply container_file_t to /opt/cni/bin, as the current practice of most CNI plugins is to mount it and then write their plugins into that host directory. /opt/cni/bin is not a kubeadm concern, so I haven't touched it here, but this leaves /opt/cni/bin with things that didn't come from an RPM.
- Should we get changes made to container-selinux upstream, or ship policies in our kubeadm package?
It would probably be best to get them into container-selinux, to at least have me review them.
We have caveats around the fact that both the etcd data directory and certificate directory are configurable at run-time.
The PR uses opencontainers/selinux to just write the extended attributes on certs and the etcd data dir.
Should this actually be applied with semanage fcontext + restorecon ?
Yes, that would be best, although if everyone agrees on this, we could get them into the upstream package. Should these files be shared read-only or read/write?
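The semanage + restorecon approach being agreed on here could be sketched like this (the paths are the kubeadm defaults; both are configurable at runtime, which is the caveat already raised, and container_file_t for the etcd data dir is an assumption based on etcd needing write access):

```shell
# Persistent labeling: semanage records the rule, so restorecon (and any
# future full relabel) re-applies it, unlike a one-off chcon.
semanage fcontext -a -t container_share_t '/etc/kubernetes/pki(/.*)?'
semanage fcontext -a -t container_file_t  '/var/lib/etcd(/.*)?'
restorecon -R -v /etc/kubernetes/pki /var/lib/etcd
```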
Should we apply PodSecurityPolicies, particularly to the certificates directory, such that each of the core components then have private shares.
This would require further splitting of certificates into directories per component, at least for the private key.
I have no idea.
I had to manually apply container_file_t to /opt/cni/bin as the current practice of most CNI plugins is to mount it and then write their plugins into that host directory.
A container_t process should be able to read/execute the default label on these files/directories. Do you want to allow containers to write to this directory?
$ matchpathcon /opt/cni/bin
/opt/cni/bin system_u:object_r:bin_t:s0
$ sesearch -A -s container_t -t bin_t -c file
allow domain base_ro_file_type:file { getattr ioctl lock open read };
allow svirt_sandbox_domain exec_type:file { entrypoint execute execute_no_trans getattr ioctl lock map open read };
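Those two queries can be narrowed to the permission that actually matters here, writing (a sketch; sesearch's -p flag filters rules by permission):

```shell
# Check the default label on /opt/cni/bin, then ask whether container_t
# holds any rule granting write on files with that label. The default
# bin_t label grants read/execute only, which is why init containers
# fail to drop plugins there until the directory is relabeled.
matchpathcon /opt/cni/bin
sesearch -A -s container_t -t bin_t -c file -p write
# Empty output from sesearch means writes are denied; relabeling the
# directory container_file_t is what grants containers write access.
```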
Thanks. I'll have a look at what to do for container-selinux next week.
A container_t process should be able to read/execute the default label on these files/directories. Do you want to allow containers to write to this directory?
Unfortunately, yes. The way most CNI plugins work now in their default manifests, they use an init container to download their CNI plugin and store it in /opt/cni/bin for kubelet to then use, even though our CNI RPM may already have them; they do this so they can match the CNI plugin version being used with the rest of their control planes.
On the flip side, I think I can narrow write access down to just etcd for the data directory, and kubelet, remaining unconfined, can do its certificate rotation as normal.
The fact that the CNI RPM puts stuff in /opt is problematic for Atomic Hosts anyway, so maybe we need to address CentOS / Fedora support more widely?
@timothysc do you know if this work was / is scheduled for 1.15 cycle ?
Is this a BUG REPORT or FEATURE REQUEST?
COMMUNITY REQUEST
Versions
All
We need e2e tests that ensure kubeadm works with SELinux on CentOS/Fedora (https://github.com/kubernetes/kubeadm/issues/215) and CoreOS (https://github.com/kubernetes/kubeadm/issues/269)
We might be able to add a job for it on kubernetes-anywhere @pipejakob ?
IIUC, kubeadm is broken with SELinux enabled right now. The problem is that we don't have anyone (AFAIK) very experienced with SELinux on the kubeadm team (at least nobody has had time to look into it yet).
AFAIK, the problem is often when mounting hostPath volumes...
To get closer to production readiness, we should fix this and add a testing suite for it. We should also work with CNI network providers to make sure they adopt the right SELinux policies as well.
Anyone want to take ownership here? I'm not very experienced with SELinux, so I'm probably gonna focus on other things.
@dgoodwin @aaronlevy @coeki @rhatdan @philips @bboreham @mikedanese @pipejakob