Closed: Walnux closed this issue 2 years ago.
@Walnux thanks!
Then start the intel device plugins framework using command
we need to focus on the OLM path: check how readOnlyRootFilesystem is handled in the ClusterServiceVersion metadata. Look for clusterPermissions.
@mythi readOnlyRootFilesystem is very useful to protect the rootfs and we should keep it.
And I agree with you @mythi: now that we can manually start the operator and figure out the potential issues, we should work on the bundle image and run the operator in OLM. That will apply another set of privilege settings.
also filed the same bug to track this on Red Hat Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2026086
https://github.com/intel/intel-device-plugins-for-kubernetes/pull/787 is sent for review
according to the feedback from Peter Hunt pehunt@redhat.com,
securityContext:
  seLinuxOptions:
    type: "spc_t"
This can be used to allow a Pod to access the host filesystem without running the pod with privileged rights. It has been verified to work properly. I will submit a PR to fix the issue.
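For context, a minimal sketch of how that snippet could be delivered as a DaemonSet patch; the DaemonSet and container names here are hypothetical placeholders, not the actual manifest from this repo:

```shell
#!/bin/sh
# Sketch: write a strategic-merge patch that sets the spc_t SELinux type on a
# plugin container. Names below are hypothetical; adjust to the real manifest.
# It could then be applied with something like:
#   kubectl patch daemonset intel-sgx-plugin --patch-file spc_t-patch.yaml
cat > spc_t-patch.yaml <<'EOF'
spec:
  template:
    spec:
      containers:
        - name: intel-sgx-plugin        # hypothetical container name
          securityContext:
            seLinuxOptions:
              type: "spc_t"
EOF
echo "patch written to spc_t-patch.yaml"
```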
@Walnux is spc_t the only type label that works, or can we go with the one that the NFD source dir is labeled with?
It also works on NFD source dir. I am submitting the PR
spc_t probably works, but are there other *_t types that'd fit better?
I believe spc_t is the right option. The container-selinux module doesn't really give much granularity (other than special types for containers that are init processes and kata containers).
Also, I am wondering if anyone would mind helping me figure out why adding :z helps anything. From my testing, all I can see it does is append :z to the destination path. If someone would be willing, I would be interested in seeing the output of /proc/mounts for the pod that has :z in its mount.
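For whoever collects that: the per-mount SELinux context shows up as a `context="..."` option (or a bare `seclabel` flag) in /proc/mounts, so it can be pulled out with a small awk filter. A sketch, run here against a sample line copied from the logs later in this thread:

```shell
#!/bin/sh
# Extract the SELinux context (if any) for a given mountpoint from
# /proc/mounts-style input. Sample line taken from the logs in this thread.
mounts_sample='tmpfs /dev tmpfs rw,context="system_u:object_r:container_file_t:s0:c88,c734",nosuid,size=65536k,mode=755 0 0'

ctx_for_mountpoint() {
    # $1 = mountpoint; reads a mount table on stdin
    awk -v mp="$1" '$2 == mp {
        if (match($4, /context="[^"]*"/))
            print substr($4, RSTART + 9, RLENGTH - 10)
        else if ($4 ~ /(^|,)seclabel(,|$)/)
            print "seclabel (label inherited from filesystem)"
    }'
}

echo "$mounts_sample" | ctx_for_mountpoint /dev
# prints: system_u:object_r:container_file_t:s0:c88,c734
```

On a real node you would feed it `cat /proc/mounts` from inside the pod instead of the sample variable.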
We also tested that with spc_t we don't have to use :z. The document below explains why we tried :z: https://www.redhat.com/sysadmin/user-namespaces-selinux-rootless-containers
yeah that makes sense. However, CRI-O should not be processing :z, whereas podman is expected to. I see why you'd try it, but I mostly want to figure out why it worked (and possibly stop it from working, if it looks like a bug)
for instance, if you change it to
mountPath: '/dev/sgx_enclave:Z'
or even differently:
mountPath: '/dev/sgx_enclave_other'
without spc_t, does it work?
The /dev directory with :z doesn't work. Only other normal directories work without spc_t, e.g. mountPath: '/etc/kubernetes/node-feature-discovery/source.d/:z'.
According to https://developers.redhat.com/blog/2014/11/06/introducing-a-super-privileged-container-concept# spc_t is still a super privileged container type which "only" applies the mount namespace (please correct me). I feel this is still pretty privileged for us. Firstly, for the init container we only need to copy the NFD hook from the container into /etc/kubernetes/node-feature-discovery/source.d/ on the host. I don't think we have to use spc_t; personally, I like the :z solution. Secondly, for the sgx plugin container, we only need to access the /dev/sgx_x device interfaces on the host from the container. I think running as spc_t assigns too many privileges to the container. Finer-grained control might be needed instead of directly running as a pretty privileged container. I actually like the idea of https://github.com/kubernetes/kubernetes/issues/60748
Furthermore, if we have to use spc_t, we have to carefully inspect these two container images and make sure we didn't include any extra binaries that are not needed and would increase the potential attack surface.
Since all the certified images on OCP have to be based on a UBI image, we have quickly gone through the UBI base images; the smallest one we can find is UBI-micro, which is ~30M after decompression. See https://catalog.redhat.com/software/containers/ubi8-micro/601a84aadd19c7786c47c8ea
We are using https://github.com/intel/intel-device-plugins-for-kubernetes/issues/852 to track the UBI based image task
Firstly, for the init container we only need to copy the NFD hook from the container into /etc/kubernetes/node-feature-discovery/source.d/ on the host. I don't think we have to use spc_t. Personally, I like the :z solution.
my point is that the :z solution shouldn't work, and if it does I want to stop it from working.
can you try
volumeMounts:
  - name: nfd-source-hooks
    mountPath: '/etc/kubernetes/node-feature-discovery/source.d/:Z'
also, what installs the device /dev/sgx_x? maybe a selinux rule could be added to allow containers access to it? or better yet, do you have access to a node with this device that I can play around with? I would be happy to investigate a solution for y'all (ideally one that doesn't give spc_t)
also, what installs the device /dev/sgx_x? maybe a selinux rule could be added to allow containers access to it?
it's an in-tree kernel driver; RHEL 8.4+ has it as a tech preview. would that already cover the rules part automatically? I'll work with @Walnux to check if we can get you access to a node with SGX.
volumeMounts:
  - name: nfd-source-hooks
    mountPath: '/etc/kubernetes/node-feature-discovery/source.d/:Z'
I have tried and it works without spc_t. :)
what is /proc/mounts inside the container that has this mounted? I believe you have just mounted the literal directory /etc/kubernetes/node-feature-discovery/source.d/:Z. As a final piece of experimentation, can you try
volumeMounts:
  - name: nfd-source-hooks
    mountPath: '/etc/kubernetes/node-feature-discovery/source.d/sources'
as I believe all it's doing is creating a new directory there
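That claim is easy to sanity-check locally: ':' is a legal path character, so a destination ending in ':z' that nothing parses as a volume option simply becomes a directory literally named ':z'. A sketch using a scratch directory:

```shell
#!/bin/sh
# ':z' is only meaningful to engines that parse it as a volume option
# (podman); if it survives into the destination path, the result is a
# directory literally named ':z'.
base=$(mktemp -d)
mkdir -p "$base/source.d/:z"

ls "$base/source.d"
# prints: :z
```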
It is not easy for me to debug and acquire /proc/mounts with the current upstream container image, which is based on gcr.io/distroless/static. I will try to use the UBI-micro based image and check whether I can easily acquire /proc/mounts.
@Walnux you can build toybox with cat support pretty easily:
$ git diff
diff --git a/build/docker/toybox-config b/build/docker/toybox-config
index df9e6d3..f415aa8 100644
--- a/build/docker/toybox-config
+++ b/build/docker/toybox-config
@@ -21,7 +21,7 @@ CONFIG_TOYBOX_GETRANDOM=y
#
# CONFIG_BASENAME is not set
# CONFIG_CAL is not set
-# CONFIG_CAT is not set
+CONFIG_CAT=y
# CONFIG_CAT_V is not set
# CONFIG_CATV is not set
# CONFIG_CHGRP is not set
$ make intel-sgx-initcontainer
...
$ docker run --entrypoint "" intel/intel-sgx-initcontainer:devel cat /proc/mounts
# (change the initContainer command to cat instead of the default entrypoint)
@haircommander I think you are right. :z just creates a new directory there, which actually hides the issue. I also checked the host, and the hook file is not installed there. Thanks! I'll still paste the logs here.
daemonset yaml without :z
initContainers:
  - resources: {}
    terminationMessagePath: /dev/termination-log
    name: intel-sgx-initcontainer
    command:
      - sh
      - '-c'
      - >-
        cat /proc/mounts && cp -a /usr/local/bin/sgx-sw/intel-sgx-epchook
        /etc/kubernetes/node-feature-discovery/source.d/
    securityContext:
      readOnlyRootFilesystem: false
    imagePullPolicy: IfNotPresent
    volumeMounts:
      - name: nfd-source-hooks
        mountPath: /etc/kubernetes/node-feature-discovery/source.d/
Log:
overlay / overlay rw,context="system_u:object_r:container_file_t:s0:c88,c734",relatime,lowerdir=/var/lib/containers/storage/overlay/l/CGEOKTEVBSWVW37STEBG7DSUZK:/var/lib/containers/storage/overlay/l/ZZ7PBK43SV6NRMUKXNTMLRJ2DG,upperdir=/var/lib/containers/storage/overlay/1ab5e921e8cba30560e8838dbb8b635553715681c5e743fe370045fe3b03e2ba/diff,workdir=/var/lib/containers/storage/overlay/1ab5e921e8cba30560e8838dbb8b635553715681c5e743fe370045fe3b03e2ba/work 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /dev tmpfs rw,context="system_u:object_r:container_file_t:s0:c88,c734",nosuid,size=65536k,mode=755 0 0
devpts /dev/pts devpts rw,context="system_u:object_r:container_file_t:s0:c88,c734",nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666 0 0
mqueue /dev/mqueue mqueue rw,seclabel,nosuid,nodev,noexec,relatime 0 0
sysfs /sys sysfs ro,seclabel,nosuid,nodev,noexec,relatime 0 0
tmpfs /sys/fs/cgroup tmpfs rw,context="system_u:object_r:container_file_t:s0:c88,c734",nosuid,nodev,noexec,relatime,mode=755 0 0
cgroup /sys/fs/cgroup/systemd cgroup ro,seclabel,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd 0 0
cgroup /sys/fs/cgroup/pids cgroup ro,seclabel,nosuid,nodev,noexec,relatime,pids 0 0
cgroup /sys/fs/cgroup/cpuset cgroup ro,seclabel,nosuid,nodev,noexec,relatime,cpuset 0 0
cgroup /sys/fs/cgroup/cpu,cpuacct cgroup ro,seclabel,nosuid,nodev,noexec,relatime,cpu,cpuacct 0 0
cgroup /sys/fs/cgroup/blkio cgroup ro,seclabel,nosuid,nodev,noexec,relatime,blkio 0 0
cgroup /sys/fs/cgroup/memory cgroup ro,seclabel,nosuid,nodev,noexec,relatime,memory 0 0
cgroup /sys/fs/cgroup/net_cls,net_prio cgroup ro,seclabel,nosuid,nodev,noexec,relatime,net_cls,net_prio 0 0
cgroup /sys/fs/cgroup/rdma cgroup ro,seclabel,nosuid,nodev,noexec,relatime,rdma 0 0
cgroup /sys/fs/cgroup/perf_event cgroup ro,seclabel,nosuid,nodev,noexec,relatime,perf_event 0 0
cgroup /sys/fs/cgroup/devices cgroup ro,seclabel,nosuid,nodev,noexec,relatime,devices 0 0
cgroup /sys/fs/cgroup/freezer cgroup ro,seclabel,nosuid,nodev,noexec,relatime,freezer 0 0
cgroup /sys/fs/cgroup/hugetlb cgroup ro,seclabel,nosuid,nodev,noexec,relatime,hugetlb 0 0
shm /dev/shm tmpfs rw,context="system_u:object_r:container_file_t:s0:c88,c734",nosuid,nodev,noexec,relatime,size=65536k 0 0
tmpfs /etc/resolv.conf tmpfs rw,seclabel,nosuid,nodev,noexec,mode=755 0 0
tmpfs /etc/hostname tmpfs rw,seclabel,nosuid,nodev,mode=755 0 0
/dev/sda4 /etc/hosts xfs rw,seclabel,relatime,attr2,inode64,logbufs=8,logbsize=32k,prjquota 0 0
/dev/sda4 /dev/termination-log xfs rw,seclabel,relatime,attr2,inode64,logbufs=8,logbsize=32k,prjquota 0 0
tmpfs /run/secrets tmpfs rw,seclabel,nosuid,nodev,mode=755 0 0
**/dev/sda4 /etc/kubernetes/node-feature-discovery/source.d xfs rw,seclabel,relatime,attr2,inode64,logbufs=8,logbsize=32k,prjquota 0 0**
tmpfs /var/run/secrets/kubernetes.io/serviceaccount tmpfs ro,seclabel,relatime,size=262629092k 0 0
proc /proc/bus proc ro,nosuid,nodev,noexec,relatime 0 0
proc /proc/fs proc ro,nosuid,nodev,noexec,relatime 0 0
proc /proc/irq proc ro,nosuid,nodev,noexec,relatime 0 0
proc /proc/sys proc ro,nosuid,nodev,noexec,relatime 0 0
proc /proc/sysrq-trigger proc ro,nosuid,nodev,noexec,relatime 0 0
tmpfs /proc/acpi tmpfs ro,context="system_u:object_r:container_file_t:s0:c88,c734",relatime 0 0
tmpfs /proc/kcore tmpfs rw,context="system_u:object_r:container_file_t:s0:c88,c734",nosuid,size=65536k,mode=755 0 0
tmpfs /proc/keys tmpfs rw,context="system_u:object_r:container_file_t:s0:c88,c734",nosuid,size=65536k,mode=755 0 0
tmpfs /proc/timer_list tmpfs rw,context="system_u:object_r:container_file_t:s0:c88,c734",nosuid,size=65536k,mode=755 0 0
tmpfs /proc/sched_debug tmpfs rw,context="system_u:object_r:container_file_t:s0:c88,c734",nosuid,size=65536k,mode=755 0 0
tmpfs /proc/scsi tmpfs ro,context="system_u:object_r:container_file_t:s0:c88,c734",relatime 0 0
tmpfs /sys/firmware tmpfs ro,context="system_u:object_r:container_file_t:s0:c88,c734",relatime 0 0
cp: /etc/kubernetes/node-feature-discovery/source.d//intel-sgx-epchook: Permission denied
daemonset yaml with :z
initContainers:
  - resources: {}
    terminationMessagePath: /dev/termination-log
    name: intel-sgx-initcontainer
    command:
      - sh
      - '-c'
      - >-
        cat /proc/mounts && cp -a /usr/local/bin/sgx-sw/intel-sgx-epchook
        /etc/kubernetes/node-feature-discovery/source.d/
    securityContext:
      readOnlyRootFilesystem: false
    imagePullPolicy: IfNotPresent
    volumeMounts:
      - name: nfd-source-hooks
        mountPath: '/etc/kubernetes/node-feature-discovery/source.d/:z'
log:
overlay / overlay rw,context="system_u:object_r:container_file_t:s0:c235,c809",relatime,lowerdir=/var/lib/containers/storage/overlay/l/CGEOKTEVBSWVW37STEBG7DSUZK:/var/lib/containers/storage/overlay/l/ZZ7PBK43SV6NRMUKXNTMLRJ2DG,upperdir=/var/lib/containers/storage/overlay/44d494b8811e741dc3321a54bd84864f0e55a9c934b4a995dae042006c4b5e54/diff,workdir=/var/lib/containers/storage/overlay/44d494b8811e741dc3321a54bd84864f0e55a9c934b4a995dae042006c4b5e54/work 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /dev tmpfs rw,context="system_u:object_r:container_file_t:s0:c235,c809",nosuid,size=65536k,mode=755 0 0
devpts /dev/pts devpts rw,context="system_u:object_r:container_file_t:s0:c235,c809",nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666 0 0
mqueue /dev/mqueue mqueue rw,seclabel,nosuid,nodev,noexec,relatime 0 0
sysfs /sys sysfs ro,seclabel,nosuid,nodev,noexec,relatime 0 0
tmpfs /sys/fs/cgroup tmpfs rw,context="system_u:object_r:container_file_t:s0:c235,c809",nosuid,nodev,noexec,relatime,mode=755 0 0
cgroup /sys/fs/cgroup/systemd cgroup ro,seclabel,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd 0 0
cgroup /sys/fs/cgroup/pids cgroup ro,seclabel,nosuid,nodev,noexec,relatime,pids 0 0
cgroup /sys/fs/cgroup/cpuset cgroup ro,seclabel,nosuid,nodev,noexec,relatime,cpuset 0 0
cgroup /sys/fs/cgroup/cpu,cpuacct cgroup ro,seclabel,nosuid,nodev,noexec,relatime,cpu,cpuacct 0 0
cgroup /sys/fs/cgroup/blkio cgroup ro,seclabel,nosuid,nodev,noexec,relatime,blkio 0 0
cgroup /sys/fs/cgroup/memory cgroup ro,seclabel,nosuid,nodev,noexec,relatime,memory 0 0
cgroup /sys/fs/cgroup/net_cls,net_prio cgroup ro,seclabel,nosuid,nodev,noexec,relatime,net_cls,net_prio 0 0
cgroup /sys/fs/cgroup/rdma cgroup ro,seclabel,nosuid,nodev,noexec,relatime,rdma 0 0
cgroup /sys/fs/cgroup/perf_event cgroup ro,seclabel,nosuid,nodev,noexec,relatime,perf_event 0 0
cgroup /sys/fs/cgroup/devices cgroup ro,seclabel,nosuid,nodev,noexec,relatime,devices 0 0
cgroup /sys/fs/cgroup/freezer cgroup ro,seclabel,nosuid,nodev,noexec,relatime,freezer 0 0
cgroup /sys/fs/cgroup/hugetlb cgroup ro,seclabel,nosuid,nodev,noexec,relatime,hugetlb 0 0
shm /dev/shm tmpfs rw,context="system_u:object_r:container_file_t:s0:c235,c809",nosuid,nodev,noexec,relatime,size=65536k 0 0
tmpfs /etc/resolv.conf tmpfs rw,seclabel,nosuid,nodev,noexec,mode=755 0 0
tmpfs /etc/hostname tmpfs rw,seclabel,nosuid,nodev,mode=755 0 0
/dev/sda4 /etc/hosts xfs rw,seclabel,relatime,attr2,inode64,logbufs=8,logbsize=32k,prjquota 0 0
/dev/sda4 /dev/termination-log xfs rw,seclabel,relatime,attr2,inode64,logbufs=8,logbsize=32k,prjquota 0 0
tmpfs /run/secrets tmpfs rw,seclabel,nosuid,nodev,mode=755 0 0
**/dev/sda4 /etc/kubernetes/node-feature-discovery/source.d/:z xfs rw,seclabel,relatime,attr2,inode64,logbufs=8,logbsize=32k,prjquota 0 0**
tmpfs /var/run/secrets/kubernetes.io/serviceaccount tmpfs ro,seclabel,relatime,size=262629092k 0 0
proc /proc/bus proc ro,nosuid,nodev,noexec,relatime 0 0
proc /proc/fs proc ro,nosuid,nodev,noexec,relatime 0 0
proc /proc/irq proc ro,nosuid,nodev,noexec,relatime 0 0
proc /proc/sys proc ro,nosuid,nodev,noexec,relatime 0 0
proc /proc/sysrq-trigger proc ro,nosuid,nodev,noexec,relatime 0 0
tmpfs /proc/acpi tmpfs ro,context="system_u:object_r:container_file_t:s0:c235,c809",relatime 0 0
tmpfs /proc/kcore tmpfs rw,context="system_u:object_r:container_file_t:s0:c235,c809",nosuid,size=65536k,mode=755 0 0
tmpfs /proc/keys tmpfs rw,context="system_u:object_r:container_file_t:s0:c235,c809",nosuid,size=65536k,mode=755 0 0
tmpfs /proc/timer_list tmpfs rw,context="system_u:object_r:container_file_t:s0:c235,c809",nosuid,size=65536k,mode=755 0 0
tmpfs /proc/sched_debug tmpfs rw,context="system_u:object_r:container_file_t:s0:c235,c809",nosuid,size=65536k,mode=755 0 0
tmpfs /proc/scsi tmpfs ro,context="system_u:object_r:container_file_t:s0:c235,c809",relatime 0 0
tmpfs /sys/firmware tmpfs ro,context="system_u:object_r:container_file_t:s0:c235,c809",relatime 0 0
daemonset yaml with the sources subdirectory
initContainers:
  - resources: {}
    terminationMessagePath: /dev/termination-log
    name: intel-sgx-initcontainer
    command:
      - sh
      - '-c'
      - >-
        cat /proc/mounts && cp -a /usr/local/bin/sgx-sw/intel-sgx-epchook
        /etc/kubernetes/node-feature-discovery/source.d/
    securityContext:
      readOnlyRootFilesystem: false
    imagePullPolicy: IfNotPresent
    volumeMounts:
      - name: nfd-source-hooks
        mountPath: /etc/kubernetes/node-feature-discovery/source.d/sources
log:
...
/dev/sda4 /etc/kubernetes/node-feature-discovery/source.d/sources xfs rw,seclabel,relatime,attr2,inode64,logbufs=8,logbsize=32k,prjquota 0 0
...
it's an in-tree kernel driver; RHEL 8.4+ has it as a tech preview. would that already cover the rules part automatically? I'll work with @Walnux to check if we can get you access to a node with SGX.
we are trying to figure out a way to let @haircommander access the node with SGX support. But that needs some time and effort. Before that, I can just work as a proxy for @haircommander and try to figure out a proper solution. :)
@haircommander Any updates? Thanks!
what's ls -lZd /etc/kubernetes/node-feature-discovery/source.d/? either we have to change the label of that directory so all containers can access it, or we need to make this plugin privileged.
I guess what you are talking about is the host OS.
I use the below way to access the node:
[jxu36@jfz1r09h07 ~]$ oc debug node/worker-1
Starting pod/worker-1-debug ...
To use host binaries, run chroot /host
Pod IP: 172.16.9.2
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# ls
bin boot dev etc home lib lib64 media mnt opt ostree proc root run sbin srv sys sysroot tmp usr var
sh-4.4# ls -lZd /etc/kubernetes/node-feature-discovery/source.d/
drwxr-xr-x. 2 root root system_u:object_r:kubernetes_file_t:s0 31 Jan 21 20:40 /etc/kubernetes/node-feature-discovery/source.d/
sh-4.4# ls -lZd /dev
drwxr-xr-x. 20 root root system_u:object_r:device_t:s0 3360 Nov 30 06:09 /dev
If you want this content to be read/write within a container it will need to be container_file_t.
@rhatdan it is from a node running RHCOS as the host OS. How could the user set the proper label? I think the label should be set by the OS and CRI-O before starting the Pod. Please correct me. Thanks!
what about instead of setting spc_t, we give the pod container_runtime_t? that should be able to access the kubernetes dir. Then ni the SCC, we could set the seLinuxOption to MustRunAs
What does "ni" mean? For accessing the Kubernetes dir from the Pod, this looks like a better solution than spc_t. :) How about accessing /dev/sgx_x from the Pod?
sorry, a typo, I meant to type "in"
I believe access to devices needs privilege unless they're going through a device plugin, so we otherwise have a chicken and egg problem haha
@mythi what are your opinions? :)
I believe access to devices need privilege unless they're going through a device plugin
it does not sound right that we need privileged just to check the dev nodes are present. was there a device label (device_t?) that could be used
hm yeah it does look like there's a device_t, could you give it a try?
device_t is generic, containers will not be allowed to use chr_file or block_file labeled device_t.
sesearch -A -s container_t -c chr_file -p write
allow domain devtty_t:chr_file { append getattr ioctl lock open read write };
allow domain kmsg_device_t:chr_file { append getattr ioctl lock open write }; [ domain_can_write_kmsg ]:True
allow domain null_device_t:chr_file { append getattr ioctl lock open read write };
allow domain zero_device_t:chr_file { append getattr ioctl lock map open read write };
allow svirt_sandbox_domain sshd_devpts_t:chr_file { append getattr ioctl lock read write };
allow svirt_sandbox_domain user_devpts_t:chr_file { append getattr ioctl lock read write };
allow svirt_sandbox_domain user_tty_device_t:chr_file { append getattr ioctl lock read write };
allow syslog_client_type console_device_t:chr_file { append getattr ioctl lock open write };
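To make the point explicit, here is a small filter over that output (pasted below as sample data) that lists the chr_file target types a container domain can write; device_t is not among them:

```shell
#!/bin/sh
# Reduce 'allow <src> <tgt>:chr_file { perms };' rules to their target types.
# Sample data is the sesearch output quoted above in this thread.
sesearch_sample='allow domain devtty_t:chr_file { append getattr ioctl lock open read write };
allow domain kmsg_device_t:chr_file { append getattr ioctl lock open write }; [ domain_can_write_kmsg ]:True
allow domain null_device_t:chr_file { append getattr ioctl lock open read write };
allow domain zero_device_t:chr_file { append getattr ioctl lock map open read write };
allow svirt_sandbox_domain sshd_devpts_t:chr_file { append getattr ioctl lock read write };
allow svirt_sandbox_domain user_devpts_t:chr_file { append getattr ioctl lock read write };
allow svirt_sandbox_domain user_tty_device_t:chr_file { append getattr ioctl lock read write };
allow syslog_client_type console_device_t:chr_file { append getattr ioctl lock open write };'

# Field 3 is '<target_type>:chr_file'; strip the class to get the type.
echo "$sesearch_sample" | awk '{ sub(/:chr_file$/, "", $3); print $3 }' | sort -u
# prints eight *_t target types, one per line; device_t is absent
```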
@rhatdan @haircommander what is your suggestion then? Can we get a conclusion on this issue? thanks!
I added a boolean to container-selinux policy to allow a container to have full access to all devices.
container_connect_any --> off
container_manage_cgroup --> off
container_use_cephfs --> off
container_use_devices --> on
logrotate_read_inside_containers --> off
$ sudo setsebool container_use_devices true
$ rpm -q container-selinux
container-selinux-2.176.0-2.fc36.noarch
The udica project is working on a fix for it, to be able to support more fine grained control.
@rhatdan Cool, this fine-grained control is the right thing to do. :) Could you let us know when we can use it on an OCP release? Do you have a plan for it?
currently that version of container-selinux isn't targeted at 4.10, so I am guessing at earliest it would make 4.11. We have to be careful about bumps to container-selinux so close to GA, and the fix may not qualify for backport.
For the /etc/kubernetes/node-feature-discovery/source.d/ piece, have you tried the solution in https://github.com/intel/intel-device-plugins-for-kubernetes/issues/762#issuecomment-1024259524
@haircommander I will try that solution. Also, is there a way to test the upcoming fix for container-selinux in our cluster? Is there a link to a branch or build for us to test? Thanks
Hi @haircommander @rhatdan, is it possible to define an SELinux policy/label that assigns only the permissions really needed by the container? Our SGX device plugin only accesses /dev/sgx_provision and /dev/sgx_epc, so we could define a label called Intel_sgx_t that only assigns access permission to these two device files, with access to all other device files denied. I think that is the best way to protect security. I know we can define our own SELinux policy on RHEL, but on OCP how to deploy the policy is a problem.
are you planning on using an operator to install on OCP? maybe it could install machine configs that enable that policy.
Yes, the Operator should be the right way to install it. Can you point us to some details about the machine config? I think this should not be a requirement just from us; it should be a general request. :)
If these devices get added to a container, then there is no need to label them; the devices will get the label of the container. If you are volume mounting them into the container, then they would not be allowed access. @haircommander how do you add a device to a container with k8s?
that's the problem, this is a container that enables other containers to add devices. there's no way to do so without a device plugin, but we're putting together the device plugin...
If you need to volume mount them in, and want the containers to have access then you could just
chcon -t container_file_t /dev/sgx*
To make this permanent, you could execute something like:
semanage fcontext -a -t container_file_t '/dev/sgx.*'
restorecon -R -v /dev/sgx*
Then when the devices got created at reboot they would be labeled correctly.
To make this permanent, you could execute something like:
semanage fcontext -a -t container_file_t '/dev/sgx.*'
restorecon -R -v /dev/sgx*
@rhatdan thanks! can these be managed by a MachineConfig, or do we need the Special Resource Operator to run them?
Not my area of expertise, but I think MachineConfig should be able to do it.
typically the way to do it on RHCOS is to create a MachineConfig that creates a systemd unit file that runs the commands; those kinds of state changes often don't persist across reboots otherwise. The sgx operator could create said MachineConfig and trigger a reboot, and then the device would be available on the next reboot.
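A sketch of what such a unit might look like; the unit name, ordering, and device paths below are illustrative assumptions, not a tested OCP configuration (on OCP the file contents would be embedded in a MachineConfig rather than written directly). The script only generates the file so it can be inspected:

```shell
#!/bin/sh
# Sketch: generate a oneshot systemd unit that relabels the SGX device nodes
# at boot. Assumes the 'semanage fcontext' rule suggested above has been
# installed, so restorecon knows the target label. All names are hypothetical.
unit_dir=$(mktemp -d)   # stand-in for /etc/systemd/system
cat > "$unit_dir/relabel-sgx-devices.service" <<'EOF'
[Unit]
Description=Label /dev/sgx* device nodes for container access
# Ordering is illustrative; the unit just needs to run after the
# device nodes exist.
After=local-fs.target

[Service]
Type=oneshot
ExecStart=/usr/sbin/restorecon -R -v /dev/sgx_enclave /dev/sgx_provision
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
EOF
echo "wrote $unit_dir/relabel-sgx-devices.service"
```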
Comments
The issue is:
If I enable SELinux as below on my worker node, my init container runs into a "permission denied" issue on all the volumes mounted in the pod.
If I disable SELinux as below, the operator can come up and run properly.
You can reproduce the issue using the steps below.
Reproduce Steps
First I have to apply the patches below to set up the SCC according to these documents: SCC in OCP-4.9, Guide to UID, GID
run operator manually
Then start the intel device plugins framework using command
$ oc apply -k intel-device-plugins-for-kubernetes/deployments/operator/default/
and start the SGX plugin DS as
$ oc apply -f intel-device-plugins-for-kubernetes/deployments/operator/samples/deviceplugin_v1_sgxdeviceplugin.yaml
The intel device plugins framework comes up and runs, and the SGX plugin DS is also up and running. But the init container in the pod runs into the "permission denied" issue when trying to access the directory /etc/kubernetes/node-feature-discovery/source.d/
Run operator through OLM
You can also run the operator through OLM
operator-sdk run bundle docker.io/walnuxdocker/intel-device-plugins-operator-bundle:0.22.0
The result is the same as when running it manually. This is the volume mounted in the pod. Analysis:
You can see that I assigned the SCC as hostmount-anyuid. After I disabled SELinux on worker node 1 with the command
$ sudo setenforce 0
the operator came up and ran on that node. But I left SELinux enabled on worker node 0, and the permission denied issue was still there. After I set the SCC as hostaccess, the permission denied issue always happened, no matter whether SELinux was enabled or disabled.
The proper way to access the shared directory in the pod
With mountPath: '/etc/kubernetes/node-feature-discovery/source.d/:z' and the SCC hostmount-anyuid, the above issue looks resolved: the init container can work with SELinux set to enforcing mode. According to https://www.redhat.com/sysadmin/user-namespaces-selinux-rootless-containers the root cause might be:
The container engine, Podman, launches each container with a unique process SELinux label (usually container_t) and labels all of the container content with a single label (usually container_file_t). We have rules that state that container_t can read and write all content labeled container_file_t. This simple idea has blocked major file system exploits.
Everything works perfectly until the user attempts a volume mount. The problem with volumes is that they usually only bind mounts on the host. They bring in the labels from the host, which the SELinux policy does not allow the process label to interact with, and the container blows up.
However, the sgxplugin container runs into a permission denied issue.
The error is: E1130 05:11:07.898395 1 sgx_plugin.go:75] No SGX enclave file available: stat /dev/sgx_enclave: permission denied
I tried to resolve the above issue in a similar way, mounting /dev/sgx_enclave with :z.
It runs into the below error: sgx_plugin.go:75] No SGX enclave file available: stat /dev/sgx_enclave: no such file or directory
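That "no such file or directory" is consistent with the ':z' staying in the literal destination path: the device ends up mounted at /dev/sgx_enclave:z, so a stat of /dev/sgx_enclave itself fails. A local reproduction of the failure mode with an ordinary file (scratch paths, not the real device):

```shell
#!/bin/sh
# If the kubelet keeps ':z' as part of the destination, the file exists only
# under the ':z' name, and the path the plugin actually checks is absent.
dir=$(mktemp -d)
touch "$dir/sgx_enclave:z"   # what effectively got created

stat "$dir/sgx_enclave" >/dev/null 2>&1 || echo "stat: no such file or directory"
# prints: stat: no such file or directory
```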
The proper way to access host devices from the container
After I use the SCC privileged and set privileged: true, the above issue is resolved.
according to https://kubernetes.io/docs/concepts/policy/pod-security-policy/ a "privileged" container is given access to all devices on the host. This allows the container nearly all the same access as processes running on the host. This is useful for containers that want to use linux capabilities like manipulating the network stack and accessing devices.
I am concerned about using this privileged right. Others have similar concerns and have requested a new feature in K8S; see https://github.com/kubernetes/kubernetes/issues/60748
However, since the SGX device plugin has to access the SGX devices on the host, it looks like we can only use a privileged container. @mythi what are your comments? :)
Reference to similar projects like SRO
In the Special Resource Operator, it looks like a similar security policy is applied: https://github.com/openshift/special-resource-operator/blob/master/charts/xilinx/fpga-xrt-driver-4.7.11/templates/1000-driver-container.yaml#L17
https://github.com/openshift/special-resource-operator/blob/master/charts/xilinx/fpga-xrt-driver-4.7.11/templates/1000-driver-container.yaml#L70