okd-project / okd

The self-managing, auto-upgrading, Kubernetes distribution for everyone
https://okd.io
Apache License 2.0

KubeVirt on OKD makes SELinux throw "Context system_u:object_r:kubelet_exec_t:s0 is not valid (left unmapped)" #1285

Closed: vrutkovs closed this 1 month ago

vrutkovs commented 2 years ago

@vrutkovs This is all journal logging around the first (failing) start of kubelet after upgrade. Should I supply a zip with all the journal logging or something else to help? I can even help by doing an upgrade again (after reinstalling a host with an older 4.10 version or something like that).

We didn't notice any SELinux errors in CI, either on new installs or on upgrades.

Please check the journal and provide any details on these SELinux errors

Jul 06 18:08:53 mec-okd4-worker-03 systemd[1]: Starting Kubernetes Kubelet...
Jul 06 18:08:53 mec-okd4-worker-03 kernel: SELinux:  Context system_u:object_r:kubelet_exec_t:s0 is not valid (left unmapped).
AVC avc:  denied  { execute } for  pid=4040 comm="(yperkube)" name="hyperkube" dev="sda4" ino=342043427 scontext=system_u:system_r:init_t:s0 tcontext=system_u:object_r:unlabeled_t:s0 tclass=file permissive=0 trawcon="system_u:object_r:kubelet_exec_t:s0"
Jul 06 18:08:53 mec-okd4-worker-03 systemd[4040]: kubelet.service: Failed to locate executable /usr/bin/hyperkube: Permission denied
Jul 06 18:08:53 mec-okd4-worker-03 systemd[4040]: kubelet.service: Failed at step EXEC spawning /usr/bin/hyperkube: Permission denied
Jul 06 18:08:53 mec-okd4-worker-03 systemd[1]: kubelet.service: Main process exited, code=exited, status=203/EXEC

Originally posted by @msteenhu in https://github.com/openshift/okd/issues/1270#issuecomment-1179711800

msteenhu commented 2 years ago

Context: I could upgrade an all-VM test cluster without problems, but on a production setup all my bare-metal workers have this problem after upgrading to the latest 4.10 (end of June 2022). The VM masters of the same cluster are not affected; they boot fine.

vrutkovs commented 2 years ago

Do you have any additional packages installed or repos enabled? It looks like some Fedora update broke the SELinux rules.

msteenhu commented 2 years ago

No, I did not install anything extra manually, although the cluster is running virtualization and nmstate; I don't know whether that might impact the Fedora RPMs.

On the first host I fiddled a bit with rpm-ostree: kernels, pivoting, etc. But the other two hosts were 'cleanly' upgraded, from two versions older to the newest 4.10, at the end of June.

msteenhu commented 2 years ago

This was the output from the fcos upgrade logged by MCD:

2022-07-01T06:57:07.554066823+00:00 stdout F Upgraded:
2022-07-01T06:57:07.554066823+00:00 stdout F   afterburn 5.3.0-1.fc35 -> 5.3.0-2.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   afterburn-dracut 5.3.0-1.fc35 -> 5.3.0-2.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   bind-libs 32:9.16.28-1.fc35 -> 32:9.16.29-1.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   bind-license 32:9.16.28-1.fc35 -> 32:9.16.29-1.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   bind-utils 32:9.16.28-1.fc35 -> 32:9.16.29-1.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   btrfs-progs 5.16.2-1.fc35 -> 5.18-1.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   conmon 2:2.1.0-2.3.1 -> 2:2.1.2-1.1.1
2022-07-01T06:57:07.554066823+00:00 stdout F   container-selinux 2:2.183.0-3.fc35 -> 2:2.187.0-1.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   cri-o 1.23.2-6.1.fc35 -> 1.23.3-1.1.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   criu 3.17-1.fc35 -> 3.17-2.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   criu-libs 3.17-1.fc35 -> 3.17-2.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   cups-libs 1:2.3.3op2-17.fc35 -> 1:2.3.3op2-18.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   fuse-sshfs 3.7.2-2.fc35 -> 3.7.3-1.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   glibc 2.34-34.fc35 -> 2.34-35.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   glibc-common 2.34-34.fc35 -> 2.34-35.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   glibc-minimal-langpack 2.34-34.fc35 -> 2.34-35.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   gnutls 3.7.4-1.fc35 -> 3.7.6-1.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   grub2-common 1:2.06-10.fc35 -> 1:2.06-11.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   grub2-efi-x64 1:2.06-10.fc35 -> 1:2.06-11.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   grub2-pc 1:2.06-10.fc35 -> 1:2.06-11.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   grub2-pc-modules 1:2.06-10.fc35 -> 1:2.06-11.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   grub2-tools 1:2.06-10.fc35 -> 1:2.06-11.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   grub2-tools-minimal 1:2.06-10.fc35 -> 1:2.06-11.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   hwdata 0.359-1.fc35 -> 0.360-1.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   kernel 5.17.9-200.fc35 -> 5.18.5-100.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   kernel-core 5.17.9-200.fc35 -> 5.18.5-100.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   kernel-modules 5.17.9-200.fc35 -> 5.18.5-100.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   libipa_hbac 2.7.0-1.fc35 -> 2.7.1-2.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   libsss_certmap 2.7.0-1.fc35 -> 2.7.1-2.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   libsss_idmap 2.7.0-1.fc35 -> 2.7.1-2.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   libsss_nss_idmap 2.7.0-1.fc35 -> 2.7.1-2.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   libsss_sudo 2.7.0-1.fc35 -> 2.7.1-2.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   libzstd 1.5.2-1.fc35 -> 1.5.2-2.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   logrotate 3.18.1-2.fc35 -> 3.18.1-4.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   mokutil 2:0.6.0-1.fc35 -> 2:0.6.0-3.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   open-vm-tools 12.0.0-1.fc35 -> 12.0.5-1.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   python3-libs 3.10.4-1.fc35 -> 3.10.5-2.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   qemu-guest-agent 2:6.1.0-14.fc35 -> 2:6.1.0-15.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   rsync 3.2.3-9.fc35 -> 3.2.4-1.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   selinux-policy 35.17-1.fc35 -> 35.18-1.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   selinux-policy-targeted 35.17-1.fc35 -> 35.18-1.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   shim-x64 15.4-5 -> 15.6-1
2022-07-01T06:57:07.554066823+00:00 stdout F   sssd-ad 2.7.0-1.fc35 -> 2.7.1-2.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   sssd-client 2.7.0-1.fc35 -> 2.7.1-2.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   sssd-common 2.7.0-1.fc35 -> 2.7.1-2.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   sssd-common-pac 2.7.0-1.fc35 -> 2.7.1-2.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   sssd-ipa 2.7.0-1.fc35 -> 2.7.1-2.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   sssd-krb5 2.7.0-1.fc35 -> 2.7.1-2.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   sssd-krb5-common 2.7.0-1.fc35 -> 2.7.1-2.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   sssd-ldap 2.7.0-1.fc35 -> 2.7.1-2.fc35
2022-07-01T06:57:07.554066823+00:00 stdout F   sssd-nfs-idmap 2.7.0-1.fc35 -> 2.7.1-2.fc35
2022-07-01T06:57:07.554137005+00:00 stdout F   systemd 249.12-3.fc35 -> 249.12-5.fc35
2022-07-01T06:57:07.554137005+00:00 stdout F   systemd-container 249.12-3.fc35 -> 249.12-5.fc35
2022-07-01T06:57:07.554137005+00:00 stdout F   systemd-libs 249.12-3.fc35 -> 249.12-5.fc35
2022-07-01T06:57:07.554137005+00:00 stdout F   systemd-pam 249.12-3.fc35 -> 249.12-5.fc35
2022-07-01T06:57:07.554137005+00:00 stdout F   systemd-resolved 249.12-3.fc35 -> 249.12-5.fc35
2022-07-01T06:57:07.554137005+00:00 stdout F   systemd-udev 249.12-3.fc35 -> 249.12-5.fc35
2022-07-01T06:57:07.554137005+00:00 stdout F   unbound-libs 1.13.2-1.fc35 -> 1.16.0-3.fc35
2022-07-01T06:57:07.554137005+00:00 stdout F   vim-data 2:8.2.4975-1.fc35 -> 2:8.2.5085-1.fc35
2022-07-01T06:57:07.554137005+00:00 stdout F   vim-minimal 2:8.2.4975-1.fc35 -> 2:8.2.5085-1.fc35
2022-07-01T06:57:07.554137005+00:00 stdout F Removed:
2022-07-01T06:57:07.554137005+00:00 stdout F   sssd-idp-2.7.0-1.fc35.x86_64
2022-07-01T06:57:07.554137005+00:00 stdout F Added:
2022-07-01T06:57:07.554137005+00:00 stdout F   aardvark-dns-1.0.3-1.fc35.x86_64
2022-07-01T06:57:07.554175533+00:00 stdout F Changes queued for next boot. Run "systemctl reboot" to start a reboot
CorneJB commented 2 years ago

Also experiencing this after the upgrade on bare-metal nodes; the first 4.10 release was the original install, I believe.

dmesg output:

[   21.882228] audit: type=1130 audit(1657576583.360:170): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=kubelet-auto-node-size comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   46.832343] SELinux:  Context system_u:object_r:kubelet_exec_t:s0 is not valid (left unmapped).

No custom packages here, but I have played with a custom SELinux policy for logging.

vrutkovs commented 2 years ago

It seems https://github.com/containers/container-selinux/pull/178 is related here, as we did

2022-07-01T06:57:07.554066823+00:00 stdout F   container-selinux 2:2.183.0-3.fc35 -> 2:2.187.0-1.fc35

during upgrade

@rhatdan any suggestions on how to get more info about "SELinux: Context system_u:object_r:kubelet_exec_t:s0 is not valid (left unmapped)"?

@msteenhu could you upload a must-gather somewhere (GDrive or similar) so that we'd get more info about rpm-ostree status?

msteenhu commented 2 years ago

https://maarten.gent/must-gather.tar.xz

ClusterID: a6f553f9-b175-4f3e-9dee-c8cf33b57704
ClusterVersion: Stable at "4.10.0-0.okd-2022-06-24-212905"
ClusterOperators: All healthy and stable

msteenhu commented 2 years ago

Assuming my hosts are still suffering from the SELinux problem (I have not tried a reboot yet), is running 'restorecon' the fix? On kubelet? Or hyperkube? I guess the latter. I'll see if I can experiment a bit later today. I can always reinstall a host from scratch if the experiment breaks it even further.

Got this from the mentioned container-selinux issue: https://github.com/containers/container-selinux/pull/178#issuecomment-1106863807

msteenhu commented 2 years ago

Another fix is disabling SELinux (permissive mode): https://docs.okd.io/4.10/nodes/nodes/nodes-nodes-working.html#nodes-nodes-kernel-arguments_nodes-nodes-working

Not really my preference, but better than running hosts that will not survive a reboot in the long term.
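For reference, a kernel-argument MachineConfig along the lines of the linked OKD docs looks roughly like this (a sketch: the name and ignition version follow the 4.10 docs example and should be verified against your cluster, and enforcing=0 of course weakens node confinement):

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 05-worker-kernelarg-selinuxpermissive
spec:
  config:
    ignition:
      version: 3.2.0
  kernelArguments:
    - enforcing=0    # boot all workers in permissive mode
```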

markusdd commented 2 years ago

Do you have custom SELinux settings?

This somehow sounds familiar. We had this happen back in early June. Our SELinux policy file was modified (we had to apply custom policies to allow execheap for some tools we use), and this prevented CoreOS from updating it. The symptom was exactly what you are seeing: kubelet wouldn't start.

There was supposed to be a fix for this in CoreOS to be able to merge the new upstream policy with a user-modified policy when an update occurs.

Is this update live by now?

You can check whether your policy is modified by running:

sudo ostree admin config-diff | grep selinux

msteenhu commented 2 years ago

@markusdd

[core@mec-okd4-worker-01 ~]$ sudo ostree admin config-diff | grep selinux
M    selinux/targeted/active/commit_num
M    selinux/targeted/active/policy.kern
M    selinux/targeted/active/policy.linked
M    selinux/targeted/policy/policy.33
A    selinux/targeted/active/modules/400
A    selinux/targeted/active/modules/400/virt_launcher
A    selinux/targeted/active/modules/400/virt_launcher/lang_ext
A    selinux/targeted/active/modules/400/virt_launcher/cil
A    selinux/targeted/semanage.read.LOCK
A    selinux/targeted/semanage.trans.LOCK

So the virtualization operator is to blame I guess? At least in my case. I certainly did not fiddle with SELinux myself.

markusdd commented 2 years ago

Yes, that looks very familiar. Fix: restore the policy to the CoreOS default, run the upgrade, and then re-apply your fixes.

I heard a new CoreOS version will address this; not sure when it will come to OKD.

Nevertheless this means the original problem, non-mergeable policy updates, remains for now, which is a bummer.

CorneJB commented 2 years ago
[core@m1 ~]$ sudo ostree admin config-diff | grep selinux
M    selinux/targeted/active/commit_num
M    selinux/targeted/active/policy.linked
M    selinux/targeted/active/policy.kern
M    selinux/targeted/policy/policy.33
A    selinux/final/logreader-test
A    selinux/final/logreader-test/contexts
A    selinux/final/logreader-test/contexts/files
A    selinux/final/logreader-test/policy
A    selinux/targeted/active/modules/400
A    selinux/targeted/active/modules/400/logging-container
A    selinux/targeted/active/modules/400/logging-container/hll
A    selinux/targeted/active/modules/400/logging-container/cil
A    selinux/targeted/active/modules/400/logging-container/lang_ext

Seems to be the same problem! To fix we could use what is specified here?

Steps:

  1. Remove custom modules

  2. Restore the CoreOS default policy for this version: sudo rsync -rclv /usr/etc/selinux/ /etc/selinux/

  3. Reapply custom modules

Is this the right way to restore the default policy?

markusdd commented 2 years ago

Yes, it is. That is exactly what we did. Although you do not need to remove anything that is under modules, as it sits there collision-free; upon re-application it gets compiled into policy.33 once again.

As mentioned, this is supposed to be improved upstream in FCOS; whether that has happened, and when that version comes to OKD, I don't know.

It seems kind of urgent though, because this is the second time within weeks that this causes downtime for users.

msteenhu commented 2 years ago

Yes, that looks very familiar. Fix: restore the policy to coreos Default, run the Upgrade, and then re-apply your fixes.

Problem is, I did not apply any fixes myself; I believe the KubeVirt operator did, in my case. I have very little experience with SELinux, so I do not yet understand how to fix it. If I read the suggestions correctly, I should restore SELinux, then somehow finish the OSTree upgrade, and then reinstall the virtualization operator?

markusdd commented 2 years ago

Just follow the rsync step above so that M selinux/targeted/policy/policy.33 no longer shows up when sudo ostree admin config-diff | grep selinux is run. That restores the FCOS-shipped SELinux policy. Then run semodule -B to compile your custom modules in again (it will pick up whatever is under selinux/targeted/active/modules/), and you should be good to go for now. (After that step policy.33 will show as modified again, but that is OK: the baseline is now the up-to-date FCOS policy and kubelet will work.)
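The procedure above can be sketched as a small script (an illustration, not an official fix; it assumes an FCOS node with the shipped policy under /usr/etc/selinux, and by default it only prints the commands so you can review them before re-running with --apply as root):

```shell
#!/usr/bin/env bash
# Recovery sketch for the procedure described above.
# Dry-run by default; pass --apply on an affected node (as root) to execute.
recover_selinux_policy() {
  local apply="${1:-}"
  run() {
    if [ "$apply" = "--apply" ]; then "$@"; else echo "would run: $*"; fi
  }
  # 1. Restore the FCOS-shipped policy over the locally modified copy.
  run rsync -rclv /usr/etc/selinux/ /etc/selinux/
  # 2. Recompile the policy; this folds the custom modules under
  #    /etc/selinux/targeted/active/modules/ back into policy.33.
  run semodule -B
}

recover_selinux_policy "${1:-}"   # dry run unless --apply is passed
```

Note the dry-run wrapper is purely for illustration; on a real node the two underlying commands are exactly the ones discussed in this thread.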

vrutkovs commented 2 years ago

@sandrobonazzola could you have a look at that? Seems kubevirt related

sandrobonazzola commented 2 years ago

@vrutkovs I'll loop kubevirt people in

fabiand commented 2 years ago

Looping in @xpivarc

fabiand commented 2 years ago

And @acardace and @stu-gott

stu-gott commented 2 years ago

KubeVirt does indeed install a custom policy at runtime. For reference: https://github.com/kubevirt/kubevirt/blob/main/cmd/virt-handler/virt_launcher.cil

However, it's not clear to me how this is related to the kubelet_exec_t error being observed. Can you please clarify how they're connected?

markusdd commented 2 years ago

However, it's not clear to me how this is related to the kubelet_exec_t error being observed. Can you please clarify how they're connected?

Simple:

1) You install a custom policy (as we did in our cluster, for other reasons).

2) CoreOS gets updated, but because the custom policy got compiled into the holy system-wide policy.33 file, that file is marked as modified and system updates will not override it, so the policy starts to rot over time (meaning it gets outdated).

3) The day comes when a policy rule is required to run necessary components (e.g. kubelet), but the updated policy is not there because of 1) and 2).

4) Shit hits the fan.

This is supposed to be fixed, or already was fixed, in CoreOS, as I understand it, but it is not clear to me when that will actually arrive. CoreOS must provide a clean way to allow custom policies without completely deactivating SELinux or going through cumbersome rpm-ostree packaging and install procedures.

Does that make sense?

EDIT: To explain my last point more clearly: when an update happens, CoreOS roughly needs to merge the updated shipped policy with any user modifications, instead of leaving the locally modified policy untouched.
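In the meantime, the drift itself is easy to check for: the shipped baseline lives under /usr/etc/selinux while the active (possibly locally modified) policy lives under /etc/selinux, so comparing the two compiled policy files shows whether a node has rotted. A minimal sketch (the default paths assume the "targeted" store and policy version 33, as on the nodes in this thread; adjust for your system):

```shell
#!/usr/bin/env bash
# Report whether the active SELinux policy differs from the OS-shipped one.
# The default paths are assumptions for an FCOS node with the "targeted" store.
detect_policy_drift() {
  local shipped="${1:-/usr/etc/selinux/targeted/policy/policy.33}"
  local active="${2:-/etc/selinux/targeted/policy/policy.33}"
  if cmp -s "$shipped" "$active"; then
    echo "policy matches shipped baseline"
  else
    echo "policy drifted from shipped baseline"
  fi
}

# Usage on a node (hypothetical): detect_policy_drift
# or pass explicit paths:        detect_policy_drift /usr/etc/... /etc/...
```

This is essentially what the M flag in ostree admin config-diff already tells you, just reduced to the one file that matters here.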

stu-gott commented 2 years ago

Thank you for the clear and colorful explanation. So KubeVirt serves as a source of entropy for the global policy file.

@msteenhu, KubeVirt will install the SELinux module it requires on each worker node as part of its startup procedure. Thus the workaround that you and @markusdd spoke of should indeed work.

markusdd commented 2 years ago

The caveat is that semodule -B is not exactly a quick process. On an OS update this is acceptable; on start of a service, maybe not.

Also, what I wrote above only re-applies custom modules. If any custom SELinux booleans were set, those would be gone. So just overriding the policy file and then recompiling is not something that should be done automatically.

It always boils down to the same issue: We need that upstream fix.

KubeVirt could maybe work around the problem 'correctly' by providing an rpm-ostree package for its module, which would be detected properly.

rhatdan commented 2 years ago

semodule -B should not be needed unless you have run semodule -DB previously.

Installing a module needs to be done once and does the equivalent of a semodule -B.

markusdd commented 2 years ago

Yeah, but as we just explained, it does not happen only once if we need to apply the workaround.

msteenhu commented 2 years ago

Sorry for the delay, but I can confirm that the workaround is as simple as running these two commands:

rsync -rclv /usr/etc/selinux/ /etc/selinux/
semodule -B

Does anybody know if the newest versions still suffer from this SELinux/FCOS bug?

cgwalters commented 2 years ago

Does anybody know if the newest versions still suffer from this SELinux/FCOS bug?

See https://github.com/coreos/fedora-coreos-tracker/issues/701 for FCOS tracking

The RHCOS/OCP tracker is: https://bugzilla.redhat.com/show_bug.cgi?id=2057497

relyt0925 commented 2 years ago

I also see this when using OpenShift Virtualization.