Closed — clemenko closed this issue 1 year ago.
I'm seeing this same issue. It has happened on multiple OSes, including Rocky Linux and RHEL. I'll stick with the workaround until the upstream fixes are released.
Also, for the workaround, please ensure you don't run yum update.
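One way to make sure a routine yum/dnf update can't replace the held package while on the workaround is the dnf versionlock plugin. A sketch, assuming an EL host with dnf and root access (the plugin package name and commands are standard dnf, not something from this thread):

```shell
# Hold container-selinux at its currently installed version so an
# accidental "yum update" / "dnf update" cannot replace it.
pkg="container-selinux"
if command -v dnf >/dev/null 2>&1 && [ "$(id -u)" = "0" ]; then
  # Install the versionlock plugin if it is missing (tolerate offline hosts).
  dnf -y install 'dnf-command(versionlock)' || true
  dnf versionlock add "$pkg" || true
  dnf versionlock list | grep "$pkg" || true
else
  echo "dnf not available or not root; skipping (non-EL host)"
fi
```

Remove the lock with `dnf versionlock delete container-selinux` once fixed packages are published.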
I have released new RPMs for rke2-selinux, rke2-server, rke2-agent, and rke2-common in the testing channel. They seem to be working fine with EL9 distros and the recent changes to container-selinux.
Confirmed working. Will install.sh pull it? How soon until QA is complete and the RPM is published?
+1. Running into this across all of the latest AMIs for Rocky 8 and RHEL 8 on AWS and GCP.
+1, running into this on a RHEL 8 AMI on AWS. I can also confirm that running the testing channel seems to fix the issue.
Testing has concluded and details provided in https://github.com/rancher/rke2/issues/4285#issuecomment-1562086153 and https://github.com/rancher/rke2-selinux/issues/33#issuecomment-1562089891.
This should be fixed now via the install script at https://get.rke2.io and the testing channel. The latest and stable channels will get the fix in line with the May patch releases (very soon).
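For anyone who wants the fix before latest/stable are updated, the channel can be selected through the installer's INSTALL_RKE2_CHANNEL variable, as described above. A sketch; the RUN_INSTALL opt-in variable is my own addition so the command doesn't modify a host unintentionally:

```shell
# Install RKE2 from the testing channel, which already carries the fixed
# rke2-selinux packages. Guarded behind an explicit opt-in because this
# modifies the host.
channel="testing"
if [ "${RUN_INSTALL:-no}" = "yes" ]; then
  curl -sfL https://get.rke2.io | INSTALL_RKE2_CHANNEL="$channel" sh -
else
  echo "dry run: would install RKE2 from channel '${channel}'"
fi
```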
When will we see the rke2-selinux package updated?
Also:
[root@rke1 ~]# curl -sfL https://get.rke2.io | INSTALL_RKE2_CHANNEL=v1.24 sh -
[INFO] using stable RPM repositories
[INFO] using 1.24 series from channel stable
Rancher RKE2 Common (v1.24) 666 B/s | 389 B 00:00
Errors during downloading metadata for repository 'rancher-rke2-common-stable':
- Status code: 404 for https://rpm.rancher.io/rke2/stable/common/centos/9/noarch/repodata/repomd.xml (IP: 172.67.129.95)
Error: Failed to download metadata for repo 'rancher-rke2-common-stable': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried
[root@rke1 ~]#
Did this break too?
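The 404 above can be checked without running the installer by probing the same repomd.xml path for each channel. A diagnostic sketch built from the URL in the error message; the el9 path layout is taken from that URL:

```shell
# Probe the repo metadata the installer would fetch, per channel, for el9.
# A "missing" result reproduces the 404 seen above.
for channel in stable latest testing; do
  url="https://rpm.rancher.io/rke2/${channel}/common/centos/9/noarch/repodata/repomd.xml"
  if curl -sfI "$url" >/dev/null 2>&1; then
    status=OK
  else
    status=missing
  fi
  echo "${channel}: ${status}"
done
```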
^^ I'm curious as well! We definitely need a seamless experience for customers; SELinux is a hard requirement across the board.
Ah, we went ahead and reverted the install script changes for now. We'll update it all at the same time. Currently it's only updated in the testing channel, so we'll have to wait to update everything as we release these May patches (later this week or early next).
Can we re-open this issue until the rke2-selinux package is fixed?
I just tested with the latest testing.9 release and I am still seeing issues: RKE2 starts, but zero Longhorn PVCs are able to attach.
May 26 14:40:48 localhost setroubleshoot[34188]: SELinux is preventing /usr/sbin/iscsiadm from using the dac_override capability.

***** Plugin dac_override (91.4 confidence) suggests **********************

If you want to help identify if domain needs this access or you have a file with the wrong permissions on your system
Then turn on full auditing to get path information about the offending file and generate the error again.
Do

Turn on full auditing
# auditctl -w /etc/shadow -p w
Try to recreate AVC. Then execute
# ausearch -m avc -ts recent
If you see PATH record check ownership/permissions on file, and fix it,
otherwise report as a bugzilla.

***** Plugin catchall (9.59 confidence) suggests **************************

If you believe that iscsiadm should have the dac_override capability by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do
allow this access for now by executing:
# ausearch -c 'iscsiadm' --raw | audit2allow -M my-iscsiadm
# semodule -X 300 -i my-iscsiadm.pp
Results from ausearch -m avc -ts recent:
time->Fri May 26 14:45:50 2023
type=PROCTITLE msg=audit(1685112350.374:4306): proctitle=697363736961646D002D6D00646973636F76657279002D740073656E6474617267657473002D700031302E34322E302E38
type=PATH msg=audit(1685112350.374:4306): item=1 name="/var/lib/iscsi/nodes/iqn.2019-10.io.longhorn:pvc-7ec0aeb4-5d3a-4996-be4c-1bb9cab001a4/10.42.0.8,3260,1" nametype=CREATE cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
type=PATH msg=audit(1685112350.374:4306): item=0 name="/var/lib/iscsi/nodes/iqn.2019-10.io.longhorn:pvc-7ec0aeb4-5d3a-4996-be4c-1bb9cab001a4/" inode=100878733 dev=fc:01 mode=040600 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:iscsi_var_lib_t:s0 nametype=PARENT cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
type=CWD msg=audit(1685112350.374:4306): cwd="/"
type=SYSCALL msg=audit(1685112350.374:4306): arch=c000003e syscall=83 success=no exit=-13 a0=557b96f3e690 a1=1f8 a2=ffffffffffffff00 a3=0 items=2 ppid=132762 pid=134791 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="iscsiadm" exe="/usr/sbin/iscsiadm" subj=system_u:system_r:iscsid_t:s0 key=(null)
type=AVC msg=audit(1685112350.374:4306): avc: denied { dac_override } for pid=134791 comm="iscsiadm" capability=1 scontext=system_u:system_r:iscsid_t:s0 tcontext=system_u:system_r:iscsid_t:s0 tclass=capability permissive=0
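The catchall suggestion from setroubleshoot above can be applied as a stop-gap while waiting for fixed packages. It only masks the denial rather than fixing the underlying policy, so it should be removed once updated rke2-selinux/container-selinux RPMs land. A sketch, assuming root and the audit/policycoreutils tooling; the module name "my-iscsiadm" is just the one setroubleshoot suggested:

```shell
# Build and load a local SELinux policy module from the logged AVCs,
# exactly as the setroubleshoot catchall plugin suggests.
# Remove it later with: semodule -r my-iscsiadm
module="my-iscsiadm"
if command -v ausearch >/dev/null 2>&1 \
    && command -v audit2allow >/dev/null 2>&1 \
    && [ "$(id -u)" = "0" ]; then
  ausearch -c 'iscsiadm' --raw | audit2allow -M "$module"
  semodule -X 300 -i "${module}.pp"
else
  echo "audit/policy tooling not available or not root; skipping"
fi
```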
@clemenko As far as I can see, we have no specific policies for iSCSI or any related Longhorn policies in our policy set, nor can I see any iSCSI-related policies in container-selinux either.
Then why does it work with the older container-selinux package? The two issues are related. It has to do with how the containers are tagged.
Are we able to re-open this issue until all issues around it are resolved? If we don't have a complete fix, especially with supported and prospective customers still reaching out to us about it, it should not be closed. @galal-hussein
Our general practice is to close issues after fixes have been validated, but before they are fully available in all channels. This will be available with the upcoming releases, which will be out and in all channels later this week.
FYI this appears to still be broken.
+1 to Andy's comment. Tested with RKE2 v1.24.14 and Longhorn is still failing due to SELinux.
@galal-hussein I'm also still experiencing this issue.
With a clean Rocky Linux 9.2 VM, attempting to install RKE2 using the install script fails because etcd fails to start.
Works with Rocky 8.8.
rke2-server: 1.25.10~rke2r1
rke2-selinux: 0.13-1.el9
container-selinux: 3:2.205.0-1.el9_2
Hmmm what are your settings? I see that same scenario working:
$ uname -a
Linux ip-172-31-42-128.us-east-2.compute.internal 5.14.0-284.11.1.el9_2.x86_64 #1 SMP PREEMPT_DYNAMIC Tue May 9 17:09:15 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/os-release
NAME="Rocky Linux"
VERSION="9.2 (Blue Onyx)"
ID="rocky"
ID_LIKE="rhel centos fedora"
VERSION_ID="9.2"
PLATFORM_ID="platform:el9"
PRETTY_NAME="Rocky Linux 9.2 (Blue Onyx)"
ANSI_COLOR="0;32"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:rocky:rocky:9::baseos"
HOME_URL="https://rockylinux.org/"
BUG_REPORT_URL="https://bugs.rockylinux.org/"
SUPPORT_END="2032-05-31"
ROCKY_SUPPORT_PRODUCT="Rocky-Linux-9"
ROCKY_SUPPORT_PRODUCT_VERSION="9.2"
REDHAT_SUPPORT_PRODUCT="Rocky Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.2"
$ rpm -qa container-selinux rke2-server rke2-selinux
container-selinux-2.205.0-1.el9_2.noarch
rke2-selinux-0.13-1.el9.noarch
rke2-server-1.25.10~rke2r1-0.el9.x86_64
$ kubectl get nodes,pods -A -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
node/ip-xxx-xx-xx-xxx.us-east-2.compute.internal Ready control-plane,etcd,master 9m43s v1.25.10+rke2r1 xxx.xx.xx.xxx <none> Rocky Linux 9.2 (Blue Onyx) 5.14.0-284.11.1.el9_2.x86_64 containerd://1.7.1-k3s1
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system pod/cloud-controller-manager-ip-xxx-xx-xx-xxx.us-east-2.compute.internal 1/1 Running 0 9m32s xxx.xx.xx.xxx ip-xxx-xx-xx-xxx.us-east-2.compute.internal <none> <none>
kube-system pod/etcd-ip-xxx-xx-xx-xxx.us-east-2.compute.internal 1/1 Running 0 9m xxx.xx.xx.xxx ip-xxx-xx-xx-xxx.us-east-2.compute.internal <none> <none>
kube-system pod/helm-install-rke2-canal-2p4t5 0/1 Completed 0 9m13s xxx.xx.xx.xxx ip-xxx-xx-xx-xxx.us-east-2.compute.internal <none> <none>
kube-system pod/helm-install-rke2-coredns-7b9zv 0/1 Completed 0 9m13s xxx.xx.xx.xxx ip-xxx-xx-xx-xxx.us-east-2.compute.internal <none> <none>
kube-system pod/helm-install-rke2-ingress-nginx-4bzh2 0/1 Completed 0 9m13s 10.42.0.2 ip-xxx-xx-xx-xxx.us-east-2.compute.internal <none> <none>
kube-system pod/helm-install-rke2-metrics-server-8t4gm 0/1 Completed 0 9m13s 10.42.0.7 ip-xxx-xx-xx-xxx.us-east-2.compute.internal <none> <none>
kube-system pod/helm-install-rke2-snapshot-controller-crd-gjxg6 0/1 Completed 0 9m13s 10.42.0.4 ip-xxx-xx-xx-xxx.us-east-2.compute.internal <none> <none>
kube-system pod/helm-install-rke2-snapshot-controller-l5gnt 0/1 Completed 1 9m13s 10.42.0.6 ip-xxx-xx-xx-xxx.us-east-2.compute.internal <none> <none>
kube-system pod/helm-install-rke2-snapshot-validation-webhook-rb2p7 0/1 Completed 0 9m13s 10.42.0.3 ip-xxx-xx-xx-xxx.us-east-2.compute.internal <none> <none>
kube-system pod/kube-apiserver-ip-xxx-xx-xx-xxx.us-east-2.compute.internal 1/1 Running 0 9m40s xxx.xx.xx.xxx ip-xxx-xx-xx-xxx.us-east-2.compute.internal <none> <none>
kube-system pod/kube-controller-manager-ip-xxx-xx-xx-xxx.us-east-2.compute.internal 1/1 Running 0 9m34s xxx.xx.xx.xxx ip-xxx-xx-xx-xxx.us-east-2.compute.internal <none> <none>
kube-system pod/kube-proxy-ip-xxx-xx-xx-xxx.us-east-2.compute.internal 1/1 Running 0 9m28s xxx.xx.xx.xxx ip-xxx-xx-xx-xxx.us-east-2.compute.internal <none> <none>
kube-system pod/kube-scheduler-ip-xxx-xx-xx-xxx.us-east-2.compute.internal 1/1 Running 0 9m34s xxx.xx.xx.xxx ip-xxx-xx-xx-xxx.us-east-2.compute.internal <none> <none>
kube-system pod/rke2-canal-gdnq7 2/2 Running 0 9m4s xxx.xx.xx.xxx ip-xxx-xx-xx-xxx.us-east-2.compute.internal <none> <none>
kube-system pod/rke2-coredns-rke2-coredns-6b9548f79f-hfcrx 1/1 Running 0 9m5s 10.42.0.5 ip-xxx-xx-xx-xxx.us-east-2.compute.internal <none> <none>
kube-system pod/rke2-coredns-rke2-coredns-autoscaler-57647bc7cf-229f7 1/1 Running 0 9m5s 10.42.0.8 ip-xxx-xx-xx-xxx.us-east-2.compute.internal <none> <none>
kube-system pod/rke2-ingress-nginx-controller-wgblh 1/1 Running 0 8m3s 10.42.0.13 ip-xxx-xx-xx-xxx.us-east-2.compute.internal <none> <none>
kube-system pod/rke2-metrics-server-78b84fff48-j827c 1/1 Running 0 8m22s 10.42.0.9 ip-xxx-xx-xx-xxx.us-east-2.compute.internal <none> <none>
kube-system pod/rke2-snapshot-controller-849d69c748-8gnst 1/1 Running 0 8m9s 10.42.0.12 ip-xxx-xx-xx-xxx.us-east-2.compute.internal <none> <none>
kube-system pod/rke2-snapshot-validation-webhook-654f6677b-n95tx 1/1 Running 0 8m19s 10.42.0.11 ip-xxx-xx-xx-xxx.us-east-2.compute.internal <none> <none>
@rancher-max Thanks for the reply. It turns out this was a missing firewalld rule (port 2380/tcp for etcd). After adding the rule, the Rocky 9.2 node joined the existing cluster.
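For reference, the missing rule can be added like this. A sketch, assuming firewalld is the active firewall; 2380/tcp is the etcd peer port mentioned above (etcd also uses 2379/tcp for clients, not shown here since only the peer port was the problem):

```shell
# Permanently open the etcd peer port (2380/tcp) and reload firewalld
# so the rule takes effect without a restart.
port=2380
if command -v firewall-cmd >/dev/null 2>&1 && [ "$(id -u)" = "0" ]; then
  firewall-cmd --permanent --add-port="${port}/tcp"
  firewall-cmd --reload
  firewall-cmd --list-ports
else
  echo "firewalld not present or not root; skipping"
fi
```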
@p-kimberley Glad to hear it. The most recent release on 30 May 2023 fixed this issue.
An update to container-selinux (https://rockylinux.pkgs.org/9/rockylinux-appstream-x86_64/container-selinux-2.205.0-1.el9_2.noarch.rpm.html) appears to be causing etcd to not start on a new Rocky box. There is currently no way to tell the install script to load an older version of container-selinux to test.
Installed with
and the error :
It looks like rke2-selinux needs to be updated for the upstream changes.
The workaround is to apply it before the install.sh script.
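While the install script itself offers no way to pin container-selinux, the package manager can still list and step back to an older build for testing, as long as the mirrors still carry one. A sketch, assuming an EL host with dnf and root access:

```shell
# Show every container-selinux build the enabled repos offer, then
# downgrade to the previous one for testing.
pkg="container-selinux"
if command -v dnf >/dev/null 2>&1 && [ "$(id -u)" = "0" ]; then
  dnf --showduplicates list "$pkg" || true
  # Steps back one version; repeat (or pass an explicit NVR) to go further.
  dnf -y downgrade "$pkg" || true
else
  echo "dnf not available or not root; skipping (non-EL host)"
fi
```

Pair this with a versionlock on the downgraded package so a later update doesn't silently reintroduce the breakage.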