rancher / rke2

https://docs.rke2.io/
Apache License 2.0

Overriding data-dir not supported on SELinux? #6520

Closed: aceeric closed this issue 1 month ago

aceeric commented 1 month ago

Environmental Info: RKE2 Version: v1.28.12+rke2r1

Node(s) CPU architecture, OS, and Version:

Command:

uname -r

Output:

4.18.0-553.8.1.el8_10.x86_64

Command:

cat /etc/os-release

Output:

NAME="Red Hat Enterprise Linux"
VERSION="8.10 (Ootpa)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.10"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.10 (Ootpa)"

Command:

lsblk

Output:


NAME        MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme0n1     259:0    0  30G  0 disk
├─nvme0n1p1 259:1    0   1M  0 part
└─nvme0n1p2 259:2    0  30G  0 part /

Command:

sestatus

Output:

SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             targeted
Current mode:                   enforcing
Mode from config file:          enforcing
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Memory protection checking:     actual (secure)
Max kernel policy version:      33

Command:

getenforce

Output:

Enforcing

Cluster Configuration: Single node.

Describe the bug: It looks like specifying a non-default data-dir is not supported when SELinux is enforcing.

Steps To Reproduce:

  1. Create a single EC2 instance with SELinux enforcing
  2. Configure RKE2 pre-reqs
  3. Yum install RKE2
  4. Configure the data-dir
  5. Start RKE2
  6. Observe that RKE2 never starts
  7. Repeat the process with SELinux disabled

1. Create a single EC2 instance with SELinux enforcing

I don't provide details here since there are so many ways to do it.

2. Configure RKE2 pre-reqs

Create a script with these contents and run it:

#!/usr/bin/env bash

mkdir -p /etc/NetworkManager/conf.d
cat <<EOF >| /etc/NetworkManager/conf.d/rke2-canal.conf
[keyfile]
unmanaged-devices=interface-name:cali*;interface-name:flannel*
EOF

systemctl reload NetworkManager 

yum -y install socat conntrack ipset

[[ -z "$(swapon --show)" ]] || { swapoff -a; sed -i '/ swap /d' /etc/fstab; }

cat <<EOF >| /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
modprobe overlay
modprobe br_netfilter

cat <<EOF >| /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF

sysctl --system

systemctl disable --now nm-cloud-setup.service nm-cloud-setup.timer

3. Yum install RKE2

curl -sfL https://get.rke2.io | sh -

4. Configure the data-dir

cat <<EOF >| /etc/rancher/rke2/config.yaml
data-dir: /var/frobozz
EOF

5. Start RKE2

systemctl enable --now rke2-server &

6. Observe that RKE2 never starts

journalctl -u rke2-server -f

It seems to get stuck at:

Aug 08 16:05:01 ip-10-104-22-203.evoforge.org rke2[3392]: time="2024-08-08T16:05:01Z" level=error msg="Failed to check local etcd status for learner management: context deadline exceeded"
Aug 08 16:05:06 ip-10-104-22-203.evoforge.org rke2[3392]: time="2024-08-08T16:05:06Z" level=info msg="Waiting for etcd server to become available"
Aug 08 16:05:06 ip-10-104-22-203.evoforge.org rke2[3392]: time="2024-08-08T16:05:06Z" level=info msg="Waiting for API server to become available"
Aug 08 16:05:10 ip-10-104-22-203.evoforge.org rke2[3392]: {"level":"warn","ts":"2024-08-08T16:05:10.993863Z","logger":"etcd-client","caller":"v3@v3.5.13-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0012d0960/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused\""}
Aug 08 16:05:10 ip-10-104-22-203.evoforge.org rke2[3392]: time="2024-08-08T16:05:10Z" level=info msg="Failed to test data store connection: context deadline exceeded"
Aug 08 16:05:16 ip-10-104-22-203.evoforge.org rke2[3392]: {"level":"warn","ts":"2024-08-08T16:05:16.557958Z","logger":"etcd-client","caller":"v3@v3.5.13-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0012d0960/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused\""}
Aug 08 16:05:16 ip-10-104-22-203.evoforge.org rke2[3392]: time="2024-08-08T16:05:16Z" level=error msg="Failed to check local etcd status for learner management: context deadline exceeded"
Aug 08 16:05:31 ip-10-104-22-203.evoforge.org rke2[3392]: {"level":"warn","ts":"2024-08-08T16:05:31.559621Z","logger":"etcd-client","caller":"v3@v3.5.13-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0012d0960/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused\""}
Aug 08 16:05:31 ip-10-104-22-203.evoforge.org rke2[3392]: time="2024-08-08T16:05:31Z" level=error msg="Failed to check local etcd status for learner management: context deadline exceeded"
Aug 08 16:05:36 ip-10-104-22-203.evoforge.org rke2[3392]: time="2024-08-08T16:05:36Z" level=info msg="Waiting for API server to become available"
Aug 08 16:05:36 ip-10-104-22-203.evoforge.org rke2[3392]: time="2024-08-08T16:05:36Z" level=info msg="Waiting for etcd server to become available"
Aug 08 16:05:45 ip-10-104-22-203.evoforge.org rke2[3392]: {"level":"warn","ts":"2024-08-08T16:05:45.995Z","logger":"etcd-client","caller":"v3@v3.5.13-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0012d0960/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused\""}
Aug 08 16:05:45 ip-10-104-22-203.evoforge.org rke2[3392]: time="2024-08-08T16:05:45Z" level=info msg="Failed to test data store connection: context deadline exceeded"
Aug 08 16:05:46 ip-10-104-22-203.evoforge.org rke2[3392]: {"level":"warn","ts":"2024-08-08T16:05:46.560518Z","logger":"etcd-client","caller":"v3@v3.5.13-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0012d0960/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused\""}
Aug 08 16:05:46 ip-10-104-22-203.evoforge.org rke2[3392]: time="2024-08-08T16:05:46Z" level=error msg="Failed to check local etcd status for learner management: context deadline exceeded"
Aug 08 16:06:01 ip-10-104-22-203.evoforge.org rke2[3392]: {"level":"warn","ts":"2024-08-08T16:06:01.560741Z","logger":"etcd-client","caller":"v3@v3.5.13-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0012d0960/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused\""}
Aug 08 16:06:01 ip-10-104-22-203.evoforge.org rke2[3392]: time="2024-08-08T16:06:01Z" level=error msg="Failed to check local etcd status for learner management: context deadline exceeded"
Aug 08 16:06:06 ip-10-104-22-203.evoforge.org rke2[3392]: time="2024-08-08T16:06:06Z" level=info msg="Waiting for API server to become available"
Aug 08 16:06:06 ip-10-104-22-203.evoforge.org rke2[3392]: time="2024-08-08T16:06:06Z" level=info msg="Waiting for etcd server to become available"
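The log above only shows etcd refusing connections; it does not say why. A minimal sketch of how one might check whether SELinux is the cause, assuming the audit tooling (`ausearch`) is installed and the script runs as root. The function names `rke2_data_dir` and `show_selinux_evidence` are hypothetical helpers, not part of RKE2; `/var/lib/rancher/rke2` is the default RKE2 data directory.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the data-dir configured in an RKE2 config file,
# falling back to the default location when the key is absent or the file
# is unreadable. Only handles a simple top-level "data-dir: <path>" line.
rke2_data_dir() {
  local cfg="${1:-/etc/rancher/rke2/config.yaml}" dir=""
  if [[ -r "$cfg" ]]; then
    dir="$(sed -n 's/^data-dir:[[:space:]]*//p' "$cfg" | head -n1)"
  fi
  # Strip one pair of optional double quotes around the value.
  dir="${dir%\"}"; dir="${dir#\"}"
  echo "${dir:-/var/lib/rancher/rke2}"
}

# Hypothetical helper: show the SELinux label on the configured data-dir and
# any AVC denials from roughly the last ten minutes. Requires root.
show_selinux_evidence() {
  local dir
  dir="$(rke2_data_dir "$@")"
  ls -Zd "$dir"
  if command -v ausearch >/dev/null 2>&1; then
    ausearch -m avc -ts recent
  else
    echo "ausearch not found; check /var/log/audit/audit.log manually" >&2
  fi
}
```

Running `show_selinux_evidence` shortly after a failed start would list any denials recorded while RKE2 was trying to launch etcd, which is the evidence that would tie the hang to the policy rather than to something else on the node.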

7. Repeat the process with SELinux disabled

All steps are the same except:

cat <<EOF >| /etc/rancher/rke2/config.yaml
selinux: false
data-dir: /var/frobozz
EOF

Observe that RKE2 does start up. Verify:

/var/frobozz/bin/kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml get nodes

Output:

NAME        STATUS   ROLES                       AGE   VERSION
zzzzzzzzz   Ready    control-plane,etcd,master   58s   v1.28.12+rke2r1

Expected behavior: Cluster starts.

Actual behavior: Cluster does not start.

dereknola commented 1 month ago

Our SELinux policies are designed to work with RKE2 as is. You would need to write your own policies to handle a custom data-dir.
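For readers wondering what handling a custom data-dir might involve: one general SELinux technique is a file-context equivalence rule, which tells the policy to label a new path the same way as an existing one. The sketch below only prints the commands rather than running them. Whether an equivalence rule alone is enough for RKE2 is an assumption that nobody in this thread confirms, and `selinux_equiv_cmds` is a hypothetical helper name.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print (do not execute) the commands that would make
# SELinux label a custom data-dir like the default RKE2 data directory.
# Assumption: an equivalence rule is sufficient; this is not confirmed by
# the RKE2 maintainers in this thread.
selinux_equiv_cmds() {
  local custom="$1" default="${2:-/var/lib/rancher/rke2}"
  if [[ "$custom" != /* ]]; then
    echo "data-dir must be an absolute path" >&2
    return 1
  fi
  # semanage fcontext -a -e SOURCE TARGET: label TARGET as if it were SOURCE.
  printf 'semanage fcontext -a -e %s %s\n' "$default" "$custom"
  # Apply the resulting labels to files that already exist under the path.
  printf 'restorecon -R -v %s\n' "$custom"
}
```

Calling `selinux_equiv_cmds /var/frobozz` prints two lines that can be reviewed and then run as root. Printing first is deliberate: changes to file contexts persist across reboots, so it is worth reading them before applying.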

aceeric commented 1 month ago

I suspected that would be the case. I looked through the various documentation sites and didn't see any direct mention of it. Perhaps consider adding a note to https://docs.rke2.io/reference/server_config.

Thank you.