rancher / rke2


RKE2 installation with Cilium on RHEL9 #5188

Closed ObieBent closed 10 months ago

ObieBent commented 10 months ago

Environmental Info:
RKE2 Version: v1.26.5+rke2r1

Node(s) CPU architecture, OS, and Version:

# uname -a
Linux dwst-cp-1 5.14.0-162.23.1.el9_1.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Mar 23 20:08:28 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 9"
REDHAT_BUGZILLA_PRODUCT_VERSION=9.1
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.1"


- SELinux is enabled and enforcing

getenforce

Enforcing

rpm -qav | grep selinux

libselinux-3.4-3.el9.x86_64
libselinux-utils-3.4-3.el9.x86_64
rpm-plugin-selinux-4.16.1.3-19.el9_1.x86_64
selinux-policy-34.1.43-1.el9_1.2.noarch
selinux-policy-targeted-34.1.43-1.el9_1.2.noarch
python3-libselinux-3.4-3.el9.x86_64
container-selinux-2.189.0-1.el9.noarch
rke2-selinux-0.16-1.el8.noarch
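In the package list above, rke2-selinux carries an el8 dist tag while every other package is an el9 build. Nothing in this thread establishes that this is related to the problem, so the following is only a minimal sketch for spotting such mismatches. The helper name `find_foreign_dist` and its argument are made up for illustration and are not part of any RKE2 or Rancher tooling.

```shell
#!/bin/sh
# find_foreign_dist EXPECTED_TAG
# Hypothetical helper: reads package names on stdin (one per line) and
# prints those whose name does not contain ".EXPECTED_TAG" (e.g. ".el9").
find_foreign_dist() {
    expected="$1"
    # -F: match a fixed string, -v: keep only non-matching lines.
    # "|| true" keeps an empty result from being reported as a failure.
    grep -v -F ".${expected}" || true
}

# Usage on a live host (assumes rpm is available):
#   rpm -qa | grep selinux | find_foreign_dist el9
```

Run against the list above, this would print only the rke2-selinux line.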


- Add-on config file

MTU: 0
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:


Cluster Configuration:
1 server and 2 agents

**Describe the bug:**
I'm trying to set up a downstream RKE2 cluster with Cilium on RHEL 9.
I ran the curl command provided by the Rancher UI on the first server, which is dedicated to hosting the control plane services, but the rancher-system-agent does not appear to do anything after it starts. It stays at this stage indefinitely.

**Steps To Reproduce:**

- Registration of the server

curl -fL https://rms.buzz.lab/system-agent-install.sh | sudo sh -s - --server https://rms.buzz.lab --label 'cattle.io/os=linux' --token vpwx4vjrtgk9qc9cvrv2tb8xph7dfttcpqqwvnxnjbdq8q2mn2s5kf --ca-checksum e25afd5076b49a2a5bac4f2669999b33779b1de85d6e1c7c77e81f975a1112db --etcd --controlplane

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 30889    0 30889    0     0   257k      0 --:--:-- --:--:-- --:--:--  257k
[INFO] Label: cattle.io/os=linux
[INFO] Role requested: etcd
[INFO] Role requested: controlplane
[INFO] Using default agent configuration directory /etc/rancher/agent
[INFO] Using default agent var directory /var/lib/rancher/agent
[INFO] Determined CA is not necessary to connect to Rancher
[INFO] Successfully tested Rancher connection
[INFO] Downloading rancher-system-agent binary from https://rms.buzz.lab/assets/rancher-system-agent-amd64
[INFO] Successfully downloaded the rancher-system-agent binary.
[INFO] Downloading rancher-system-agent-uninstall.sh script from https://rms.buzz.lab/assets/system-agent-uninstall.sh
[INFO] Successfully downloaded the rancher-system-agent-uninstall.sh script.
[INFO] Generating Cattle ID
[INFO] Successfully downloaded Rancher connection information
[INFO] systemd: Creating service file
[INFO] Creating environment file /etc/systemd/system/rancher-system-agent.env
[INFO] Enabling rancher-system-agent.service
Created symlink /etc/systemd/system/multi-user.target.wants/rancher-system-agent.service → /etc/systemd/system/rancher-system-agent.service.
[INFO] Starting/restarting rancher-system-agent.service


- rke2-server is never started by the rancher-system-agent service

journalctl -u rancher-system-agent --no-pager

Jan 03 03:41:34 dwst-cp-1 systemd[1]: Started Rancher System Agent.
Jan 03 03:41:34 dwst-cp-1 rancher-system-agent[12207]: time="2024-01-03T03:41:34+01:00" level=info msg="Rancher System Agent version v0.3.3 (9e827a5) is starting"
Jan 03 03:41:34 dwst-cp-1 rancher-system-agent[12207]: time="2024-01-03T03:41:34+01:00" level=info msg="Using directory /var/lib/rancher/agent/work for work"
Jan 03 03:41:34 dwst-cp-1 rancher-system-agent[12207]: time="2024-01-03T03:41:34+01:00" level=info msg="Starting remote watch of plans"
Jan 03 03:41:35 dwst-cp-1 rancher-system-agent[12207]: E0103 03:41:35.164424 12207 memcache.go:206] couldn't get resource list for management.cattle.io/v3:
Jan 03 03:41:35 dwst-cp-1 rancher-system-agent[12207]: time="2024-01-03T03:41:35+01:00" level=info msg="Starting /v1, Kind=Secret controller"
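The journal excerpt above is mostly info-level startup messages with a single klog-style error line in between. A rough way to pull only the error-level lines out of such a journal is sketched below. The helper name `agent_errors` is invented for illustration. Its two patterns match the `level=error` / `level=fatal` fields and the `E<MMDD>` prefix in the forms seen in this excerpt, and may need adjusting for other log formats.

```shell
#!/bin/sh
# agent_errors
# Hypothetical helper: reads journal lines on stdin and prints only those
# that look like errors, either "level=error"/"level=fatal" fields or
# klog-style lines such as "...]: E0103 03:41:35.164424 ...".
agent_errors() {
    # "|| true" so that "no errors found" is not treated as a failure.
    grep -E 'level=(error|fatal)|\]: E[0-9]{4} ' || true
}

# Usage on the affected node:
#   journalctl -u rancher-system-agent --no-pager | agent_errors
```

An empty result only means no line matched these patterns. It does not show that the agent received or applied a plan.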



**Expected behavior:**
It should install correctly.

**Actual behavior:**
The rke2-server process is never deployed.

**Additional context / logs:**

brandond commented 10 months ago

I don't believe this is an RKE2 issue. If the rancher system agent doesn't attempt to install and start rke2, then the issue is on the Rancher side. Check the logs over there.
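To confirm on the node itself whether the agent ever got as far as setting up rke2, a quick check along the following lines can help. This is a sketch, not an official diagnostic. The function name `rke2_state` is invented here, the default work directory is the one printed in the agent log in this report, and the `rke2-server` unit name is taken from this thread. The output only describes what is present on the node, not why.

```shell
#!/bin/sh
# rke2_state [WORKDIR]
# Hypothetical helper: reports whether the agent work directory has any
# entries and whether systemd knows an rke2-server unit.
rke2_state() {
    workdir="${1:-/var/lib/rancher/agent/work}"

    # ls -A lists everything except "." and "..", so any output means
    # the directory is non-empty.
    if [ -d "$workdir" ] && [ -n "$(ls -A "$workdir" 2>/dev/null)" ]; then
        echo "work dir: has entries"
    else
        echo "work dir: empty or missing"
    fi

    # "systemctl cat" exits non-zero when the unit file does not exist.
    if command -v systemctl >/dev/null 2>&1 &&
       systemctl cat rke2-server.service >/dev/null 2>&1; then
        echo "rke2-server unit: present"
    else
        echo "rke2-server unit: not found"
    fi
}

# Usage on the affected node (reading the work directory may need root):
#   rke2_state
```

If both checks come back negative, that is consistent with the agent never having attempted the installation, which points at the Rancher side rather than at RKE2.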

ObieBent commented 10 months ago

Thanks for your reply. After going through the server's system logs and tracing the agent with strace, I concluded that the rancher system agent never invokes rke2. I have opened an issue for this on the Rancher GitHub issue tracker and will post any updates here.

caroline-suse-rancher commented 10 months ago

I'm converting this to a discussion to keep it open for updates.