Open izaac opened 2 years ago
I've made it work with manual intervention by installing the rke2-selinux RPM and disabling the NetworkManager services that prevent rke2-server.service from starting.
For the SELinux RPM install, see https://github.com/rancher/rke2-selinux
sudo systemctl disable nm-cloud-setup.service nm-cloud-setup.timer
All this manual work shouldn't happen and Rancher/RKE2 should take care of it automatically.
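The two manual steps above can be verified on a node before provisioning with a small read-only check. This is only a sketch, not something Rancher or RKE2 ships: the function name `preflight_check` and the `SYSTEMCTL`/`RPM` override variables are made up for illustration, and it assumes an RPM-based host where the package is named `rke2-selinux`.

```shell
#!/bin/sh
# Read-only pre-flight sketch: reports whether a node still needs the manual
# steps from this issue. It changes nothing on the host.

# Hypothetical override hooks so the check can be stubbed or dry-run.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"
RPM="${RPM:-rpm}"

preflight_check() {
    problems=0
    for unit in nm-cloud-setup.service nm-cloud-setup.timer; do
        # `systemctl is-enabled` exits 0 when the unit is enabled.
        if "$SYSTEMCTL" is-enabled --quiet "$unit" 2>/dev/null; then
            echo "WARN: $unit is enabled; disable it and reboot the node"
            problems=1
        fi
    done
    # `rpm -q` exits non-zero when the package is not installed.
    if ! "$RPM" -q rke2-selinux >/dev/null 2>&1; then
        echo "WARN: rke2-selinux is not installed"
        problems=1
    fi
    return "$problems"
}
```

Calling `preflight_check || exit 1` at the start of a node-preparation script would stop early on an image that has not been prepared. The check only reports; disabling the units and rebooting is still left to the administrator.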
I believe @Oats87 is working on selinux support for Rancher-provisioned clusters by allowing for local install of the RPMs instead of the tarball; it was not supported in the initial tech preview.
The nm-cloud-setup issue is interesting; we have taken the position that RKE2 shouldn't enable or disable other system services (and certainly shouldn't reboot the host, as required to disable nm-cloud-setup), and left it up to the administrator to read the documentation: https://docs.rke2.io/known_issues/#networkmanager
Perhaps rancher-system-agent can get away with being more hands-on with the system configuration.
@izaac, would https://github.com/rancher/rancher/issues/36509#issuecomment-1055621824 apply here as well? I know it's far from ideal, but installing RPMs and preparing nodes in a more automated fashion sounds like a not-so-small feature, and may not be feasible to support for all permutations (so we should probably focus on supporting standard images first).
@snasovich totally, documenting it is an option. I can review the docs once they are ready for review; it has to be really visible and clear IMO. We can then close this issue in that case.
Thanks for following up
This will need to be release noted and probably even added to support matrix.
@izaac, is there already an AMI that has the necessary changes applied?
@snasovich correct, I did the testing with a private AMI with the requirements here.
That made cluster provisioning work when done from the Rancher UI.
@izaac, I was wondering if you could create an AMI that is based on the RHEL 8.5 Golden public AMI + the minimal changes needed for provisioning to work without manual intervention? We could then reference this AMI in documentation / release notes / support matrix.
@snasovich let me see if we can do that from the QA group. I'm not sure if we have rights to publish public AMIs; I'll investigate.
We may want to improve this for the 2.6.5 release, so moving to that milestone and removed release-note. If it's still working this way by 2.6.5, it will need to be release noted.
We need to come up with an approach to address this and similar issues where additional packages are needed for these images.
After discussion with @Oats87 @thedadams this will need to be release noted / mentioned on support matrix for 2.6.5 as the lift will be too big to start installing RPMs as part of provisioning. Moving to Blocked for now.
Rancher Server Setup
v1.3.8-rc2
k8s v1.22.6
Information about the Cluster
ami-005074b2b824595f4 on us-east-2
User Information
Describe the bug
Cluster stays in Provisioning state and never becomes Active
To Reproduce
ec2-user
open-all
gp2 (which is the default) and gp3.
Result
Cluster stuck in Provisioning state. Nodes show message (from yaml):
message: 'provisioning bootstrap node(s) izb4-e-bbb88bb84-8rcdd: waiting for probes:
Events from local cluster show FailedMount events:
MountVolume.SetUp failed for volume "bootstrap" : object "fleet-default"/"izb4-bootstrap-template-4lgfk-machine-bootstrap" not registered
MountVolume.SetUp failed for volume "machine-files" : object "fleet-default"/"izb4-w-328c4230-bpwfj-machine-provision" not registered
MountVolume.SetUp failed for volume "kube-api-access-tjskd" : object "fleet-default"/"kube-root-ca.crt" not registered
MountVolume.SetUp failed for volume "tls-rancher-volume" : object "fleet-default"/"tls-rancher" not registered
And other FailedMount events.
Expected Result
Be able to provision a cluster using the RHEL 8.5 Golden public AMI from Rancher.
Screenshots
Additional context
This is the original AMI. We have private AMIs with Docker installed and the networking services configured, and cluster provisioning works with them. This same image works when provisioning RKE1 clusters.