Open mike-nguyen opened 1 year ago
This was addressed by https://github.com/openshift/os/pull/1275, but that only prevents us from shipping with broken groups for openvswitch.
We have a new problem now: the rhel9 (?) builds apparently picked up https://src.fedoraproject.org/rpms/openvswitch/c/a17c9d439da4f7e3bfec0ce4c3b178232d28d3fb?branch=rawhide, and I believe the problem today is that the use of sysusers.d clashes with our previously hardcoded bits here https://github.com/openshift/os/blob/master/passwd#L27 and, most importantly, here: https://github.com/openshift/os/blob/1f2c0eb7e370d2412db15fa28556f419ddf73c5d/group#L45 (note openvswitch has no groups).
Basically we can do one of two things. We could try dropping the hardcoded user/group files from this repo and rely on sysusers (i.e. per-machine state). Or we could hardcode the hugetlbfs group. But it doesn't make sense to do both.
Now, honestly, I think the real fix here is to move openvswitch to use DynamicUser=yes and open the hugetlbfs bits as an earlier privileged operation instead of relying on group access.
@cgwalters note that openvswitch being in hugetlbfs only happens on x86-64 and ARM where we support DPDK. Not on POWER or s390.
@cgwalters I'm curious what's actually clashing here though. sysusers.d(5) says it'll do the things it's asked if the group/user doesn't exist yet:
g
Create a system group of the specified name should it not exist yet. Note that u implicitly creates a matching group. The group will be created with no password set.
m
Add a user to a group. If the user or group do not exist yet, they will be implicitly created.
But we don't get any error logs out of systemd-sysusers about why it's not doing what it's asked. If openvswitch already exists, shouldn't it just ignore /usr/lib/sysusers.d/openvswitch.conf? And since hugetlbfs doesn't exist and OVS isn't in it, shouldn't it still do all of that?
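One way to sanity-check those sysusers.d semantics outside the compose is to run systemd-sysusers against a scratch root seeded the same way. This is a sketch: only the two fragment bodies mirror what the package ships; the pre-seeded passwd/group contents and the 800 IDs are illustrative.

```shell
# Scratch-root reproduction of the sysusers.d semantics in question.
root=$(mktemp -d)
mkdir -p "$root/usr/lib/sysusers.d" "$root/etc"
# Simulate the built image: openvswitch user/group exist, hugetlbfs does not.
cat > "$root/etc/passwd" <<'EOF'
root:x:0:0:root:/root:/bin/bash
openvswitch:x:800:800::/:/sbin/nologin
EOF
cat > "$root/etc/group" <<'EOF'
root:x:0:
openvswitch:x:800:
EOF
# The two fragments the package ships: create the user, then the membership.
cat > "$root/usr/lib/sysusers.d/openvswitch.conf" <<'EOF'
u openvswitch - "Open vSwitch Daemons" / /sbin/nologin
EOF
cat > "$root/usr/lib/sysusers.d/openvswitch-hugetlbfs.conf" <<'EOF'
m openvswitch hugetlbfs
EOF
# If systemd-sysusers is available, apply the fragments to the scratch root.
# Per sysusers.d(5), the existing openvswitch user should be left alone and
# hugetlbfs should be created with openvswitch as a member.
if command -v systemd-sysusers >/dev/null 2>&1; then
    SYSTEMD_LOG_LEVEL=debug systemd-sysusers --root="$root" || true
    grep hugetlbfs "$root/etc/group" || true
fi
```

If the hugetlbfs line shows up in the scratch root's group file here but not in the composed image, that would point at the compose path rather than systemd-sysusers itself.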
I did a little investigation on this today.
I tried dropping the hardcoded bits:
diff --git a/group b/group
index e86d91b..1fb1db8 100644
--- a/group
+++ b/group
@@ -42,5 +42,3 @@ nfsnobody:x:65534:
kube:x:994:
sshd:x:74:
chrony:x:992:
-openvswitch:x:800:
-hugetlbfs:x:801:
diff --git a/passwd b/passwd
index 673a3d5..893fd8a 100644
--- a/passwd
+++ b/passwd
@@ -24,4 +24,3 @@ nfsnobody:x:65534:65534:Anonymous NFS User:/var/lib/nfs:/sbin/nologin
kube:x:996:994:Kubernetes user:/:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
chrony:x:994:992::/var/lib/chrony:/sbin/nologin
-openvswitch:x:800:800::/:/sbin/nologin
but that didn't work either (the openvswitch user still doesn't get into the hugetlbfs group). I cracked open the resulting qemu qcow (i.e. without booting it), and I see the openvswitch user and group and the hugetlbfs group in the resulting built image (although with a different UID/GID, so I know the removal of the hardcoded bits had an effect).
I think after we remove the hardcoded bits now what's kicking in is this:
%sysusers_create_compat %{SOURCE2}
%ifarch %{dpdkarches}
%sysusers_create_compat %{SOURCE3}
%endif
where
Source2: openvswitch.sysusers
Source3: openvswitch-hugetlbfs.sysusers
and the files are defined as:
cat openvswitch.sysusers
#Type Name ID GECOS Home directory Shell
u openvswitch - "Open vSwitch Daemons" / /sbin/nologin
cat openvswitch-hugetlbfs.sysusers
#Type Name ID GECOS Home directory Shell
m openvswitch hugetlbfs
So that looks normal... but what is sysusers_create_compat? It's just a macro that calls a bash script. So I imagine some logic inside that bash script (i.e. maybe it works during an rpm install via dnf, but not via rpm-ostree?) is why we end up with openvswitch having no group.
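For reference, that macro comes from systemd-rpm-macros and expands to a call to a generator script that translates each sysusers line into plain getent/groupadd/useradd shell for the package's %pre. A toy sketch of the kind of shell it conceptually emits for the 'm' (membership) line above; this is an illustration, not the real script:

```shell
# Toy translator: show the %pre-style shell an 'm user group' sysusers line
# conceptually turns into. Illustrative only, not sysusers.generate-pre.sh.
translate_m_line() {
    # split the 'm user group' line on whitespace: $1=m $2=user $3=group
    set -- $1
    echo "getent group $3 >/dev/null || groupadd -r $3"
    echo "usermod -a -G $3 $2"
}
generated=$(translate_m_line 'm openvswitch hugetlbfs')
printf '%s\n' "$generated"
```

The interesting part is that this generated shell runs as a scriptlet at install time rather than going through systemd-sysusers at boot, which is why a bug in the generator script only bites one of the two install paths.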
I will note that if, on a running instance, I remove the openvswitch and hugetlbfs entries and rerun SYSTEMD_LOG_LEVEL=debug systemd-sysusers, it does create things appropriately.
WDYT of https://github.com/openshift/os/pull/1317 at least to get us unblocked for now?
At this point we can either merge #1317, which drops the hardcoded user/group assignments, or #1318, which just works around the RPM scriptlet not working.
IIUC, merging #1317 will cause systems that upgrade to have a different (new) UID/GID for the openvswitch user/group and the hugetlbfs group on the next reboot. I'm not sure if this is OK or not.
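If the IDs do change on upgrade, anything on disk still owned by the old numeric values would be left dangling. A read-only sketch for checking that on a node; the 800/801 values come from the hardcoded group/passwd files above, and the searched paths are just examples:

```shell
# Check current IDs and look for files still owned by the previously
# hardcoded numeric IDs (openvswitch was 800:800, hugetlbfs was 801).
OLD_OVS_ID=800
OLD_HUGETLBFS_GID=801
getent passwd openvswitch || echo "no openvswitch user on this host"
getent group hugetlbfs || echo "no hugetlbfs group on this host"
# Leftovers owned by the old numeric IDs would surface here after an upgrade:
find /etc /var -xdev \( -uid "$OLD_OVS_ID" -o -gid "$OLD_HUGETLBFS_GID" \) \
    2>/dev/null | head -n 20
```

An empty find result would suggest the renumbering is harmless in practice, at least for state under /etc and /var.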
It looks like https://src.fedoraproject.org/rpms/systemd/blob/rawhide/f/sysusers.generate-pre.sh and https://pkgs.devel.redhat.com/cgit/rpms/systemd/tree/sysusers.generate-pre.sh?h=rhel-9.2.0 have diverged, and the latter is missing multiple updates to the script.
Specifically, this change that is missing in RHEL looks like the culprit to me: https://src.fedoraproject.org/rpms/systemd/c/f27d461663bec17ad64422682f260f0020ccc7f7?branch=rawhide
This still seems to be a problem on RHCOS 4.14-9.2.
The sysusers.d configurations created by rpm-ostree now clash with the ones provided by the OS packages for openvswitch and unbound:
This likely also causes https://github.com/openshift/installer/issues/7265.
It looks like https://gitlab.com/redhat/centos-stream/rpms/systemd/-/merge_requests/79 hasn't been backported to https://pkgs.devel.redhat.com/cgit/rpms/systemd/tree/sysusers.generate-pre.sh?h=rhel-9.2.0 yet.
Messy. Yes, ultimately we need one "source of truth" for users - what these packages are doing by invoking useradd at %post time and also installing a sysusers file is creating two.
Backporting this logic to RHEL would help indeed.
But, we probably also need to change rpm-ostree to detect this case. Looks like we already have https://github.com/coreos/rpm-ostree/issues/2728
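For context, the scriptlet half of that duplication is typically the classic getent-guarded pattern. This is a generic sketch of that pattern (shown here as text, not executed), not the exact openvswitch scriptlet:

```shell
# The common %pre/%post idiom that duplicates what the sysusers.d fragment
# in the same package already declares. Generic sketch, captured as text.
scriptlet=$(cat <<'EOF'
getent group hugetlbfs >/dev/null || groupadd -r hugetlbfs
getent passwd openvswitch >/dev/null || \
    useradd -r -g openvswitch -d / -s /sbin/nologin \
        -c "Open vSwitch Daemons" openvswitch
usermod -a -G hugetlbfs openvswitch
EOF
)
printf '%s\n' "$scriptlet"
```

The sysusers.d fragment shipped alongside it declares the same user and membership, which is exactly the second source of truth.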
> It looks like https://gitlab.com/redhat/centos-stream/rpms/systemd/-/merge_requests/79 hasn't been backported to https://pkgs.devel.redhat.com/cgit/rpms/systemd/tree/sysusers.generate-pre.sh?h=rhel-9.2.0 yet.
Ah that makes sense. As a lowly customer I don't have access to devel.redhat.com. I'll keep an eye on the Bugzilla bug, thanks!
> Yes, ultimately we need one "source of truth" for users.
Agreed, there doesn't seem to be a common pattern for dealing with these system-specific package configurations. I don't have the deepest insight into how rpm-ostree manages systemd configuration; I'm just confused as to why it layers identical configuration on top of what the packages already provide.
Ahhhh OK something was really confusing me - we're not actually seeing this in OCP. It's because https://github.com/coreos/rpm-ostree/blob/7153ab558ac813b963a55abb5f4892fcd2f9ceca/src/libpriv/rpmostree-container.cxx#L50 and OKD/SCOS is using container layering to build the node image (which is great).
So a workaround today is for the OKD/SCOS build to basically remove the duplicate rpm-ostree sysusers.d entries as part of the container build. (But we should still fix this in rpm-ostree for sure)
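Concretely, that workaround could look something like the following in the container build. The fragment file name used for the duplicate is an assumption about how rpm-ostree names its generated sysusers entries (list /usr/lib/sysusers.d in the actual image to confirm); the sketch uses a scratch directory so it is safe to run anywhere, whereas the real build step would target /usr/lib/sysusers.d inside the image.

```shell
# Hypothetical cleanup step for the OKD/SCOS container build: remove the
# rpm-ostree-generated sysusers fragments that duplicate package-provided ones.
SYSUSERS_DIR=$(mktemp -d)   # stand-in for /usr/lib/sysusers.d in the image
touch "$SYSUSERS_DIR/openvswitch.conf"                        # package-provided
touch "$SYSUSERS_DIR/30-rpmostree-pkg-user-openvswitch.conf"  # assumed duplicate name
rm -f "$SYSUSERS_DIR"/*rpmostree*.conf
ls "$SYSUSERS_DIR"
```

The package-provided fragment survives while the duplicate is dropped, so systemd-sysusers only sees one declaration per user.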
However, https://github.com/coreos/rpm-ostree/issues/4505 would also address this and have other benefits.
@cgwalters I was able to find some time again to look at this more closely:
It turns out that in my case openshift/installer pulls the following RHCOS image on bin/openshift-install create cluster:
On the master VM, rpm-ostree - as instructed by bootstrap-pivot.sh - then pulls an FCOS image and layers it on top of RHCOS, which causes the systemd-sysusers issue I've mentioned:
So I'm actually trying to run OCP, but the installer/bootstrap has other things in mind and pivots to an OKD image. I'm frankly surprised more things didn't break since this really shouldn't happen. :D
The openvswitch %pre scriptlet adds the openvswitch user to the hugetlbfs group. Since %pre runs without set -e by default, the failures are ignored, resulting in worker nodes that do not come online during a cluster install. These errors are showing up during the rpm-ostree compose: