openshift / os


More openvswitch woes #1274

Open mike-nguyen opened 1 year ago

mike-nguyen commented 1 year ago

The openvswitch %pre scriptlet adds the openvswitch user to the hugetlbfs group. Since %pre runs without set -e by default, the failures are ignored, resulting in worker nodes that do not come online during a cluster install.

These errors are showing up during the rpm-ostree compose:

14:30:05  openvswitch3.1.prein: usermod.rpmostreesave: /etc/passwd.6: lock file already used
14:30:05  openvswitch3.1.prein: usermod.rpmostreesave: cannot lock /etc/passwd; try again later.
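The failure-masking behavior is easy to demonstrate: RPM scriptlets run as plain `sh` scripts without `set -e`, so a failing command mid-script (like the usermod above) does not abort the scriptlet, and the package install is considered successful. A minimal illustration:

```sh
# Scriptlets run without `set -e`: the failed command is simply skipped,
# the script exits 0, and rpm treats the scriptlet as successful.
sh -c 'false; echo "scriptlet kept going"'
# prints: scriptlet kept going

# With `set -e` the same failure aborts the script immediately.
sh -ec 'false; echo "never reached"' || echo "aborted with set -e"
# prints: aborted with set -e
```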
mike-nguyen commented 1 year ago

This was addressed by: https://github.com/openshift/os/pull/1275. It only prevents us from shipping with broken groups for openvswitch, though.

cgwalters commented 1 year ago

We have a new problem now, which is that the rhel9 (?) builds picked up https://src.fedoraproject.org/rpms/openvswitch/c/a17c9d439da4f7e3bfec0ce4c3b178232d28d3fb?branch=rawhide it sounds like, and I believe today the problem is that the use of sysusers.d clashes with our previously hardcoded bits here https://github.com/openshift/os/blob/master/passwd#L27 and, most importantly, here: https://github.com/openshift/os/blob/1f2c0eb7e370d2412db15fa28556f419ddf73c5d/group#L45 (note that openvswitch has no groups).

Basically we can do one of two things:

- Drop the hardcoded user/group files from this repo and rely on sysusers (i.e. per-machine state).
- Hardcode the hugetlbfs group.

But it doesn't make sense to do both.

Now honestly, I think the real fix here is to move openvswitch to use DynamicUser=yes and open the hugetlbfs bits as an earlier privileged operation instead of relying on group access.
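For illustration, that direction might look something like the drop-in below. This is an untested sketch, not a working config: the unit name, the chown target, and the idea of doing the hugepage setup in a privileged ExecStartPre are all assumptions.

```
# /etc/systemd/system/ovs-vswitchd.service.d/dynamicuser.conf (hypothetical)
[Service]
# Allocate a transient user/group at service start instead of shipping
# static openvswitch entries in passwd/group.
DynamicUser=yes
# One possible way to hand over hugepage access without a persistent
# hugetlbfs group: do the privileged setup with "+" (runs as root)
# before dropping to the dynamic user.
ExecStartPre=+/usr/bin/chmod g+rw /dev/hugepages
```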

dcbw commented 1 year ago

> Or we could hardcode the hugetlbfs group.

@cgwalters note that openvswitch being in the hugetlbfs group only happens on x86-64 and ARM, where we support DPDK. Not on POWER or s390.

dcbw commented 1 year ago

@cgwalters I'm curious what's actually clashing here though. sysusers.d(5) says it'll do the things it's asked if the group/user doesn't exist yet:

       g
           Create a system group of the specified name should it not exist yet. Note that u implicitly creates a matching group. The group will be created with no password set.

       m
           Add a user to a group. If the user or group do not exist yet, they will be implicitly created.

But we don't get any error logs out of systemd-sysusers about why it's not doing what it's asked... If openvswitch already exists, shouldn't it just ignore /usr/lib/sysusers.d/openvswitch.conf? But since hugetlbfs doesn't exist and OVS isn't in it, it should still do all of that.

dustymabe commented 1 year ago

I did a little investigation on this today.

I tried dropping the hardcoded bits:

diff --git a/group b/group
index e86d91b..1fb1db8 100644
--- a/group
+++ b/group
@@ -42,5 +42,3 @@ nfsnobody:x:65534:
 kube:x:994:
 sshd:x:74:
 chrony:x:992:
-openvswitch:x:800:
-hugetlbfs:x:801:
diff --git a/passwd b/passwd
index 673a3d5..893fd8a 100644
--- a/passwd
+++ b/passwd
@@ -24,4 +24,3 @@ nfsnobody:x:65534:65534:Anonymous NFS User:/var/lib/nfs:/sbin/nologin
 kube:x:996:994:Kubernetes user:/:/sbin/nologin
 sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
 chrony:x:994:992::/var/lib/chrony:/sbin/nologin
-openvswitch:x:800:800::/:/sbin/nologin

but that didn't work either (the openvswitch user still doesn't get into the hugetlbfs group). I cracked open the resulting qemu qcow (i.e. without booting it) and I see the openvswitch user and group and the hugetlbfs group in the resulting built image (although with different UID/GID, so I know the removal of the hardcoded bits had an effect).

I think after we remove the hardcoded bits, what's kicking in now is this:

%sysusers_create_compat %{SOURCE2}
%ifarch %{dpdkarches}
%sysusers_create_compat %{SOURCE3}
%endif

where

Source2: openvswitch.sysusers
Source3: openvswitch-hugetlbfs.sysusers

which are defined as:

cat openvswitch.sysusers
#Type Name         ID         GECOS                   Home directory  Shell
u     openvswitch  -          "Open vSwitch Daemons"  /               /sbin/nologin

cat openvswitch-hugetlbfs.sysusers
#Type Name         ID         GECOS                   Home directory  Shell
m     openvswitch  hugetlbfs

So that looks normal... but what is sysusers_create_compat? It's just a macro that calls a bash script. So I imagine some of the logic inside that bash script (i.e. maybe it works during an rpm install via dnf, but not rpm-ostree?) is why we end up with openvswitch having no group.
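For context, the rough idea of that macro is to lower each sysusers.d line into equivalent shadow-utils commands for the scriptlet. The snippet below is an illustrative re-implementation of that idea, not the actual sysusers.generate-pre.sh:

```sh
# Illustrative only: mimic how a sysusers.d "m" line could be lowered to a
# shadow-utils call for use in an RPM %pre scriptlet.
printf 'm     openvswitch  hugetlbfs\n' | while read -r type name id rest; do
  case "$type" in
    m) echo "usermod -a -G $id $name" ;;   # add user $name to group $id
  esac
done
# prints: usermod -a -G hugetlbfs openvswitch
```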

I will note that if, on a running instance, I remove the openvswitch and hugetlbfs entries and rerun SYSTEMD_LOG_LEVEL=debug systemd-sysusers, it does create things appropriately.

dustymabe commented 1 year ago

WDYT of https://github.com/openshift/os/pull/1317 at least to get us unblocked for now?

dustymabe commented 1 year ago

At this point we can either merge #1317, which drops the hardcoded user/group assignments, or #1318, which just works around the RPM scriptlet not working.

IIUC merging #1317 will cause systems that upgrade to have a different (new) UID/GID for the openvswitch user/group and hugetlbfs group on the next reboot. I'm not sure if this is OK or not.
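One way to gauge the impact of such a UID/GID change (a sketch, not part of either PR): files created under the old numeric IDs keep those numbers after the switch, so leftovers can be audited. 800/801 are the previously hardcoded openvswitch/hugetlbfs IDs from this repo's passwd/group files:

```sh
# List entries under a root still owned by a given numeric UID or GID,
# e.g. after the hardcoded entries are dropped:
#   find_stale_ownership / 800 801
find_stale_ownership() {
  root=$1; uid=$2; gid=$3
  find "$root" -xdev \( -uid "$uid" -o -gid "$gid" \) 2>/dev/null
}
```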

LorbusChris commented 1 year ago

It looks like https://src.fedoraproject.org/rpms/systemd/blob/rawhide/f/sysusers.generate-pre.sh and https://pkgs.devel.redhat.com/cgit/rpms/systemd/tree/sysusers.generate-pre.sh?h=rhel-9.2.0 have diverged, and the latter is missing multiple updates to the script.

Specifically, this change that is missing in RHEL looks like the culprit to me: https://src.fedoraproject.org/rpms/systemd/c/f27d461663bec17ad64422682f260f0020ccc7f7?branch=rawhide

LorbusChris commented 1 year ago

https://bugzilla.redhat.com/show_bug.cgi?id=2217149 https://gitlab.com/redhat/centos-stream/rpms/systemd/-/merge_requests/79

knthm commented 1 year ago

This still seems to be a problem on RHCOS 4.14-9.2.

The sysusers.d configurations created by rpm-ostree now clash with the ones provided by the OS packages for openvswitch and unbound:

systemd-sysusers logs:

```
[core@clust-6hwr5-master-0 sysusers.d]$ systemctl status systemd-sysusers
× systemd-sysusers.service - Create System Users
     Loaded: loaded (/usr/lib/systemd/system/systemd-sysusers.service; static)
    Drop-In: /usr/lib/systemd/system/service.d
             └─10-timeout-abort.conf
     Active: failed (Result: exit-code) since Wed 2023-08-30 13:25:16 UTC; 12min ago
   Duration: 1.774s
       Docs: man:sysusers.d(5)
             man:systemd-sysusers.service(8)
    Process: 736 ExecStart=systemd-sysusers (code=exited, status=1/FAILURE)
   Main PID: 736 (code=exited, status=1/FAILURE)
        CPU: 19ms

Aug 30 13:25:16 clust-6hwr5-master-0 systemd-sysusers[736]: /usr/lib/sysusers.d/systemd-timesync.conf:8: Conflict with earlier configuration for user 'systemd-ti>
Aug 30 13:25:16 clust-6hwr5-master-0 systemd-sysusers[736]: Creating group 'hugetlbfs' with GID 978.
Aug 30 13:25:16 clust-6hwr5-master-0 systemd-sysusers[736]: Creating group 'openvswitch' with GID 977.
Aug 30 13:25:16 clust-6hwr5-master-0 systemd-sysusers[736]: Creating group 'unbound' with GID 976.
Aug 30 13:25:16 clust-6hwr5-master-0 systemd-sysusers[736]: Creating user 'openvswitch' (Open vSwitch Daemons) with UID 977 and GID 977.
Aug 30 13:25:16 clust-6hwr5-master-0 systemd-sysusers[736]: Creating user 'unbound' (Unbound DNS resolver) with UID 976 and GID 976.
Aug 30 13:25:16 clust-6hwr5-master-0 systemd-sysusers[736]: /etc/gshadow: Group "unbound" already exists.
Aug 30 13:25:16 clust-6hwr5-master-0 systemd[1]: systemd-sysusers.service: Main process exited, code=exited, status=1/FAILURE
Aug 30 13:25:16 clust-6hwr5-master-0 systemd[1]: systemd-sysusers.service: Failed with result 'exit-code'.
Aug 30 13:25:16 clust-6hwr5-master-0 systemd[1]: Failed to start systemd-sysusers.service - Create System Users.
```
/usr/lib/sysusers.d contents:

```sh
$ ll /usr/lib/sysusers.d/
total 108
-rw-r--r--. 2 root root  291 Jan  1  1970 00-coreos-nobody.conf
-rw-r--r--. 2 root root 1317 Jan  1  1970 00-coreos-static.conf
-rw-r--r--. 2 root root  984 Jan  1  1970 10-static-extra.conf
-rw-r--r--. 2 root root  240 Jan  1  1970 20-setup-groups.conf
-rw-r--r--. 2 root root  457 Jan  1  1970 20-setup-users.conf
-rw-r--r--. 2 root root   40 Jan  1  1970 30-rpmostree-pkg-group-hugetlbfs.conf
-rw-r--r--. 2 root root   42 Jan  1  1970 30-rpmostree-pkg-group-openvswitch.conf
-rw-r--r--. 2 root root   38 Jan  1  1970 30-rpmostree-pkg-group-unbound.conf
-rw-r--r--. 2 root root   81 Jan  1  1970 35-rpmostree-pkg-user-openvswitch.conf
-rw-r--r--. 2 root root   88 Jan  1  1970 35-rpmostree-pkg-user-unbound.conf
-rw-r--r--. 2 root root   50 Jan  1  1970 40-rpmostree-pkg-usermod-openvswitch-hugetlbfs.conf
-rw-r--r--. 2 root root  359 Jan  1  1970 README
-rw-r--r--. 2 root root 1299 Jan  1  1970 basic.conf
-rw-r--r--. 3 root root  132 Jan  1  1970 chrony.conf
-rw-r--r--. 2 root root   79 Jan  1  1970 clevis.conf
-rw-r--r--. 3 root root  118 Jan  1  1970 dbus.conf
-rw-r--r--. 3 root root   59 Jan  1  1970 dnsmasq.conf
-rw-r--r--. 2 root root  134 Jan  1  1970 openssh-server.conf
-rw-r--r--. 2 root root  189 Jan  1  1970 openvswitch.conf
-rw-r--r--. 3 root root   39 Jan  1  1970 samba.conf
-rw-r--r--. 3 root root  335 Jan  1  1970 systemd-coredump.conf
-rw-r--r--. 2 root root  316 Jan  1  1970 systemd-journal.conf
-rw-r--r--. 3 root root  339 Jan  1  1970 systemd-oom.conf
-rw-r--r--. 2 root root  333 Jan  1  1970 systemd-resolve.conf
-rw-r--r--. 2 root root  344 Jan  1  1970 systemd-timesync.conf
-rw-r--r--. 2 root root  128 Jan  1  1970 tpm2-tss.conf
-rw-r--r--. 2 root root   66 Jan  1  1970 unbound.sysusers
```
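With both the package-provided fragments (openvswitch.conf, unbound.sysusers) and the rpm-ostree-generated 30-/35-/40- files present, the same users and groups are defined twice. A quick way to spot such duplicates (a helper sketch, not an official tool):

```sh
# Print (type, name) pairs that appear in more than one sysusers.d entry;
# duplicates like these are what systemd-sysusers reports as
# "Conflict with earlier configuration".
find_sysusers_dupes() {
  grep -h '^[ugm]' "$1"/*.conf 2>/dev/null | awk '{print $1, $2}' | sort | uniq -d
}
```

e.g. `find_sysusers_dupes /usr/lib/sysusers.d` (note the unbound file in the listing above ends in .sysusers, so a thorough check would need a broader glob).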

This likely also causes https://github.com/openshift/installer/issues/7265.

LorbusChris commented 1 year ago

It looks like https://gitlab.com/redhat/centos-stream/rpms/systemd/-/merge_requests/79 hasn't been backported to https://pkgs.devel.redhat.com/cgit/rpms/systemd/tree/sysusers.generate-pre.sh?h=rhel-9.2.0 yet.

That's https://bugzilla.redhat.com/show_bug.cgi?id=2217149

cgwalters commented 1 year ago

Messy. Yes, ultimately we need one "source of truth" for users. What these packages are doing, invoking useradd in a scriptlet and also installing a sysusers file, is creating two.

Backporting this logic to RHEL would help indeed.

But, we probably also need to change rpm-ostree to detect this case. Looks like we already have https://github.com/coreos/rpm-ostree/issues/2728

knthm commented 1 year ago

> It looks like https://gitlab.com/redhat/centos-stream/rpms/systemd/-/merge_requests/79 hasn't been backported to https://pkgs.devel.redhat.com/cgit/rpms/systemd/tree/sysusers.generate-pre.sh?h=rhel-9.2.0 yet.

Ah that makes sense. As a lowly customer I don't have access to devel.redhat.com. I'll keep an eye on the Bugzilla bug, thanks!

> Yes, ultimately we need one "source of truth" for users.

Agreed, there doesn't seem to be a common pattern for dealing with these system-specific package configurations. I don't have the deepest insight into how rpm-ostree manages systemd configuration; I'm just confused as to why it layers identical configuration on top of what the packages already provide.

cgwalters commented 1 year ago

Ahhhh OK, something was really confusing me: we're not actually seeing this in OCP. It's because of https://github.com/coreos/rpm-ostree/blob/7153ab558ac813b963a55abb5f4892fcd2f9ceca/src/libpriv/rpmostree-container.cxx#L50, and OKD/SCOS is using container layering to build the node image (which is great).

So a workaround today is for the OKD/SCOS build to basically remove the duplicate rpm-ostree sysusers.d entries as part of the container build. (But we should still fix this in rpm-ostree for sure)
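Concretely, that workaround could be a build step along these lines (a sketch; the file-name patterns are assumptions based on the directory listing earlier in this thread):

```sh
# Remove the rpm-ostree-generated sysusers fragments from a (staged) image
# root so only the package-provided ones remain.
strip_rpmostree_sysusers() {
  rm -f "$1"/usr/lib/sysusers.d/30-rpmostree-pkg-*.conf \
        "$1"/usr/lib/sysusers.d/35-rpmostree-pkg-*.conf \
        "$1"/usr/lib/sysusers.d/40-rpmostree-pkg-*.conf
}
```

In a Containerfile this would be a `RUN` step against `/` of the image being built.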

However, https://github.com/coreos/rpm-ostree/issues/4505 would also address this and have other benefits.

knthm commented 1 year ago

@cgwalters I was able to find some time again to look at this more closely:

It turns out that in my case openshift/installer pulls the following RHCOS image when running bin/openshift-install create cluster:

Initial bootstrap/master os-release:

```cfg
NAME="Red Hat Enterprise Linux CoreOS"
ID="rhcos"
ID_LIKE="rhel fedora"
VERSION="414.92.202308032115-0"
VERSION_ID="4.14"
VARIANT="CoreOS"
VARIANT_ID=coreos
PLATFORM_ID="platform:el9"
PRETTY_NAME="Red Hat Enterprise Linux CoreOS 414.92.202308032115-0 (Plow)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:9::coreos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://docs.openshift.com/container-platform/4.14/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="OpenShift Container Platform"
REDHAT_BUGZILLA_PRODUCT_VERSION="4.14"
REDHAT_SUPPORT_PRODUCT="OpenShift Container Platform"
REDHAT_SUPPORT_PRODUCT_VERSION="4.14"
OPENSHIFT_VERSION="4.14"
RHEL_VERSION="9.2"
OSTREE_VERSION="414.92.202308032115-0"
```

On the master VM, rpm-ostree (as instructed by bootstrap-pivot.sh) then pulls an FCOS image and layers it on top of RHCOS, which causes the systemd-sysusers issue I've mentioned:

Pivoted master os-release:

```cfg
NAME="Fedora Linux"
VERSION="38.20230907.20.0 (CoreOS)"
ID=fedora
VERSION_ID=38
VERSION_CODENAME=""
PLATFORM_ID="platform:f38"
PRETTY_NAME="Fedora CoreOS 38.20230907.20.0"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:38"
HOME_URL="https://getfedora.org/coreos/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora-coreos/"
SUPPORT_URL="https://github.com/coreos/fedora-coreos-tracker/"
BUG_REPORT_URL="https://github.com/coreos/fedora-coreos-tracker/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=38
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=38
SUPPORT_END=2024-05-14
VARIANT="CoreOS"
VARIANT_ID=coreos
OSTREE_VERSION='38.20230907.20.0'
```

So I'm actually trying to run OCP, but the installer/bootstrap has other things in mind and pivots to an OKD image. I'm frankly surprised more things didn't break, since this really shouldn't happen. :D

openshift-bot commented 9 months ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot commented 8 months ago

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten /remove-lifecycle stale

travier commented 7 months ago

/remove-lifecycle rotten /lifecycle frozen