openshift / os


RHCOS 4.7/4.8 Failing console-login-helper-messages-issuegen.path #906

Closed jschintag closed 2 years ago

jschintag commented 2 years ago
+ cosa kola run --basic-qemu-scenarios
Skipping kola test pattern "fips.enable*":
  https://bugzilla.redhat.com/show_bug.cgi?id=1782026
Skipping kola test pattern "crio.base":
  https://github.com/kubernetes/kubernetes/issues/87325
Skipping kola test pattern "ext.config.var-mount":
  https://github.com/ibm-s390-tools/s390-tools/pull/82
Skipping kola test pattern "coreos.ignition.journald-log":
  https://github.com/coreos/coreos-assembler/issues/1173
kola --denylist-test fips.enable* --denylist-test crio.base --denylist-test ext.config.var-mount --denylist-test coreos.ignition.journald-log -p qemu-unpriv --output-dir tmp/kola run basic
=== RUN   basic
--- FAIL: basic (27.99s)
        harness.go:976: Cluster failed starting machines: machine "b92d9797-5207-4b77-a860-abac591b90f4" failed basic checks: some systemd units failed:
console-login-helper-messages-issuegen.path loaded failed failed Monitor console-login-helper-messages runtime issue snippets directory for changes
FAIL, output in tmp/kola
Error: harness: test suite failed
2022-07-21T08:12:43Z cli: harness: test suite failed
Traceback (most recent call last):
  File "/usr/lib/coreos-assembler/cmd-kola", line 89, in <module>
    subprocess.check_call(subargs)
  File "/usr/lib64/python3.9/subprocess.py", line 373, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['kola', '--denylist-test', 'fips.enable*', '--denylist-test', 'crio.base', '--denylist-test', 'ext.config.var-mount', '--denylist-test', 'coreos.ignition.journald-log', '-p', 'qemu-unpriv', '--output-dir', 'tmp/kola', 'run', 'basic']' returned non-zero exit status 1.
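
For triage across many builds, the failing unit names can be pulled out of the kola report automatically. The following is a minimal sketch, assuming the report keeps the `<unit> loaded failed failed <description>` layout seen in the log above; `failed_units` is a hypothetical helper and not part of kola or coreos-assembler.

```python
import re
from typing import List

# Assumption: each failed unit appears on its own line as
# "<unit> loaded failed failed <description>", as in the log above.
UNIT_LINE = re.compile(r"^(?P<unit>\S+\.\w+)\s+loaded\s+failed\s+failed\b")


def failed_units(kola_output: str) -> List[str]:
    """Return the names of systemd units that kola reported as failed."""
    units = []
    for line in kola_output.splitlines():
        match = UNIT_LINE.match(line.strip())
        if match:
            units.append(match.group("unit"))
    return units


# Sample input copied from the failure report in this issue.
sample = """--- FAIL: basic (27.99s)
        harness.go:976: Cluster failed starting machines: machine "b92d9797-5207-4b77-a860-abac591b90f4" failed basic checks: some systemd units failed:
console-login-helper-messages-issuegen.path loaded failed failed Monitor console-login-helper-messages runtime issue snippets directory for changes
FAIL, output in tmp/kola
"""

print(failed_units(sample))
```

Run against the output above, this should report only `console-login-helper-messages-issuegen.path`.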

This occurs at the moment in both x86 and s390x, for both RHCOS 4.7 and 4.8. Maybe the systemd update in RHEL 8.4 is responsible?

https://jenkins-rhcos.cloud.s390x.psi.redhat.com/job/rhcos/job/rhcos-rhcos-4.8/169

meta.json ``` { "ostree-n-metadata-total": 9781, "ostree-n-metadata-written": 3179, "ostree-n-content-total": 6326, "ostree-n-content-written": 1314, "ostree-n-cache-hits": 19192, "ostree-content-bytes-written": 170628018, "ostree-commit": "fb120b3c372f85c9a04cfe7e1ba786d411b755b5a61d43848aa0dc9d1649f28f", "ostree-content-checksum": "98e47cebd63fcacecf00150b8a70c4660215b0cda4ef7deed885897d86f225e0", "ostree-version": "48.84.202207210805-0", "ostree-timestamp": "2022-07-21T08:11:18Z", "rpm-ostree-inputhash": "245d456c932e83012e8e6262b43bb76177c1a9cd051cec32f1039cbf6616b6f4", "buildid": "48.84.202207210805-0", "coreos-assembler.image-genver": 0, "name": "rhcos", "summary": "OpenShift 4", "coreos-assembler.build-timestamp": "2022-07-21T08:11:35Z", "coreos-assembler.image-config-checksum": "d4a98822a5a286199242ae6e8210aa6d43e35f03d8590bfe9c7d6a22f1c7dd09", "coreos-assembler.image-input-checksum": "1dcdf8304b6525f0485d2439135bda4a26b68858df7dd2fc7d4c1307237f3e7a", "coreos-assembler.code-source": "container", "coreos-assembler.container-config-git": { "commit": "84cb10fd21d0967227d3dff6de28ceff0d79d3ba", "origin": "https://gitlab.cee.redhat.com/coreos/redhat-coreos.git", "branch": "HEAD", "dirty": "false" }, "coreos-assembler.meta-stamp": 1658391133019284502, "coreos-assembler.delayed-meta-merge": false, "pkgdiff": [ [ "NetworkManager", 2, { "PreviousPackage": [ "NetworkManager", "1:1.30.0-14.el8_4", "s390x" ], "NewPackage": [ "NetworkManager", "1:1.30.0-15.el8_4", "s390x" ] } ], [ "NetworkManager-cloud-setup", 2, { "PreviousPackage": [ "NetworkManager-cloud-setup", "1:1.30.0-14.el8_4", "s390x" ], "NewPackage": [ "NetworkManager-cloud-setup", "1:1.30.0-15.el8_4", "s390x" ] } ], [ "NetworkManager-libnm", 2, { "PreviousPackage": [ "NetworkManager-libnm", "1:1.30.0-14.el8_4", "s390x" ], "NewPackage": [ "NetworkManager-libnm", "1:1.30.0-15.el8_4", "s390x" ] } ], [ "NetworkManager-ovs", 2, { "PreviousPackage": [ "NetworkManager-ovs", "1:1.30.0-14.el8_4", "s390x" ], 
"NewPackage": [ "NetworkManager-ovs", "1:1.30.0-15.el8_4", "s390x" ] } ], [ "NetworkManager-team", 2, { "PreviousPackage": [ "NetworkManager-team", "1:1.30.0-14.el8_4", "s390x" ], "NewPackage": [ "NetworkManager-team", "1:1.30.0-15.el8_4", "s390x" ] } ], [ "NetworkManager-tui", 2, { "PreviousPackage": [ "NetworkManager-tui", "1:1.30.0-14.el8_4", "s390x" ], "NewPackage": [ "NetworkManager-tui", "1:1.30.0-15.el8_4", "s390x" ] } ], [ "bash", 2, { "PreviousPackage": [ "bash", "4.4.20-1.el8_4", "s390x" ], "NewPackage": [ "bash", "4.4.20-2.el8_4", "s390x" ] } ], [ "containernetworking-plugins", 2, { "PreviousPackage": [ "containernetworking-plugins", "0.9.1-1.module+el8.4.0+14872+9efa52a3", "s390x" ], "NewPackage": [ "containernetworking-plugins", "0.9.1-1.module+el8.4.0+14908+81312c48", "s390x" ] } ], [ "containers-common", 2, { "PreviousPackage": [ "containers-common", "1:1.3.1-5.module+el8.4.0+11990+22932769", "s390x" ], "NewPackage": [ "containers-common", "1:1.3.1-7.module+el8.4.0+15741+47bb6bfe", "s390x" ] } ], [ "criu", 2, { "PreviousPackage": [ "criu", "3.15-1.module+el8.4.0+14872+9efa52a3", "s390x" ], "NewPackage": [ "criu", "3.15-1.module+el8.4.0+14908+81312c48", "s390x" ] } ], [ "fuse-overlayfs", 2, { "PreviousPackage": [ "fuse-overlayfs", "1.6-1.module+el8.4.0+11822+6cc1e7d7", "s390x" ], "NewPackage": [ "fuse-overlayfs", "1.6-1.module+el8.4.0+14908+81312c48", "s390x" ] } ], [ "kernel", 2, { "PreviousPackage": [ "kernel", "4.18.0-305.49.1.el8_4", "s390x" ], "NewPackage": [ "kernel", "4.18.0-305.57.1.el8_4", "s390x" ] } ], [ "kernel-core", 2, { "PreviousPackage": [ "kernel-core", "4.18.0-305.49.1.el8_4", "s390x" ], "NewPackage": [ "kernel-core", "4.18.0-305.57.1.el8_4", "s390x" ] } ], [ "kernel-modules", 2, { "PreviousPackage": [ "kernel-modules", "4.18.0-305.49.1.el8_4", "s390x" ], "NewPackage": [ "kernel-modules", "4.18.0-305.57.1.el8_4", "s390x" ] } ], [ "kernel-modules-extra", 2, { "PreviousPackage": [ "kernel-modules-extra", "4.18.0-305.49.1.el8_4", 
"s390x" ], "NewPackage": [ "kernel-modules-extra", "4.18.0-305.57.1.el8_4", "s390x" ] } ], [ "libslirp", 2, { "PreviousPackage": [ "libslirp", "4.3.1-1.module+el8.4.0+14872+9efa52a3", "s390x" ], "NewPackage": [ "libslirp", "4.3.1-1.module+el8.4.0+14908+81312c48", "s390x" ] } ], [ "ostree", 2, { "PreviousPackage": [ "ostree", "2020.7-6.el8_4", "s390x" ], "NewPackage": [ "ostree", "2020.7-7.el8_4", "s390x" ] } ], [ "ostree-libs", 2, { "PreviousPackage": [ "ostree-libs", "2020.7-6.el8_4", "s390x" ], "NewPackage": [ "ostree-libs", "2020.7-7.el8_4", "s390x" ] } ], [ "platform-python", 2, { "PreviousPackage": [ "platform-python", "3.6.8-39.el8_4", "s390x" ], "NewPackage": [ "platform-python", "3.6.8-39.el8_4.1", "s390x" ] } ], [ "python3-libs", 2, { "PreviousPackage": [ "python3-libs", "3.6.8-39.el8_4", "s390x" ], "NewPackage": [ "python3-libs", "3.6.8-39.el8_4.1", "s390x" ] } ], [ "qemu-guest-agent", 2, { "PreviousPackage": [ "qemu-guest-agent", "15:4.2.0-49.module+el8.4.0+15174+49839dd8.6", "s390x" ], "NewPackage": [ "qemu-guest-agent", "15:4.2.0-49.module+el8.4.0+15731+d238d31c.7", "s390x" ] } ], [ "skopeo", 2, { "PreviousPackage": [ "skopeo", "1:1.3.1-5.module+el8.4.0+11990+22932769", "s390x" ], "NewPackage": [ "skopeo", "1:1.3.1-7.module+el8.4.0+15741+47bb6bfe", "s390x" ] } ], [ "slirp4netns", 2, { "PreviousPackage": [ "slirp4netns", "1.1.8-1.module+el8.4.0+14872+9efa52a3", "s390x" ], "NewPackage": [ "slirp4netns", "1.1.8-1.module+el8.4.0+14908+81312c48", "s390x" ] } ], [ "stalld", 2, { "PreviousPackage": [ "stalld", "1.10-1.el8_4", "s390x" ], "NewPackage": [ "stalld", "1.15-2.el8_4", "s390x" ] } ], [ "systemd", 2, { "PreviousPackage": [ "systemd", "239-45.el8_4.10", "s390x" ], "NewPackage": [ "systemd", "239-45.el8_4.11", "s390x" ] } ], [ "systemd-journal-remote", 2, { "PreviousPackage": [ "systemd-journal-remote", "239-45.el8_4.10", "s390x" ], "NewPackage": [ "systemd-journal-remote", "239-45.el8_4.11", "s390x" ] } ], [ "systemd-libs", 2, { "PreviousPackage": [ 
"systemd-libs", "239-45.el8_4.10", "s390x" ], "NewPackage": [ "systemd-libs", "239-45.el8_4.11", "s390x" ] } ], [ "systemd-pam", 2, { "PreviousPackage": [ "systemd-pam", "239-45.el8_4.10", "s390x" ], "NewPackage": [ "systemd-pam", "239-45.el8_4.11", "s390x" ] } ], [ "systemd-udev", 2, { "PreviousPackage": [ "systemd-udev", "239-45.el8_4.10", "s390x" ], "NewPackage": [ "systemd-udev", "239-45.el8_4.11", "s390x" ] } ] ], "advisories-diff": [ [ "RHSA-2022:5626", 1, 3, [ "kernel-4.18.0-305.57.1.el8_4.s390x", "kernel-modules-4.18.0-305.57.1.el8_4.s390x", "kernel-modules-extra-4.18.0-305.57.1.el8_4.s390x", "kernel-core-4.18.0-305.57.1.el8_4.s390x" ], { "cve_references": [ [ "https://bugzilla.redhat.com/show_bug.cgi?id=1903244", "CVE-2020-29368 kernel: the copy-on-write implementation can grant unintended write access because of a race condition in a THP mapcount check" ], [ "https://bugzilla.redhat.com/show_bug.cgi?id=2035652", "CVE-2021-4197 kernel: cgroup: Use open-time creds and namespace for migration perm checks" ], [ "https://bugzilla.redhat.com/show_bug.cgi?id=2036934", "CVE-2021-4203 kernel: Race condition in races in sk_peer_pid and sk_peer_cred accesses" ], [ "https://bugzilla.redhat.com/show_bug.cgi?id=2064604", "CVE-2022-1012 kernel: Small table perturb size in the TCP source port generation algorithm can lead to information leak" ], [ "https://bugzilla.redhat.com/show_bug.cgi?id=2086753", "CVE-2022-1729 kernel: race condition in perf_event_open leads to privilege escalation" ], [ "https://bugzilla.redhat.com/show_bug.cgi?id=2092427", "CVE-2022-32250 kernel: a use-after-free write in the netfilter subsystem can lead to privilege escalation to root" ] ] } ], [ "RHSA-2022:5622", 1, 3, [ "containers-common-1:1.3.1-7.module+el8.4.0+15741+47bb6bfe.s390x", "slirp4netns-1.1.8-1.module+el8.4.0+14908+81312c48.s390x", "containernetworking-plugins-0.9.1-1.module+el8.4.0+14908+81312c48.s390x", "libslirp-4.3.1-1.module+el8.4.0+14908+81312c48.s390x", 
"skopeo-1:1.3.1-7.module+el8.4.0+15741+47bb6bfe.s390x", "fuse-overlayfs-1.6-1.module+el8.4.0+14908+81312c48.s390x", "criu-3.15-1.module+el8.4.0+14908+81312c48.s390x" ], { "cve_references": [ [ "https://bugzilla.redhat.com/show_bug.cgi?id=2070368", "CVE-2022-1227 psgo: Privilege escalation in 'podman top'" ] ] } ] ], "images": { "ostree": { "path": "rhcos-48.84.202207210805-0-ostree.s390x.tar", "sha256": "ca453da37bb47c01e0f7308d4d25625ab322d9e9b691e205e4be8f77b7d84fe1", "size": 838021120 }, "qemu": { "path": "rhcos-48.84.202207210805-0-qemu.s390x.qcow2.xz", "sha256": "38647767aa529064ad93668d7ffe9d003d219ee1311baa35962f7347470bf8f7", "size": 564590240, "uncompressed-sha256": "6300c96b0def53291f960ff9159d5a2d2528f063f16e9e7f8dab8589fdd66c81", "uncompressed-size": 2338914304 } }, "coreos-assembler.container-image-git": { "commit": "b06f416661cb0bb4af9e9e92ac4b8c799bc79e46", "origin": "https://github.com/coreos/coreos-assembler", "branch": "rhcos-4.8", "dirty": "true" }, "coreos-assembler.config-gitrev": "v3.1-1370-g84cb10fd21d0967227d3dff6de28ceff0d79d3ba", "coreos-assembler.config-dirty": "false", "coreos-assembler.basearch": "s390x", "build-url": "https://jenkins-rhcos.cloud.s390x.psi.redhat.com/job/rhcos/job/rhcos-rhcos-4.8/169/" } ```
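
To check the systemd hypothesis against a build, the `pkgdiff` array in meta.json can be filtered by package name. This is a rough sketch, assuming each entry has the shape `[name, kind, {"PreviousPackage": [...], "NewPackage": [...]}]` as in the attachment above; `changed_packages` is a hypothetical helper, and the meaning of the numeric `kind` field is not interpreted here.

```python
import json


def changed_packages(meta: dict, prefix: str = "") -> dict:
    """Map package name -> (old version, new version) for pkgdiff entries
    whose name starts with `prefix`."""
    changes = {}
    for name, _kind, detail in meta.get("pkgdiff", []):
        old = detail.get("PreviousPackage")
        new = detail.get("NewPackage")
        # Skip entries that lack either side (for example, newly added packages).
        if old and new and name.startswith(prefix):
            changes[name] = (old[1], new[1])
    return changes


# Excerpt of the meta.json attached to this issue (three entries only).
meta = json.loads("""
{
  "pkgdiff": [
    ["bash", 2, {"PreviousPackage": ["bash", "4.4.20-1.el8_4", "s390x"],
                 "NewPackage": ["bash", "4.4.20-2.el8_4", "s390x"]}],
    ["systemd", 2, {"PreviousPackage": ["systemd", "239-45.el8_4.10", "s390x"],
                    "NewPackage": ["systemd", "239-45.el8_4.11", "s390x"]}],
    ["systemd-udev", 2, {"PreviousPackage": ["systemd-udev", "239-45.el8_4.10", "s390x"],
                         "NewPackage": ["systemd-udev", "239-45.el8_4.11", "s390x"]}]
  ]
}
""")

for name, (old, new) in changed_packages(meta, prefix="systemd").items():
    print(f"{name}: {old} -> {new}")
```

For this build the filter shows systemd moving from 239-45.el8_4.10 to 239-45.el8_4.11, which matches the suspicion that a systemd update is involved.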

kola_artifacts.zip

jschintag commented 2 years ago

Here are the details from journal.txt:

Jul 20 15:35:54.709717 systemd[1]: Started Monitor console-login-helper-messages runtime issue snippets directory for changes.
Jul 20 15:35:54.709770 systemd[1]: Reached target Paths.
Jul 20 15:35:54.710896 systemd[1]: Listening on D-Bus System Message Bus Socket.
Jul 20 15:35:54.710937 systemd[1]: Reached target Sockets.
Jul 20 15:35:54.710976 systemd[1]: Reached target Basic System.
Jul 20 15:35:54.711536 systemd[1]: Started D-Bus System Message Bus.
Jul 20 15:35:54.713049 systemd[1]: Starting Network Manager...
Jul 20 15:35:54.720400 systemd[1]: Starting Mark boot complete...
Jul 20 15:35:54.721460 systemd[1]: Starting OpenSSH ed25519 Server Key Generation...
Jul 20 15:35:54.722939 systemd[1]: Started Daily rotation of log files.
Jul 20 15:35:54.723447 systemd[1]: Starting OpenSSH ecdsa Server Key Generation...
Jul 20 15:35:54.724099 systemd[1]: Starting NTP client/server...
Jul 20 15:35:54.724531 systemd[1]: Starting CoreOS Generate iSCSI Initiator Name...
Jul 20 15:35:54.724635 systemd[1]: console-login-helper-messages-issuegen.path: Failed with result 'unit-condition-failed'.
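
The same information can be extracted from journal.txt directly. A small sketch, assuming systemd logs unit failures as `<unit>: Failed with result '<result>'.` like the last line above; `unit_failures` is a hypothetical helper name.

```python
import re
from typing import List, Tuple

# Assumption: failure lines look like
# "systemd[1]: <unit>: Failed with result '<result>'."
FAILURE = re.compile(r"systemd\[\d+\]: (\S+): Failed with result '([^']+)'")


def unit_failures(journal_text: str) -> List[Tuple[str, str]]:
    """Return (unit, result) pairs for every failure line in the journal text."""
    return FAILURE.findall(journal_text)


# Sample lines copied from the journal excerpt in this issue.
journal = """Jul 20 15:35:54.709717 systemd[1]: Started Monitor console-login-helper-messages runtime issue snippets directory for changes.
Jul 20 15:35:54.724531 systemd[1]: Starting CoreOS Generate iSCSI Initiator Name...
Jul 20 15:35:54.724635 systemd[1]: console-login-helper-messages-issuegen.path: Failed with result 'unit-condition-failed'.
"""

for unit, result in unit_failures(journal):
    print(f"{unit}: {result}")
```

On the excerpt above this yields a single pair: the issuegen path unit with result `unit-condition-failed`.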

mike-nguyen commented 2 years ago

This is occurring on x86 as well. It looks like systemd is the culprit: I rolled back to the previous systemd version and the clhm unit failure goes away. I will investigate using a newer version of clhm.

ashcrow commented 2 years ago

Related: https://bugzilla.redhat.com/show_bug.cgi?id=2109546

travier commented 2 years ago

Be careful: 0.21.1 and later require a version of util-linux that is not in RHEL 8.6, so we have to either use older versions or backport patches only.

mike-nguyen commented 2 years ago

TL;DR: a systemd change is breaking this. It looks like the offending commit in systemd was reverted sometime down the line, but there is no context as to why. In the meantime, systemd will be pinned to a working version until we can find a fix.

cgwalters commented 2 years ago

Note that c-l-h-m is currently a CoreOS-specific thing. A good goal would be to make use of the fact that it's now in RHEL 9 and perhaps have it used by traditional RHEL server too.

That would make it much more likely for e.g. selinux policy changes (which we've tripped on here several times) to be caught before bugs make their way to us.

cgwalters commented 2 years ago

It looks like this was introduced by https://bugzilla.redhat.com/show_bug.cgi?id=2065322

So...we're definitely in an ugly situation here because we have to either:

  1. Backport the c-l-h-m fix to many supported releases
  2. Revert systemd (too ugly, just listing for completeness)
  3. Ignore the test failure (and really, subject all of our customers to warnings about a failing systemd unit)
  4. Temporarily disable c-l-h-m

Of all of these, I lean a bit towards 4. but could be convinced to do 1. if we can agree to do it quickly.

travier commented 2 years ago

I think we should do 1. for all of our currently fully supported releases (4.10+) and maybe 4 for all the others.

mike-nguyen commented 2 years ago

This is happening across all versions now. RHEL 8.6 EUS just picked up the same systemd patch that caused this for RHEL 8.4 EUS.

mike-nguyen commented 2 years ago

https://github.com/coreos/console-login-helper-messages/pull/112 for paper trail

cgwalters commented 2 years ago

Confirmed we have green x86_64 builds for 4.7 and 4.8.