openshift / os


`ext.config.systemd.journal-compat` failing on SCOS in Prow #1505

Closed jlebon closed 4 months ago

jlebon commented 4 months ago
--- FAIL: ext.config.systemd.journal-compat (58.22s)
        cluster.go:162: Error: Unit kola-runext.service exited with code 125
        cluster.go:162: 2024-05-08T15:19:40Z cli: Unit kola-runext.service exited with code 125
        harness.go:1263: kolet failed: : kolet run-test-unit failed: Process exited with status 1

Journal:

May  8 15:19:37.232425 init.scope[1]: Started kola-runext.service.
...
May  8 15:19:37.892038 kola-runext.service[2106]: 2024-05-08 15:19:37.889648267 +0000 UTC m=+0.519405285 system refresh
...
May  8 15:19:39.529980 init.scope[1]: kola-runext.service: Main process exited, code=exited, status=125/n/a
May  8 15:19:39.530128 init.scope[1]: kola-runext.service: Failed with result 'exit-code'.

Not very clear what's going on.

journal.txt console.txt

jlebon commented 4 months ago

OK, so `podman run` fails, but because we capture stdout and stderr and don't print them on failure, the error gets swallowed. Here's the error it actually hit:

Error: copying system image from manifest list: Source image rejected: None of the signatures were accepted, reasons: open /etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release: no such file or directory
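To illustrate the swallowed-output problem, here is a minimal Python sketch (not the actual test script, which isn't shown in this issue) of a wrapper that still captures stdout and stderr but replays them when the command fails. The helper name `run_surfacing_output` and the stand-in command are hypothetical.

```python
import subprocess
import sys


def run_surfacing_output(cmd):
    """Run cmd with stdout/stderr captured; replay both if it fails."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        # Replay the captured streams so the real error ends up in the logs
        # instead of being discarded together with the capture buffers.
        sys.stderr.write(f"command failed ({proc.returncode}): {' '.join(cmd)}\n")
        sys.stderr.write(proc.stdout)
        sys.stderr.write(proc.stderr)
    return proc


if __name__ == "__main__":
    # Hypothetical stand-in for the failing `podman run`: it writes an
    # error to stderr and exits with status 125.
    fake = [
        sys.executable,
        "-c",
        "import sys; sys.stderr.write('Error: Source image rejected\\n'); sys.exit(125)",
    ]
    result = run_surfacing_output(fake)
    print("exit status:", result.returncode)
```

With something along these lines, the journal would have shown the signature error directly rather than only `status=125`.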

containers-common normally ships those files (which is an issue in itself: https://bugzilla.redhat.com/show_bug.cgi?id=2182197). But of course it doesn't ship them in RHEL: https://gitlab.com/redhat/centos-stream/rpms/containers-common/-/blob/5ebc0aa1895f562b7647f6a7acabd8805259dcaa/containers-common.spec#L154.
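A quick way to spot this class of failure ahead of time is to check whether the key files referenced by the signature policy actually exist. The sketch below assumes the policy lives at `/etc/containers/policy.json` and that `signedBy` requirements name their keys via `keyPath` (or `keyPaths`), as described in containers-policy.json(5); the function name `missing_signing_keys` is made up for this example.

```python
import json
import os


def missing_signing_keys(policy, exists=os.path.exists):
    """Return sorted key paths named by signedBy rules that do not exist."""
    missing = set()

    def visit(requirements):
        for req in requirements:
            if req.get("type") != "signedBy":
                continue
            # Assumed field names: `keyPath` (single) and `keyPaths` (list).
            paths = list(req.get("keyPaths", []))
            if "keyPath" in req:
                paths.append(req["keyPath"])
            missing.update(p for p in paths if not exists(p))

    visit(policy.get("default", []))
    for scopes in policy.get("transports", {}).values():
        for requirements in scopes.values():
            visit(requirements)
    return sorted(missing)


if __name__ == "__main__":
    path = "/etc/containers/policy.json"  # assumed default location
    if os.path.exists(path):
        with open(path) as f:
            for key in missing_signing_keys(json.load(f)):
                print("missing key:", key)
```

On the failing SCOS image, a check like this would be expected to list `/etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release`, matching the path in the error above.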

And the reason CI on c9s is hitting this is that we're pulling the containers-common from RHEL:

Installing 544 packages:
  ...
  containers-common-3:1-75.rhaos4.16.el9.x86_64 (rhel-9.4-server-ose-4.16)

That in turn happens because those packages have epoch 3 (it's not clear why) while c9s has epoch 2, and the higher epoch always wins regardless of version and release: https://gitlab.com/redhat/centos-stream/rpms/containers-common/-/blob/5ebc0aa1895f562b7647f6a7acabd8805259dcaa/containers-common.spec#L12
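For anyone unfamiliar with how the epoch affects package selection, here is a simplified Python sketch of epoch:version-release ordering. It is not RPM's real `rpmvercmp` (tilde and caret handling are omitted, among other things); it only shows that the epoch is compared first. The c9s EVR used below is a hypothetical placeholder, since only its epoch is known from this issue.

```python
import re


def parse_evr(evr):
    """Split 'epoch:version-release' into (epoch, version, release)."""
    epoch, sep, rest = evr.partition(":")
    if not sep:
        # No explicit epoch means epoch 0.
        epoch, rest = "0", evr
    version, _, release = rest.partition("-")
    return int(epoch), version, release


def _segments(s):
    # Simplification: numeric runs compare as integers and sort above
    # alphabetic runs; separators are ignored.
    return [
        (1, int(t), "") if t.isdigit() else (0, 0, t)
        for t in re.findall(r"\d+|[A-Za-z]+", s)
    ]


def compare_evr(a, b):
    """Return 1 if a is newer than b, -1 if older, 0 if equal."""
    ea, va, ra = parse_evr(a)
    eb, vb, rb = parse_evr(b)
    ka = (ea, _segments(va), _segments(ra))
    kb = (eb, _segments(vb), _segments(rb))
    return (ka > kb) - (ka < kb)


if __name__ == "__main__":
    rhel = "3:1-75.rhaos4.16.el9"  # from the install log above
    c9s = "2:1-99.el9"             # hypothetical c9s EVR with epoch 2
    print(compare_evr(rhel, c9s))
```

Because the epoch is the first element of the comparison key, the epoch 3 package sorts as newer even when the epoch 2 package has a higher release.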

So, we should reach out to the maintainers to understand what's going on there. For now, we can force containers-common to come from c9s-appstream. Long-term, this will be fixed by #799, because then we can better confine the hack of adding OCP repos to the SCOS build to just the OCP layer (though ideally we can get rid of that too, once all the packages are in CentOS proper).

aaradhak commented 4 months ago

I recently came across this discussion on forum-ocp-art about the epoch number discrepancy. It came up in connection with the recent containers-common package downgrade issue:

┌─────────────────────────────┬──────────────────────────────────────┬───────┐
│ tag                         │ build                                │ epoch │
├─────────────────────────────┼──────────────────────────────────────┼───────┤
│ rhaos-4.12-rhel-8-candidate │ containers-common-1-36.rhaos4.12.el8 │ 2     │
│ rhaos-4.13-rhel-9-candidate │ containers-common-1-37.rhaos4.13.el9 │ 3     │
│ rhaos-4.14-rhel-9-candidate │ containers-common-1-37.rhaos4.13.el9 │ 3     │
│ rhaos-4.15-rhel-9-candidate │ containers-common-1-37.rhaos4.13.el9 │ 3     │
│ rhaos-4.16-rhel-9-candidate │ containers-common-1-63.rhaos4.16.el9 │ 2     │
│ rhaos-4.17-rhel-9-candidate │ containers-common-1-37.rhaos4.13.el9 │ 3     │
└─────────────────────────────┴──────────────────────────────────────┴───────┘