openshift / os


[4.14-9.2][x86_64] : crio.base kola test fails with cri-o pkg upgrade #1542

Closed: aaradhak closed this 4 months ago

aaradhak commented 4 months ago

In the recent [4.14-9.2][x86_64] build-414.92.202407021036-0, the crio.base kola test fails with a StatusCode: 403 error: the registry returns an HTTP 403 response when cri-o tries to pull registry.k8s.io/pause:3.9.

This started after the cri-o package was updated from 1.27.7-6.rhaos4.14.gited4b2c6.el9 to 1.27.8-2.rhaos4.14.gitbfac241.el9.

[2024-07-02T11:02:50.795Z] --- FAIL: crio.base (28.85s)
[2024-07-02T11:02:50.795Z]     --- PASS: crio.base/crio-info (0.12s)
[2024-07-02T11:02:50.795Z]     --- FAIL: crio.base/pod-continues-during-service-restart (2.09s)
[2024-07-02T11:02:50.795Z]             cluster.go:162: E0702 11:02:44.839263    2072 remote_runtime.go:176] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = creating pod sandbox with name \"k8s_rhcos-crio-pod-restart-test_redhat.test.crio__0\": initializing source docker://registry.k8s.io/pause:3.9: pinging container registry registry.k8s.io: StatusCode: 403, <!doctype html><meta charset=\"utf-8\"><meta name=vi..."
[2024-07-02T11:02:50.795Z]             cluster.go:162: time="2024-07-02T11:02:44Z" level=fatal msg="run pod sandbox: rpc error: code = Unknown desc = creating pod sandbox with name \"k8s_rhcos-crio-pod-restart-test_redhat.test.crio__0\": initializing source docker://registry.k8s.io/pause:3.9: pinging container registry registry.k8s.io: StatusCode: 403, <!doctype html><meta charset=\"utf-8\"><meta name=vi..."
[2024-07-02T11:02:50.795Z]             cluster.go:184: "sudo crictl runp -T 300s restart-testPod2407944887" failed: output , status Process exited with status 1
[2024-07-02T11:02:50.795Z]     --- FAIL: crio.base/networks-reliably (1.92s)
[2024-07-02T11:02:50.795Z]             cluster.go:162: E0702 11:02:48.195978    2329 remote_runtime.go:176] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = creating pod sandbox with name \"k8s_rhcos-crio-pod-ping1_redhat.test.crio__0\": initializing source docker://registry.k8s.io/pause:3.9: pinging container registry registry.k8s.io: StatusCode: 403, <!doctype html><meta charset=\"utf-8\"><meta name=vi..."
[2024-07-02T11:02:50.796Z]             cluster.go:162: time="2024-07-02T11:02:48Z" level=fatal msg="run pod sandbox: rpc error: code = Unknown desc = creating pod sandbox with name \"k8s_rhcos-crio-pod-ping1_redhat.test.crio__0\": initializing source docker://registry.k8s.io/pause:3.9: pinging container registry registry.k8s.io: StatusCode: 403, <!doctype html><meta charset=\"utf-8\"><meta name=vi..."
[2024-07-02T11:02:50.796Z]             cluster.go:184: "sudo crictl runp -T 300s ping1Pod1411168393" failed: output , status Process exited with status 1
[2024-07-02T11:02:50.796Z] FAIL, output in /home/jenkins/agent/workspace/build/tmp/kola-OmkJR/kola/rerun
[2024-07-02T11:02:50.796Z] Error: harness: test suite failed
[2024-07-02T11:02:50.796Z] 2024-07-02T11:02:50Z cli: harness: test suite failed
[2024-07-02T11:02:50.796Z] failed to execute cmd-kola: exit status 1
jlebon commented 4 months ago

This looks like an infra flake or a pull limit, but when testing locally, it reproduces reliably with the newer cri-o and never with the older one.

And yet, the actual cri-o diff is inconsequential: https://github.com/cri-o/cri-o/compare/ed4b2c6...bfac241

And the buildroots of the two builds appear identical:

$ curl -Lo buildroot.old.log https://.../cri-o/1.27.7/6.rhaos4.14.gited4b2c6.el9/data/logs/x86_64/installed_pkgs.log
$ curl -Lo buildroot.new.log https://.../cri-o/1.27.8/2.rhaos4.14.gitbfac241.el9/data/logs/x86_64/installed_pkgs.log
$ diff buildroot.*.log
$

So... possibly some undefined behaviour somewhere?
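`diff` compares the two logs line by line, so it would also flag packages that merely appear in a different order. A minimal order-insensitive check, sketched as a hypothetical `pkg_diff` shell function, assumes that each non-blank line of `installed_pkgs.log` describes one installed package:

```shell
#!/bin/sh
# Sketch: compare two installed_pkgs.log files regardless of line order.
# pkg_diff is a hypothetical helper; it assumes one package entry per line.
pkg_diff() {
  old_sorted=$(mktemp)
  new_sorted=$(mktemp)
  # Drop blank lines, then sort and de-duplicate each list (comm needs sorted input).
  sed '/^$/d' "$1" | sort -u > "$old_sorted"
  sed '/^$/d' "$2" | sort -u > "$new_sorted"
  # -3 hides entries present in both lists: column 1 = only in the old log,
  # tab-indented column 2 = only in the new log.
  comm -3 "$old_sorted" "$new_sorted"
  rm -f "$old_sorted" "$new_sorted"
}
```

Usage: `pkg_diff buildroot.old.log buildroot.new.log`. Empty output means both buildroots installed the same set of packages.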

jlebon commented 4 months ago

To reproduce this, download the latest 4.14 RHCOS qcow2 and boot it with cosa. Once inside the VM:

[core@cosa-devsh ~]$ sudo -i
[root@cosa-devsh ~]# cat > pod.json
{
        "metadata": {
                "name": "rhcos-crio-pod-restart-test",
                "namespace": "redhat.test.crio",
                "uid": "b333852e-15ad-4c55-b9e4-23c681b20c4d"
        },
        "image": {
                        "image": "busybox"
        },
        "args": [],
        "readonly_rootfs": false,
        "log_path": "",
        "stdin": false,
        "stdin_once": false,
        "tty": false,
        "linux": {
                        "resources": {
                                        "memory_limit_in_bytes": 209715200,
                                        "cpu_period": 10000,
                                        "cpu_quota": 20000,
                                        "cpu_shares": 512,
                                        "oom_score_adj": 30,
                                        "cpuset_cpus": "0",
                                        "cpuset_mems": "0"
                        },
                        "cgroup_parent": "Burstable-pod-123.slice",
                        "security_context": {
                                        "namespace_options": {
                                                        "pid": 1
                                        },
                                        "capabilities": {
                                                        "add_capabilities": [
                                                                "sys_admin"
                                                        ]
                                        }
                        }
        }
}
[root@cosa-devsh ~]# rpm-ostree usroverlay
[root@cosa-devsh ~]# curl -kLO https://.../cri-o-1.27.8-2.rhaos4.14.gitbfac241.el9.x86_64.rpm
[root@cosa-devsh ~]# rpm -Uvh cri-o-1.27.8-2.rhaos4.14.gitbfac241.el9.x86_64.rpm
[root@cosa-devsh ~]# systemctl start crio
[root@cosa-devsh ~]# crictl runp pod.json
E0703 13:53:45.991003    2187 remote_runtime.go:176] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = creating pod sandbox with name \"k8s_rhcos-crio-pod-restart-test_redhat.test.crio_b333852e-15ad-4c55-b9e4-23c681b20c4d_0\": initializing source docker://registry.k8s.io/pause:3.9: pinging container registry registry.k8s.io: StatusCode: 403, <!doctype html><meta charset=\"utf-8\"><meta name=vi..."
FATA[0000] run pod sandbox: rpc error: code = Unknown desc = creating pod sandbox with name "k8s_rhcos-crio-pod-restart-test_redhat.test.crio_b333852e-15ad-4c55-b9e4-23c681b20c4d_0": initializing source docker://registry.k8s.io/pause:3.9: pinging container registry registry.k8s.io: StatusCode: 403, <!doctype html><meta charset="utf-8"><meta name=vi...
[root@cosa-devsh ~]#
jlebon commented 4 months ago

This was narrowed down to User-Agent filtering that happens to trigger on the specific git hash of this cri-o build: https://github.com/kubernetes/registry.k8s.io/issues/286. We've untagged the build from 4.14 for now.

Closing as dupe.
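One way to confirm that the User-Agent alone triggers the 403 is to repeat the registry ping (presumably the `GET /v2/` request behind "pinging container registry") with different User-Agent strings and compare the status codes. This is a minimal sketch: `probe_ua` is a hypothetical helper, and the User-Agent strings you pass in are up to you, since the exact string cri-o sends is not shown in this thread.

```shell
#!/bin/sh
# Sketch: print the HTTP status code a registry returns for a given User-Agent.
# probe_ua is a hypothetical helper; the URL defaults to the registry from this issue.
probe_ua() {
  ua="$1"
  url="${2:-https://registry.k8s.io/v2/}"
  # -s: no progress output; -o /dev/null: discard the body (e.g. the 403 HTML page);
  # -A: set the User-Agent header; -w: print only the status code.
  # curl prints 000 if no HTTP response was received at all.
  curl -s -o /dev/null -A "$ua" -w '%{http_code}\n' "$url"
}
```

For example, call it once with a User-Agent containing the old commit hash (`ed4b2c6`) and once with the new one (`bfac241`), keeping everything else identical; a 403 for only the latter would point at User-Agent filtering rather than a pull limit.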