openshift / os


[4.17-9.4] `crio.base` failing with "openssl: can't retrieve OpenSSL version" #1523

Closed nikita-dubrovskii closed 2 months ago

nikita-dubrovskii commented 3 months ago

Tests fail with:

    core : PWD=/var/home/core ; USER=root ; COMMAND=/bin/crictl exec 5907f463f2f37a60afb9c29a374659aab9749109144175f7f5379e481fbf4076 bash -c 'sleep 25 && echo PASS > /tmp/test/restart-test'
pam_unix(sudo:session): session opened for user root(uid=0) by core(uid=1000)
run-runc-5907f463f2f37a60afb9c29a374659aab9749109144175f7f5379e481fbf4076-runc.tA3M9r.mount: Deactivated successfully.
panic: opensslcrypto: can't initialize OpenSSL : openssl: can't retrieve OpenSSL version

goroutine 1 gp=0xc0000061c0 m=0 mp=0x55780958a1c0 [running]:
panic({0x55780919e680?, 0xc00002ee10?})
        /usr/lib/golang/src/runtime/panic.go:779 +0x158 fp=0xc0000f3ce8 sp=0xc0000f3c38 pc=0x557808cdaa78
crypto/internal/backend.init.0()
        /usr/lib/golang/src/crypto/internal/backend/openssl.go:50 +0x26c fp=0xc0000f3e20 sp=0xc0000f3ce8 pc=0x557808ee7a4c
runtime.doInit1(0x5578095722f0)
        /usr/lib/golang/src/runtime/proc.go:7176 +0xea fp=0xc0000f3f50 sp=0xc0000f3e20 pc=0x557808ceca0a
runtime.doInit(...)
        /usr/lib/golang/src/runtime/proc.go:7143
runtime.main()
        /usr/lib/golang/src/runtime/proc.go:253 +0x357 fp=0xc0000f3fe0 sp=0xc0000f3f50 pc=0x557808cde157
runtime.goexit({})
        /usr/lib/golang/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000f3fe8 sp=0xc0000f3fe0 pc=0x557808d11361
...

journal.txt

nikita-dubrovskii commented 3 months ago

As far as I can see, only this package has a different version between the successful and failed builds:

openshift-clients-4.17.0-202406040213.p0.gb9859d5.assembly.stream.el9.x86_64

vs

openshift-clients-4.17.0-202406051012.p0.g0b082c6.assembly.stream.el9.x86_64
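
A comparison like this can be scripted. The following is a minimal sketch, assuming the package list of each build has been saved to a plain-text file with one package per line; the function name `pkg_diff` and the file names are hypothetical and not part of any RHCOS tooling.

```shell
# pkg_diff OLD NEW: print the package lines that differ between two lists.
# Lines only in OLD are prefixed with "- ", lines only in NEW with "+ ".
pkg_diff() {
  old_sorted=$(mktemp) || return 1
  new_sorted=$(mktemp) || { rm -f "$old_sorted"; return 1; }
  # comm requires sorted input; use the C locale so sort and comm agree.
  LC_ALL=C sort "$1" > "$old_sorted"
  LC_ALL=C sort "$2" > "$new_sorted"
  # -23 keeps lines unique to the first file, -13 lines unique to the second.
  LC_ALL=C comm -23 "$old_sorted" "$new_sorted" | sed 's/^/- /'
  LC_ALL=C comm -13 "$old_sorted" "$new_sorted" | sed 's/^/+ /'
  rm -f "$old_sorted" "$new_sorted"
}

# Example (hypothetical file names):
#   pkg_diff good-build-packages.txt bad-build-packages.txt
```

One way to produce such lists is `rpm -qa` run on a booted system from each build. Any package whose version changed shows up as a `-`/`+` pair.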
jlebon commented 3 months ago

Another diff here is:

runc 4:1.1.12-2.el9 -> 4:1.1.12-3.rhaos4.17.el9

The key difference between those builds is that the rhaos one is built in the rhaos-4.17-rhel-9-candidate build target, which has a different buildroot. In particular, the el9 buildroot has golang 1.21 and openssl-1:3.0.7-25.el9, while the rhaos buildroot has golang 1.22 and openssl-3.0.7-6.el9_2 (i.e. newer golang, but older openssl).

So this is likely a buildroot mismatch.

Someone asked for steps to reproduce this. Download the latest 4.17-9.4 RHCOS from the RHCOS release browser, and then:

[root@cosa-devsh ~]# rpm-ostree usroverlay
[root@cosa-devsh ~]# rpm -Uvh runc-1.1.12-3.rhaos4.17.el9.x86_64.rpm
[root@cosa-devsh ~]# systemctl start crio
[root@cosa-devsh ~]# cat pod.json
{
  "metadata": {
    "name": "busybox-sandbox",
    "namespace": "default",
    "attempt": 1,
    "uid": "aewi4aeThua7ooShohbo1phoj"
  },
  "log_directory": "/tmp",
  "linux": {
  }
}
[root@cosa-devsh ~]# cat container.json
{
  "metadata": {
    "name": "busybox"
  },
  "image":{
    "image": "busybox"
  },
  "command": [
    "top"
  ],
  "log_path":"busybox.log",
  "linux": {
  }
}
[root@cosa-devsh ~]# crictl runp pod.json
d4051c522d475f592ec77b192ee7ec22d8c4416cc3d7f93b13f17fc3c24462b8
[root@cosa-devsh ~]# crictl create d4051c522d475f592ec77b192ee7ec22d8c4416cc3d7f93b13f17fc3c24462b8 container.json pod.json
b96c4cc86b77f525e39c95b0d703ca44020664ffbf8094591b824bea566b87d0
[root@cosa-devsh ~]# crictl start b96c4cc86b77f525e39c95b0d703ca44020664ffbf8094591b824bea566b87d0
b96c4cc86b77f525e39c95b0d703ca44020664ffbf8094591b824bea566b87d0
[root@cosa-devsh ~]# crictl exec b96c4cc86b77f525e39c95b0d703ca44020664ffbf8094591b824bea566b87d0 sh -c "sleep 5"
panic: opensslcrypto: can't initialize OpenSSL : openssl: can't retrieve OpenSSL version
...
jlebon commented 3 months ago

Still an issue with cri-o-1.30.2-2.rhaos4.17.gitfbfc68d.el9 and runc-4:1.1.12-5.rhaos4.17.el9:

--- FAIL: crio.base (36.57s)
    --- PASS: crio.base/crio-info (0.07s)
    --- FAIL: crio.base/pod-continues-during-service-restart (2.68s)
            cluster.go:213: "sudo crictl exec 4a8260e7d76e3bc3b5eab32e14e69398276eeaf6b74056c41891c51283508938 bash -c \"sleep 25 && echo PASS > /tmp/test/restart-test\"" failed: Process exited with status 1; logs from journalctl -t kola:
    runtime.(*scavengerState).park(0x559384c606e0)
        /usr/lib/golang/src/runtime/mgcscavenge.go:425 +0x49 fp=0xc00004dfa8 sp=0xc00004df78 pc=0x55938439e169
    runtime.bgscavenge(0xc00002a070)
        /usr/lib/golang/src/runtime/mgcscavenge.go:653 +0x3c fp=0xc00004dfc8 sp=0xc00004dfa8 pc=0x55938439e6fc
    runtime.gcenable.gowrap2()
        /usr/lib/golang/src/runtime/mgc.go:204 +0x25 fp=0xc00004dfe0 sp=0xc00004dfc8 pc=0x559384395045
    runtime.goexit({})
        /usr/lib/golang/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc00004dfe8 sp=0xc00004dfe0 pc=0x5593843e7ec1
    created by runtime.gcenable in goroutine 1
        /usr/lib/golang/src/runtime/mgc.go:204 +0xa5

    goroutine 5 gp=0xc000007c00 m=nil [runnable]:
    runtime.runfinq()
        /usr/lib/golang/src/runtime/mfinal.go:177 fp=0xc00004c7e0 sp=0xc00004c7d8 pc=0x559384393fe0
    runtime.goexit({})
        /usr/lib/golang/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc00004c7e8 sp=0xc00004c7e0 pc=0x5593843e7ec1
    created by runtime.createfing in goroutine 1
        /usr/lib/golang/src/runtime/mfinal.go:164 +0x3d
    time="2024-06-18T18:35:32Z" level=error msg="exec failed: unable to start container process: read init-p: connection reset by peer"
    time="2024-06-18T18:35:32Z" level=fatal msg="execing command in container: command terminated with exit code 255"
    --- FAIL: crio.base/networks-reliably (8.34s)
            cluster.go:162: panic: opensslcrypto: can't initialize OpenSSL : openssl: can't retrieve OpenSSL version
            cluster.go:162:
            cluster.go:162: goroutine 1 gp=0xc0000061c0 m=0 mp=0x557e359d91c0 [running]:
            cluster.go:162: panic({0x557e355edb20?, 0xc00002ee10?})
            cluster.go:162:     /usr/lib/golang/src/runtime/panic.go:779 +0x158 fp=0xc0000f3ce8 sp=0xc0000f3c38 pc=0x557e35129598
            cluster.go:162: crypto/internal/backend.init.0()
            cluster.go:162:     /usr/lib/golang/src/crypto/internal/backend/openssl.go:50 +0x26c fp=0xc0000f3e20 sp=0xc0000f3ce8 pc=0x557e3533646c
            cluster.go:162: runtime.doInit1(0x557e359c12f0)
            cluster.go:162:     /usr/lib/golang/src/runtime/proc.go:7176 +0xea fp=0xc0000f3f50 sp=0xc0000f3e20 pc=0x557e3513b52a
            cluster.go:162: runtime.doInit(...)
            cluster.go:162:     /usr/lib/golang/src/runtime/proc.go:7143
            cluster.go:162: runtime.main()
            cluster.go:162:     /usr/lib/golang/src/runtime/proc.go:253 +0x357 fp=0xc0000f3fe0 sp=0xc0000f3f50 pc=0x557e3512cc77
            cluster.go:162: runtime.goexit({})
            cluster.go:162:     /usr/lib/golang/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000f3fe8 sp=0xc0000f3fe0 pc=0x557e3515fec1
...

journal.txt console.txt

travier commented 3 months ago

I asked ART about this.

jlebon commented 2 months ago

This was fixed by https://github.com/golang-fips/go/pull/207.

chandramerla commented 2 months ago

It seems the PR referred to above is still not merged, and I'm facing the same issue (same stack trace): the calico-node readiness probe fails as shown below:

Events:
Type     Reason     Age  From               Message
Normal   Scheduled  80s  default-scheduler  Successfully assigned kube-system/calico-node-kmhr5 to node01
Normal   Pulled     81s  kubelet            Container image "quay.io/calico/cni:v3.27.2" already present on machine
Normal   Created    80s  kubelet            Created container upgrade-ipam
Normal   Started    80s  kubelet            Started container upgrade-ipam
Normal   Pulled     79s  kubelet            Container image "quay.io/calico/cni:v3.27.2" already present on machine
Normal   Created    79s  kubelet            Created container install-cni
Normal   Started    79s  kubelet            Started container install-cni
Normal   Pulled     78s  kubelet            Container image "quay.io/calico/node:v3.27.2" already present on machine
Normal   Created    78s  kubelet            Created container mount-bpffs
Normal   Started    78s  kubelet            Started container mount-bpffs
Normal   Pulled     77s  kubelet            Container image "quay.io/calico/node:v3.27.2" already present on machine
Normal   Created    77s  kubelet            Created container calico-node
Normal   Started    77s  kubelet            Started container calico-node
Warning  Unhealthy  76s  kubelet            Readiness probe errored: rpc error: code = Unknown desc = command error: panic: opensslcrypto: can't initialize OpenSSL : openssl: can't retrieve OpenSSL version

goroutine 1 gp=0xc0000021c0 m=0 mp=0x2aa3b0f7420 [running]:
panic({0x2aa3ad4b3c0, 0xc00002ae20})
        /usr/lib/golang/src/runtime/panic.go:779 +0x156 fp=0xc00018fce8 sp=0xc00018fc40 pc=0x2aa3a7bd616
crypto/internal/backend.init.0()
        /usr/lib/golang/src/crypto/internal/backend/openssl.go:50 +0x3ec fp=0xc00018fe28 sp=0xc00018fce8 pc=0x2aa3aa4383c
runtime.doInit1(0x2aa3b0df2d0)
        /usr/lib/golang/src/runtime/proc.go:7176 +0xfc fp=0xc00018ff68 sp=0xc00018fe28 pc=0x2aa3a7d43cc
runtime.doInit(...)
        /usr/lib/golang/src/runtime/proc.go:7143
runtime.main()
        /usr/lib/golang/src/runtime/proc.go:253 +0x3e2 fp=0xc00018ffd8 sp=0xc00018ff68 pc=0x2aa3a7c1a72
runtime.goexit({})
        /usr/lib/golang/src/runtime/asm_s390x.s:774 +0x2 fp=0xc00018ffd8 sp=0xc00018ffd8 pc=0x2aa3a800582

Are there any workarounds I can use until the PR is merged? Thanks!