tigera / operator

Kubernetes operator for installing Calico and Calico Enterprise
Apache License 2.0

SIGSEGV: segmentation violation PC=0x7f820e813ead m=0 sigcode=1 signal arrived during cgo execution #3461

Open exfly opened 1 month ago

exfly commented 1 month ago

Expected Behavior

Both of the following commands should run without a SIGSEGV (segmentation violation).

With the host's glibc 2.28 libraries mounted into the container:

docker run --rm -it --network=host --entrypoint='' -v /lib64/libpthread-2.28.so:/lib64/libpthread.so.0 -v /lib64/libc-2.28.so:/lib64/libc.so.6 -v /lib64/ld-2.28.so:/lib64/ld-linux-x86-64.so.2 quay.io/tigera/operator:v1.32.10 /usr/local/bin/operator

Without the mounts:

docker run --rm -it --network=host --entrypoint='' quay.io/tigera/operator:v1.32.10 /usr/local/bin/operator

Current Behavior

With the host's glibc 2.28 libraries mounted into the container, there is no SIGSEGV:

docker run --rm -it --network=host --entrypoint='' -v /lib64/libpthread-2.28.so:/lib64/libpthread.so.0 -v /lib64/libc-2.28.so:/lib64/libc.so.6 -v /lib64/ld-2.28.so:/lib64/ld-linux-x86-64.so.2 quay.io/tigera/operator:v1.32.10 /usr/local/bin/operator

Without the mounts, the operator crashes with a segmentation violation:

docker run --rm -it --network=host --entrypoint='' quay.io/tigera/operator:v1.32.10 /usr/local/bin/operator

SIGSEGV: segmentation violation
PC=0x7f64fe436ead m=0 sigcode=1
signal arrived during cgo execution

goroutine 1 [syscall, locked to thread]:
runtime.cgocall(0x401480, 0xc00015bdf0)
    /usr/local/go/src/runtime/cgocall.go:157 +0x4b fp=0xc00015bdc8 sp=0xc00015bd90 pc=0x5825cb
crypto/internal/boring._Cfunc__goboringcrypto_BORINGSSL_bcm_power_on_self_test()
    _cgo_gotypes.go:424 +0x3f fp=0xc00015bdf0 sp=0xc00015bdc8 pc=0x7e343f
crypto/internal/boring.init.0()
    /usr/local/go/src/crypto/internal/boring/boring.go:26 +0x13 fp=0xc00015be10 sp=0xc00015bdf0 pc=0x7e8c13
runtime.doInit1(0x3e40bf0)
    /usr/local/go/src/runtime/proc.go:6735 +0xd8 fp=0xc00015bf40 sp=0xc00015be10 pc=0x5c65b8
runtime.doInit(...)
    /usr/local/go/src/runtime/proc.go:6702
runtime.main()
    /usr/local/go/src/runtime/proc.go:249 +0x374 fp=0xc00015bfe0 sp=0xc00015bf40 pc=0x5b9414
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00015bfe8 sp=0xc00015bfe0 pc=0x5ea521

goroutine 2 [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
    /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000146fa8 sp=0xc000146f88 pc=0x5b97ce
runtime.goparkunlock(...)
    /usr/local/go/src/runtime/proc.go:404
runtime.forcegchelper()
    /usr/local/go/src/runtime/proc.go:322 +0xb3 fp=0xc000146fe0 sp=0xc000146fa8 pc=0x5b9633
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000146fe8 sp=0xc000146fe0 pc=0x5ea521
created by runtime.init.6 in goroutine 1
    /usr/local/go/src/runtime/proc.go:310 +0x1a

goroutine 3 [GC sweep wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
    /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000147778 sp=0xc000147758 pc=0x5b97ce
runtime.goparkunlock(...)
    /usr/local/go/src/runtime/proc.go:404
runtime.bgsweep(0x0?)
    /usr/local/go/src/runtime/mgcsweep.go:280 +0x94 fp=0xc0001477c8 sp=0xc000147778 pc=0x5a3634
runtime.gcenable.func1()
    /usr/local/go/src/runtime/mgc.go:200 +0x25 fp=0xc0001477e0 sp=0xc0001477c8 pc=0x5987e5
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0001477e8 sp=0xc0001477e0 pc=0x5ea521
created by runtime.gcenable in goroutine 1
    /usr/local/go/src/runtime/mgc.go:200 +0x66

goroutine 4 [GC scavenge wait]:
runtime.gopark(0xc000170000?, 0x2ca5c60?, 0x1?, 0x0?, 0xc0000071e0?)
    /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000147f70 sp=0xc000147f50 pc=0x5b97ce
runtime.goparkunlock(...)
    /usr/local/go/src/runtime/proc.go:404
runtime.(*scavengerState).park(0x3f85bc0)
    /usr/local/go/src/runtime/mgcscavenge.go:425 +0x49 fp=0xc000147fa0 sp=0xc000147f70 pc=0x5a0f09
runtime.bgscavenge(0x0?)
    /usr/local/go/src/runtime/mgcscavenge.go:653 +0x3c fp=0xc000147fc8 sp=0xc000147fa0 pc=0x5a149c
runtime.gcenable.func2()
    /usr/local/go/src/runtime/mgc.go:201 +0x25 fp=0xc000147fe0 sp=0xc000147fc8 pc=0x598785
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000147fe8 sp=0xc000147fe0 pc=0x5ea521
created by runtime.gcenable in goroutine 1
    /usr/local/go/src/runtime/mgc.go:201 +0xa5

goroutine 5 [finalizer wait]:
runtime.gopark(0x198?, 0x2433600?, 0x1?, 0xa9?, 0x0?)
    /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000146620 sp=0xc000146600 pc=0x5b97ce
runtime.runfinq()
    /usr/local/go/src/runtime/mfinal.go:193 +0x107 fp=0xc0001467e0 sp=0xc000146620 pc=0x597787
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0001467e8 sp=0xc0001467e0 pc=0x5ea521
created by runtime.createfing in goroutine 1
    /usr/local/go/src/runtime/mfinal.go:163 +0x3d

goroutine 6 [sleep]:
runtime.gopark(0x37b629b19607?, 0x0?, 0x0?, 0x0?, 0x0?)
    /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000148758 sp=0xc000148738 pc=0x5b97ce
time.Sleep(0x6fc23ac00)
    /usr/local/go/src/runtime/time.go:195 +0x125 fp=0xc000148798 sp=0xc000148758 pc=0x5e74e5
sigs.k8s.io/controller-runtime/pkg/log.init.0.func1()
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/log/log.go:63 +0x2d fp=0xc0001487e0 sp=0xc000148798 pc=0x14a33ad
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0001487e8 sp=0xc0001487e0 pc=0x5ea521
created by sigs.k8s.io/controller-runtime/pkg/log.init.0 in goroutine 1
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/log/log.go:62 +0x1a

rax    0x4fcea08
rbx    0x7ffc0192e6a8
rcx    0x40
rdx    0xa0
rdi    0x4feea40
rsi    0x4fee860
rbp    0xc00015bd80
rsp    0x7ffc0192e518
r8     0xffffffffffffffc8
r9     0x1e0
r10    0xfffffffffffffff8
r11    0x0
r12    0x4fcea08
r13    0x1a
r14    0x7ffc0192e688
r15    0x0
rip    0x7f64fe436ead
rflags 0x10282
cs     0x33
fs     0x0
gs     0x0

Possible Solution

None

Steps to Reproduce (for bugs)

Install Calico on kylinv10.

Context

This problem may be related to cgo. umb:8.9 seems to be incompatible with kylinv10.

Your Environment

exfly commented 1 month ago

Disabling CGO works. Can we disable CGO/boringcrypto by default for compatibility, or is it possible to provide two images, one with cgo disabled and one as it is now?
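
Since disabling cgo is reported above to avoid the crash, it can help to confirm how a given binary was built. The following is a minimal sketch, assuming the Go toolchain is available and the binary embeds build information (Go 1.18 or later). The function name `cgo_enabled_setting` is hypothetical; it only filters the text that `go version -m` prints.

```sh
#!/bin/sh
# cgo_enabled_setting (hypothetical helper): read `go version -m` output on
# stdin and print the recorded CGO_ENABLED value. Prints nothing if the
# binary carries no such build setting.
cgo_enabled_setting() {
    # Build settings appear as whitespace-separated lines: "build KEY=VALUE".
    awk '$1 == "build" && $2 ~ /^CGO_ENABLED=/ {
        sub(/^CGO_ENABLED=/, "", $2)
        print $2
    }'
}

# Usage (assumes the operator binary has been copied out of the image):
#   go version -m ./operator | cgo_enabled_setting
```

An output of 1 would indicate a cgo build like the one in the trace above; 0 would indicate a pure-Go build.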

tmjd commented 1 month ago

Does this only happen when using the volume mounts? -v /lib64/libpthread-2.28.so:/lib64/libpthread.so.0 -v /lib64/libc-2.28.so:/lib64/libc.so.6 -v /lib64/ld-2.28.so:/lib64/ld-linux-x86-64.so.2

If yes, can you explain why this is reasonable and should be supported?

exfly commented 3 weeks ago

> Does this only happen when using the volume mounts? -v /lib64/libpthread-2.28.so:/lib64/libpthread.so.0 -v /lib64/libc-2.28.so:/lib64/libc.so.6 -v /lib64/ld-2.28.so:/lib64/ld-linux-x86-64.so.2
>
> If yes, can you explain why this is reasonable and should be supported?

#!/usr/bin/awk -f

BEGIN {
    while (!/flags/) if (getline < "/proc/cpuinfo" != 1) exit 1
    if (/lm/&&/cmov/&&/cx8/&&/fpu/&&/fxsr/&&/mmx/&&/syscall/&&/sse2/) level = 1
    if (level == 1 && /cx16/&&/lahf/&&/popcnt/&&/sse4_1/&&/sse4_2/&&/ssse3/) level = 2
    if (level == 2 && /avx/&&/avx2/&&/bmi1/&&/bmi2/&&/f16c/&&/fma/&&/abm/&&/movbe/&&/xsave/) level = 3
    if (level == 3 && /avx512f/&&/avx512bw/&&/avx512cd/&&/avx512dq/&&/avx512vl/) level = 4
    if (level > 0) { print "CPU supports x86-64-v" level; exit level + 1 }
    exit 1
}
  1. If yes: yes.
  2. Why this is reasonable: running the script above prints "CPU supports x86-64-v3". I think the base images registry.access.redhat.com/ubi8/ubi-minimal:latest and almalinux:8.9 use some newer CPU instructions that an x86-64-v3 CPU does not support.
  3. Temporary workaround: use Calico v3.25.0 without the operator, or patch the image with this Dockerfile:

FROM rockylinux:8 as ubi
FROM quay.io/tigera/operator:v1.32.10

COPY --from=ubi /lib64/ld-linux-x86-64.so.2 /lib64/ld-linux-x86-64.so.2
COPY --from=ubi /lib64/libpthread.so.0 /lib64/libpthread.so.0
COPY --from=ubi /lib64/libc.so.6 /lib64/libc.so.6
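
The level check from the awk script above can also be written as a shell function that takes the flags as an argument, which makes it possible to evaluate a flags line copied from another machine. This is only a sketch: the names `has_flags` and `x86_64_level` are hypothetical, and the flag lists are taken from the script above. One difference is an assumption on my side: the awk script matches substrings, while this version matches whole flag names, so it uses `lahf_lm` where the script matches `lahf`.

```sh
#!/bin/sh
# has_flags FLAGS NAME...: succeed only if every NAME occurs as a whole,
# space-separated word in FLAGS.
has_flags() {
    flags=" $1 "
    shift
    for name in "$@"; do
        case "$flags" in
            *" $name "*) ;;      # found, check the next name
            *) return 1 ;;       # missing flag
        esac
    done
    return 0
}

# x86_64_level FLAGS: print the highest x86-64 level (0 to 4) implied by FLAGS.
# Each level is only considered if the previous one was reached.
x86_64_level() {
    level=0
    has_flags "$1" lm cmov cx8 fpu fxsr mmx syscall sse2 && level=1
    [ "$level" -eq 1 ] && has_flags "$1" cx16 lahf_lm popcnt sse4_1 sse4_2 ssse3 && level=2
    [ "$level" -eq 2 ] && has_flags "$1" avx avx2 bmi1 bmi2 f16c fma abm movbe xsave && level=3
    [ "$level" -eq 3 ] && has_flags "$1" avx512f avx512bw avx512cd avx512dq avx512vl && level=4
    echo "$level"
}

# Usage on the local machine (Linux only):
#   x86_64_level "$(grep -m1 '^flags' /proc/cpuinfo)"
```

A reported level only says which feature flags the kernel advertises. It does not prove that every instruction behaves the same way as on other vendors' CPUs.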

exfly commented 3 weeks ago

I suggest using rockylinux:8 as the base image, which should be supported.

pandada8 commented 1 week ago

Similar issue: https://github.com/ceph/ceph-csi/issues/4379

I did some more testing, and it looks like glibc > 2.28-241.el8 will SIGSEGV on Hygon CPUs (you can't even run bash). The problem appears to have been introduced by Red Hat's backport, since Arch Linux's glibc runs perfectly fine.

Unfortunately, CentOS Stream 8 is already EOL, so I am not sure where I should report this issue.
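
To check whether an image or host carries a glibc build above the release mentioned here, something like the following could be used. It is a rough sketch that takes the 2.28-241.el8 threshold from this comment at face value. The function names `glibc_release` and `is_suspect_glibc` are hypothetical, and the sketch only understands el8-style VERSION-RELEASE strings, so other vendors' version strings are reported as not matching.

```sh
#!/bin/sh
# glibc_release VERSION-RELEASE: print the leading numeric part of the
# release, e.g. "2.28-251.el8_10.2" gives "251".
glibc_release() {
    rel=${1#*-}          # drop everything up to the first "-"
    echo "${rel%%.*}"    # keep the text before the first "."
}

# is_suspect_glibc VERSION-RELEASE: succeed if this is a glibc 2.28 el8 build
# whose release number is greater than 241.
is_suspect_glibc() {
    case "$1" in
        2.28-*.el8*) ;;
        *) return 1 ;;   # not an el8 build of glibc 2.28
    esac
    [ "$(glibc_release "$1")" -gt 241 ]
}

# Usage on an RPM-based system:
#   is_suspect_glibc "$(rpm -q --qf '%{VERSION}-%{RELEASE}\n' glibc)" && echo "glibc newer than 2.28-241.el8"
```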