zrav closed this issue 1 year ago
Ugh, it looks like apport is not handling crashes of processes from non-host mount namespaces.
I suggest temporarily tweaking core_pattern until the next lxcfs crash, like this:
echo '|/bin/sh -c $@ -- eval exec cat > /var/crash/core-%e.%p' > /proc/sys/kernel/core_pattern
(The reason for piping here is that the kernel ignores the RLIMIT_CORE value when core_pattern is a pipe.)
Then you can restore the original value of /proc/sys/kernel/core_pattern after we collect the coredump.
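The save, tweak, and restore cycle described above could be scripted roughly as follows. This is only a sketch: the function names and the backup location (`/root/core_pattern.orig`) are illustrative assumptions, it must run as root, and it assumes `/var/crash` exists and is writable.

```shell
#!/bin/sh
# Sketch only: function names and the backup path are hypothetical.
CORE_PATTERN_FILE="${CORE_PATTERN_FILE:-/proc/sys/kernel/core_pattern}"
CORE_PATTERN_BACKUP="${CORE_PATTERN_BACKUP:-/root/core_pattern.orig}"

# Save the current pattern so it can be restored once the dump is collected.
save_core_pattern() {
    cat "$CORE_PATTERN_FILE" > "$CORE_PATTERN_BACKUP"
}

# Install the temporary pipe handler quoted above.
set_pipe_core_pattern() {
    printf '%s\n' '|/bin/sh -c $@ -- eval exec cat > /var/crash/core-%e.%p' \
        > "$CORE_PATTERN_FILE"
}

# Put the original pattern back.
restore_core_pattern() {
    cat "$CORE_PATTERN_BACKUP" > "$CORE_PATTERN_FILE"
}
```

Usage would be to source the file, run `save_core_pattern` followed by `set_pipe_core_pattern`, wait for the next lxcfs crash, copy the dump out of `/var/crash`, and then run `restore_core_pattern`.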
@zrav do you have any updates?
lxcfs hasn't crashed again so far. As soon as it happens I'll post the info.
From the disassembly of glibc's libc-2.31.so,
it follows that the crash happened here:
000000000008be50 <vscanf@@GLIBC_2.2.5>:
8be50: f3 0f 1e fa endbr64
8be54: 48 8b 05 65 01 16 00 mov 0x160165(%rip),%rax # 1ebfc0 <stdin@@GLIBC_2.2.5-0x17d0>
...
8bf59: 4c 89 ef mov %r13,%rdi
8bf5c: 4c 89 4c 24 08 mov %r9,0x8(%rsp)
8bf61: e8 3a 68 00 00 call 927a0 <_IO_enable_locks@@GLIBC_PRIVATE+0xb0>
8bf66: 48 89 e9 mov %rbp,%rcx
8bf69: 4c 89 e2 mov %r12,%rdx
8bf6c: 48 89 ee mov %rbp,%rsi
8bf6f: 48 8d 05 2a d2 15 00 lea 0x15d22a(%rip),%rax # 1e91a0 <_IO_wfile_jumps@@GLIBC_2.2.5+0x240>
8bf76: 4c 89 ef mov %r13,%rdi
8bf79: 48 89 84 24 e8 00 00 mov %rax,0xe8(%rsp)
8bf80: 00
8bf81: c6 45 00 00 movb $0x0,0x0(%rbp) <=== CRASH
8bf85: e8 06 7e 00 00 call 93d90 <_IO_str_pbackfail@@GLIBC_2.2.5+0x60>
8bf8a: 89 d9 mov %ebx,%ecx
8bf8c: 4c 89 fa mov %r15,%rdx
8bf8f: 4c 89 f6 mov %r14,%rsi
8bf92: 4c 89 ef mov %r13,%rdi
8bf95: e8 c6 a8 fe ff call 76860 <psiginfo@@GLIBC_2.10+0x13400>
8bf9a: 4c 8b 4c 24 08 mov 0x8(%rsp),%r9
8bf9f: 4c 39 4c 24 48 cmp %r9,0x48(%rsp)
8bfa4: 74 08 je 8bfae <vscanf@@GLIBC_2.2.5+0x15e>
8bfa6: 48 8b 54 24 38 mov 0x38(%rsp),%rdx
8bfab: c6 02 00 movb $0x0,(%rdx)
8bfae: 48 8b 9c 24 48 01 00 mov 0x148(%rsp),%rbx
8bfb5: 00
8bfb6: 64 48 33 1c 25 28 00 xor %fs:0x28,%rbx
..
Another crash https://discuss.linuxcontainers.org/t/lxd-5-9-crashes-on-centos-7/16092
@zrav do you have any updates regarding this, or can we close the issue until the next reproducer with more debug information?
We have integrated libsegfault (https://github.com/lxc/lxd-pkg-snap/pull/114) into the LXD snap package, so it should help us find the reason for the crash next time.
@mihalicyn I'll close this issue. If/when I have more info I'll post it.
Due to https://discuss.linuxcontainers.org/t/number-of-cpus-reported-by-proc-stat-fluctuates-causing-issues/15780 we are running LXD 5.9 revision 24164. After running for a few days, lxcfs crashed:
This is an Ubuntu 22.04.1 system running kernel 5.15.0-56-generic on an AMD Epyc 7702P (128 threads) with 512GB RAM.
As requested, further information:
Unfortunately, no dumps are available, and the lxd log shows nothing of interest around the time of the crash:
Please tell me if I should modify any configuration to catch the next possible crash.