
dmesg stack trace does not match `prog.crashed_thread().stack_trace()` #206

danobi opened this issue 2 years ago (status: Open)

danobi commented 2 years ago

Here's what the dmesg stack trace looks like:

[76523.209096] RIP: 0010:hash_netiface4_resize+0x608/0x6a0 [ip_set_hash_netiface]
[76523.209096] Code: ff e9 6e fe ff ff 8d 4b f6 b8 01 00 00 00 d3 e0 48 8d 3c 40 89 45 c0 48 c1 e7 03 e8 b2 e8 fd ff 48 89 c1 48 8b 45 d0 48 85 c9 <48> 89 48 10 74 c5 0f b6 45 c6 48 8b 75 d0 31 d2 8b 7d c0 88 46 08
[76523.289173] RSP: 0018:ffff9cbe89c03828 EFLAGS: 00010286
[76523.289173] RAX: 0000000000000010 RBX: 0000000000000020 RCX: ffff9cbe9d7e5000
[76523.289173] RDX: 0000000000000000 RSI: ffff8e0cb95578e0 RDI: ffff8e0bb48cef28
[76523.289173] RBP: ffff9cbe89c038b8 R08: 0000000000000001 R09: ffff8e0cb95567d8
[76523.289173] R10: ffff9cbea37e4fff R11: 000ffffffffff000 R12: 0000000000000000
[76523.289173] R13: ffff8e0c169ea0a0 R14: 0000000000000020 R15: ffff8e0c4c361760
[76523.289173] FS:  00007f6c073f6b80(0000) GS:ffff8e0cbfbc0000(0000) knlGS:0000000000000000
[76523.289173] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[76523.289173] CR2: 0000000000000020 CR3: 0000000e4d662002 CR4: 00000000003706e0
[76523.289173] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[76523.289173] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[76523.289173] Call Trace:
[76523.289173]  call_ad+0xab/0x260 [ip_set]
[76523.289173]  ip_set_ad.isra.26+0x172/0x2d0 [ip_set]
[76523.289173]  ip_set_uadd+0x16/0x20 [ip_set]
[76523.289173]  nfnetlink_rcv_msg+0x253/0x290 [nfnetlink]
[76523.289173]  ? nfnetlink_net_exit_batch+0x70/0x70 [nfnetlink]
[76523.289173]  netlink_rcv_skb+0x51/0x120
[76523.289173]  nfnetlink_rcv+0x88/0x145 [nfnetlink]
[76523.289173]  netlink_unicast+0x1a4/0x250
[76523.289173]  netlink_sendmsg+0x2eb/0x3f0
[76523.289173]  sock_sendmsg+0x63/0x70
[76523.289173]  __sys_sendto+0x13f/0x180
[76523.289173]  ? handle_mm_fault+0xb7/0x200
[76523.289173]  ? __do_page_fault+0x2b9/0x4c0
[76523.289173]  __x64_sys_sendto+0x28/0x30
[76523.289173]  do_syscall_64+0x5e/0x200
[76523.289173]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[76523.289173] RIP: 0033:0x7f6c0771e81a

And here's what drgn thinks:

$  drgn -c 202208251021/dump.202208251021 -s usr/lib/debug/boot/vmlinux-5.4.0-1025-azure -s usr/lib/debug/lib/modules/5.4.0-1025-azure/kernel/net/netfilter/ipset/ip_set_hash_netiface.ko -s usr/lib/debug/lib/modules/5.4.0-1025-azure/kernel/net/netfilter/ipset/ip_set.ko

drgn 0.0.20 (using Python 3.10.6, elfutils 0.187, with libkdumpfile)
could not get debugging information for:
kernel modules (could not read depmod: open: /lib/modules/5.4.0-1025-azure/modules.dep.bin: No such file or directory)
For help, type help(drgn).
>>> import drgn
>>> from drgn import NULL, Object, cast, container_of, execscript, offsetof, reinterpret, sizeof
>>> from drgn.helpers.linux import *
>>> print(prog.crashed_thread().stack_trace())
#0  crash_setup_regs (/build/linux-azure-5.4-Znn2aE/linux-azure-5.4-5.4.0/arch/x86/include/asm/kexec.h:99)
#1  __crash_kexec (/build/linux-azure-5.4-Znn2aE/linux-azure-5.4-5.4.0/kernel/kexec_core.c:952)
#2  panic (/build/linux-azure-5.4-Znn2aE/linux-azure-5.4-5.4.0/kernel/panic.c:278)
#3  oops_end (/build/linux-azure-5.4-Znn2aE/linux-azure-5.4-5.4.0/arch/x86/kernel/dumpstack.c:354)
#4  no_context (/build/linux-azure-5.4-Znn2aE/linux-azure-5.4-5.4.0/arch/x86/mm/fault.c:848)
#5  __bad_area_nosemaphore (/build/linux-azure-5.4-Znn2aE/linux-azure-5.4-5.4.0/arch/x86/mm/fault.c:934)
#6  bad_area_nosemaphore (/build/linux-azure-5.4-Znn2aE/linux-azure-5.4-5.4.0/arch/x86/mm/fault.c:941)
#7  do_user_addr_fault (/build/linux-azure-5.4-Znn2aE/linux-azure-5.4-5.4.0/arch/x86/mm/fault.c:1339)
#8  __do_page_fault (/build/linux-azure-5.4-Znn2aE/linux-azure-5.4-5.4.0/arch/x86/mm/fault.c:1528)
#9  do_page_fault (/build/linux-azure-5.4-Znn2aE/linux-azure-5.4-5.4.0/arch/x86/mm/fault.c:1552)
#10 page_fault+0x39/0x3e (/build/linux-azure-5.4-Znn2aE/linux-azure-5.4-5.4.0/arch/x86/entry/entry_64.S:1204)

It looks like the CPU and PID both match dmesg:

>>> prog.crashed_thread().tid
3061
>>> prog["crashing_cpu"]
(int)7
[76523.209096] CPU: 7 PID: 3061 Comm: ipset Kdump: loaded Tainted: G           OE     5.4.0-1025-azure #25~18.04.1-Ubuntu

Any ideas? Thanks in advance

osandov commented 2 years ago

Huh, my guess is this is some sort of double fault (i.e., we faulted while handling whatever caused the original stack trace). I'm not sure why we didn't unwind past page_fault(), though. What was the actual cause of the first stack trace? A GPF or something else? And is this kernel using the ORC unwinder or the frame pointer unwinder?
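
For what it's worth, one rough way to check this from the vmcore itself is to look for the ORC table linker symbols. This is only a sketch, and it assumes that ORC-enabled builds expose __start_orc_unwind_ip in the vmlinux symbol table:

>>> # Kernels built with CONFIG_UNWINDER_ORC bracket their unwind tables with
>>> # linker symbols such as __start_orc_unwind_ip, so a failed lookup suggests
>>> # the frame pointer unwinder instead.
>>> try:
...     prog.symbol("__start_orc_unwind_ip")
...     print("ORC unwinder")
... except LookupError:
...     print("no ORC tables found; presumably the frame pointer unwinder")
...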

danobi commented 2 years ago

What was the actual cause of the first stack trace? A GPF or something else?

A null pointer dereference, which was fixed upstream in https://github.com/torvalds/linux/commit/2b33d6ffa9e38f344418976b06

And is this kernel using the ORC unwinder or the frame pointer unwinder?

The frame pointer unwinder. FWIW, this was an Ubuntu 5.4.0-1080-aws kernel.

brenns10 commented 1 year ago

@osandov I'm not sure whether this is the same or a related issue, but I've also seen stack traces that don't have the full expected context, at least compared to what crash gives me. For example, here's the same backtrace from both crash and drgn:

crash7latest> bt -c 7
PID: 102485  TASK: ffff978d3b680000  CPU: 7   COMMAND: "ping"
 #0 [fffffe0000161e38] crash_nmi_callback at ffffffff9605ef57
 #1 [fffffe0000161e48] nmi_handle at ffffffff96033ecd
 #2 [fffffe0000161ea0] default_do_nmi at ffffffff96034416
 #3 [fffffe0000161ec8] do_nmi at ffffffff960345f6
 #4 [fffffe0000161ef0] end_repeat_nmi at ffffffff96a0476e
    [exception RIP: mlx5_eq_int+176]
    RIP: ffffffffc1e8d1c0  RSP: ffff9779807c3e50  RFLAGS: 00000046
    RAX: 0000000000000001  RBX: ffff971924e897c0  RCX: ffff9728e0438000
    RDX: 00000000086a225f  RSI: 0000000000000046  RDI: 0000000000000002
    RBP: ffff9779807c3eb8   R8: ffff977980fe78c0   R9: ffff96be22473d80
    R10: 00000000000000ed  R11: 000000000000b741  R12: 0000000000000001
    R13: ffff9722fa380060  R14: 0000000000000404  R15: ffff976cd2205700
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #5 [ffff9779807c3e50] mlx5_eq_int at ffffffffc1e8d1c0 [mlx5_core]
 #6 [ffff9779807c3ec0] __handle_irq_event_percpu at ffffffff96103979
 #7 [ffff9779807c3f10] handle_irq_event_percpu at ffffffff96103b22
 #8 [ffff9779807c3f40] handle_irq_event at ffffffff96103bb0
 #9 [ffff9779807c3f68] handle_edge_irq at ffffffff9610779b
#10 [ffff9779807c3f88] handle_irq at ffffffff96032c2b
#11 [ffff9779807c3fb8] do_IRQ at ffffffff96a065a0
--- <IRQ stack> ---
#12 [ffffa3863bc739d8] ret_from_intr at ffffffff96a00bc6
    [exception RIP: dump_stack+65]
    RIP: ffffffff96871690  RSP: ffffa3863bc73a88  RFLAGS: 00000202
    RAX: 000000000000001f  RBX: 0000000000000286  RCX: 0000000000000830
    RDX: 0000000000000007  RSI: 00000000000008fb  RDI: 0000000000000286
    RBP: ffffa3863bc73a98   R8: 0000000000000004   R9: ffffffff97a052e9
    R10: 00078e561ecaf45e  R11: 0000000000000000  R12: 00000000ffffffff
    R13: 0000000000000019  R14: 000000000000001c  R15: 0000000000022970
    ORIG_RAX: ffffffffffffff39  CS: 0010  SS: 0018
#13 [ffffa3863bc73aa0] __csd_lock_wait at ffffffff96131af5
#14 [ffffa3863bc73b20] smp_call_function_many at ffffffff9613269a
#15 [ffffa3863bc73b78] on_each_cpu at ffffffff961327bd
#16 [ffffa3863bc73ba0] flush_tlb_kernel_range at ffffffff96085c4b
#17 [ffffa3863bc73be0] __purge_vmap_area_lazy at ffffffff96239f30
#18 [ffffa3863bc73c08] vm_unmap_aliases at ffffffff9623a403
#19 [ffffa3863bc73c50] change_page_attr_set_clr at ffffffff96081408
#20 [ffffa3863bc73cf8] set_memory_ro at ffffffff960824af
#21 [ffffa3863bc73d18] bpf_int_jit_compile at ffffffff96095870
#22 [ffffa3863bc73d90] bpf_prog_select_runtime at ffffffff961b0fbc
#23 [ffffa3863bc73db0] bpf_prepare_filter at ffffffff96759637
#24 [ffffa3863bc73e00] __get_filter at ffffffff96759961
#25 [ffffa3863bc73e40] sk_attach_filter at ffffffff96759e28
#26 [ffffa3863bc73e68] sock_setsockopt at ffffffff9671cc3c
#27 [ffffa3863bc73ee0] sys_setsockopt at ffffffff96717513
#28 [ffffa3863bc73f28] do_syscall_64 at ffffffff96003ca9
#29 [ffffa3863bc73f50] entry_SYSCALL_64_after_hwframe at ffffffff96a001b1
    RIP: 00007f03f15acc4a  RSP: 00007ffd8c526a78  RFLAGS: 00000203
    RAX: ffffffffffffffda  RBX: 00007ffd8c526ae0  RCX: 00007f03f15acc4a
    RDX: 000000000000001a  RSI: 0000000000000001  RDI: 0000000000000006
    RBP: 00007ffd8c528100   R8: 0000000000000010   R9: 0000000000003736
    R10: 000055c6b42fa0a0  R11: 0000000000000203  R12: 00007ffd8c526b20
    R13: 000055c6b42fa0c0  R14: 0000001d00000001  R15: 000055c6b42fc6bc
    ORIG_RAX: 0000000000000036  CS: 0033  SS: 002b

And from drgn:

>>> prog.stack_trace(102485)
#0  next_eqe_sw (/home/staging/linux-uek/drivers/net/ethernet/mellanox/mlx5/core//eq.c:111)
#1  mlx5_eq_int (/home/staging/linux-uek/drivers/net/ethernet/mellanox/mlx5/core//eq.c:414)
#2  __handle_irq_event_percpu (kernel/irq/handle.c:147)
#3  handle_irq_event_percpu (kernel/irq/handle.c:187)
#4  handle_irq_event (kernel/irq/handle.c:204)
#5  handle_edge_irq (kernel/irq/chip.c:793)
#6  generic_handle_irq_desc (include/linux/irqdesc.h:162)
#7  handle_irq (arch/x86/kernel/irq_64.c:87)
#8  do_IRQ (arch/x86/kernel/irq.c:246)
#9  common_interrupt+0x1c6/0x382 (arch/x86/entry/entry_64.S:590)

I can understand not having the NMI portion... it's just the call stack to get to the point of either kdump or halting a CPU during a crash. But the missing system call portion is a bother, since I do care very much about what the kernel was doing before it received an interrupt :)

Again, not sure if it's the same underlying issue, but it seems similar.

brenns10 commented 1 year ago

(Of course, the drgn stack trace is much nicer than crash's due to the unwinding of inline functions, which I love to see!)

brenns10 commented 1 year ago

Now that #179 is merged, I see a nice solution to this. I find that these sorts of "truncated stacks" appear at interrupt/fault boundaries, and we usually see a struct pt_regs * variable in the stack trace. Now that it's easy to automatically find variables in a stack trace, a helper could simply keep a lookout for the last struct pt_regs * variable and, if one is found, do prog.stack_trace(pt_regs) to continue the trace.
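
Something along these lines, as a rough, untested sketch: the helper name is made up, the type check is simplistic, and it leans on prog.stack_trace() accepting a pt_regs object as described above.

import drgn

def stack_trace_through_pt_regs(prog, thread):
    # Start with the normal (possibly truncated) trace for the thread.
    traces = [prog.stack_trace(thread)]
    for _ in range(8):  # arbitrary bound so a repeated pt_regs can't loop forever
        regs = None
        # Scan the most recent trace for the last struct pt_regs * variable
        # (StackFrame.locals() and frame[name] are the pieces added by #179).
        for frame in traces[-1]:
            for name in frame.locals():
                var = frame[name]
                if var.absent_:
                    continue
                t = var.type_
                if (t.kind == drgn.TypeKind.POINTER
                        and t.type.type_name() == "struct pt_regs"):
                    regs = var
        if regs is None:
            break
        # Continue unwinding from the registers saved at the boundary.
        traces.append(prog.stack_trace(regs))
    return traces

Printing each trace in the returned list would then show the IRQ/exception stack followed by the interrupted context (e.g. the system call stack).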

marxin commented 1 year ago

Btw, would it be possible to teach drgn how to print the NMI exception stack and IRQ stack?