opnsense / core

OPNsense GUI, API and systems backend
https://opnsense.org/
BSD 2-Clause "Simplified" License

Immediate kernel panic on OPNsense 24.7 on a system with a Chelsio T320; the same system runs 24.1 correctly #8005

Closed: dttocs closed this issue 1 month ago

dttocs commented 1 month ago

Describe the bug

OPNsense 24.7 panics on a system with a Chelsio T320 card installed. The same system does not crash when running OPNsense 24.1.10_8 or FreeBSD 14.1.

To Reproduce

Steps to reproduce the behavior:

  1. Boot from the OPNsense 24.7 memory stick (created from OPNsense-24.7-vga-amd64.img)
  2. Observe the kernel panic and reboot
  3. Boot from the FreeBSD 14.1 memory stick (created from FreeBSD-14.1-RELEASE-amd64-memstick.img)
  4. Observe a successful boot

Expected behavior

The system will boot successfully and the two ports on the T320 will be usable for WAN and LAN connections.

Describe alternatives you considered

I initially tried an in-place upgrade from 24.1.10_8 using "Update from console". That attempt failed, but I had created a backup boot environment with bectl beforehand, and I verified that the system still works after reverting to 24.1.10_8.

Screenshots

n/a

Relevant log files

cxgbc0: <Chelsio T320, 2 ports> mem 0xd1000000-0xd1000fff,0xd1001000-0xd1001fff irq 16 at device 0.0 on pci1
cxgbc0: using MSI-X interrupts (9 vectors)
cxgb0: <Port 0 10GBASE-R> on cxgbc0

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address   = 0x0
fault code              = supervisor read instruction, page not present
instruction pointer     = 0x20:0x0
stack pointer           = 0x28:0xfffffe00aab186a8
frame pointer           = 0x28:0xfffffe00aab186d0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 390 (devctl)
rdi: fffff80006bb2800 rsi: fffffe00aab18720 rdx: fffffe0091153ed8
rcx: 00000000c0306938  r8: 0000000000000000  r9: 0000000000000000
rax: 0000000000000000 rbx: fffffe00aab18720 rbp: fffffe00aab186d0
r10: fffff8006686a000 r11: 0000000001b0416b r12: 0000000000008802
r13: fffff8006686a010 r14: fffffe0091153ed8 r15: 0000000000000000
trap number             = 12
panic: page fault
cpuid = 1
time = 1729622293
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00aab18390
vpanic() at vpanic+0x131/frame 0xfffffe00aab184c0
panic() at panic+0x43/frame 0xfffffe00aab18520
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe00aab18580
trap_pfault() at trap_pfault+0x46/frame 0xfffffe00aab185d0
calltrap() at calltrap+0x8/frame 0xfffffe00aab185d0
--- trap 0xc, rip = 0, rsp = 0xfffffe00aab186a8, rbp = 0xfffffe00aab186d0 ---
??() at 0/frame 0xfffffe00aab186d0
dump_iface() at dump_iface+0x145/frame 0xfffffe00aab18780
rtnl_handle_ifevent() at rtnl_handle_ifevent+0xa9/frame 0xfffffe00aab18800
if_attach_internal() at if_attach_internal+0x3df/frame 0xfffffe00aab18850
ether_ifattach() at ether_ifattach+0x2c/frame 0xfffffe00aab18890
cxgb_port_attach() at cxgb_port_attach+0x1d3/frame 0xfffffe00aab188d0
device_attach() at device_attach+0x3ac/frame 0xfffffe00aab18920
bus_generic_attach() at bus_generic_attach+0x4b/frame 0xfffffe00aab18950
cxgb_controller_attach() at cxgb_controller_attach+0x97f/frame 0xfffffe00aab18a10
device_attach() at device_attach+0x3ac/frame 0xfffffe00aab18a60
device_probe_and_attach() at device_probe_and_attach+0x41/frame 0xfffffe00aab18a90
pci_driver_added() at pci_driver_added+0xf2/frame 0xfffffe00aab18ad0
devclass_driver_added() at devclass_driver_added+0x29/frame 0xfffffe00aab18b00
device_do_deferred_actions() at device_do_deferred_actions+0x3b/frame 0xfffffe00aab18b20
devctl2_ioctl() at devctl2_ioctl+0x20f/frame 0xfffffe00aab18bf0
devfs_ioctl() at devfs_ioctl+0xcb/frame 0xfffffe00aab18c40
vn_ioctl() at vn_ioctl+0xce/frame 0xfffffe00aab18cb0
devfs_ioctl_f() at devfs_ioctl_f+0x1e/frame 0xfffffe00aab18cd0
kern_ioctl() at kern_ioctl+0x255/frame 0xfffffe00aab18d40
sys_ioctl() at sys_ioctl+0xff/frame 0xfffffe00aab18e00
amd64_syscall() at amd64_syscall+0x100/frame 0xfffffe00aab18f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00aab18f30
--- syscall (54, FreeBSD ELF64, ioctl), rip = 0x3549e30c55fa, rsp = 0x3549e27017e8, rbp = 0x3549e27018a0 ---
KDB: enter: panic
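
For readers triaging the trace above: the fault code "supervisor read instruction" together with an instruction pointer of 0x20:0x0 is consistent with the kernel calling through a NULL function pointer. The frame directly beneath the "??() at 0" entry names the likely caller. The following is a minimal sketch, not an official tool, that extracts that frame from a KDB backtrace. The helper names parse_frames and null_call_site are hypothetical, and the sample is copied from the log above.

```python
import re

# Matches KDB frame lines such as:
#   dump_iface() at dump_iface+0x145/frame 0xfffffe00aab18780
FRAME_RE = re.compile(
    r"^(?P<func>\S+)\(\) at (?P<loc>\S+?)/frame (?P<frame>0x[0-9a-f]+)$"
)


def parse_frames(trace):
    """Return (function, location) pairs for every frame line in the trace."""
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((match.group("func"), match.group("loc")))
    return frames


def null_call_site(frames):
    """Return the frame just below a '??() at 0' entry, or None if absent."""
    for index, (func, loc) in enumerate(frames):
        if func == "??" and loc == "0" and index + 1 < len(frames):
            return frames[index + 1]
    return None


# Excerpt copied from the backtrace in this report.
SAMPLE = """\
calltrap() at calltrap+0x8/frame 0xfffffe00aab185d0
--- trap 0xc, rip = 0, rsp = 0xfffffe00aab186a8, rbp = 0xfffffe00aab186d0 ---
??() at 0/frame 0xfffffe00aab186d0
dump_iface() at dump_iface+0x145/frame 0xfffffe00aab18780
rtnl_handle_ifevent() at rtnl_handle_ifevent+0xa9/frame 0xfffffe00aab18800
"""

frames = parse_frames(SAMPLE)
print(null_call_site(frames))
```

On this excerpt the sketch reports dump_iface+0x145, which was reached from rtnl_handle_ifevent while the cxgb port was being attached. This only narrows down where the jump to address 0 happened; it does not identify which pointer was NULL.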

Additional context

Environment

Software version used and hardware type:

OPNsense 24.7 (amd64), Intel(R) Core(TM) i5-6500 CPU, Chelsio T320

fichtner commented 1 month ago

I would kindly ask you to avoid spamming. I already responded here: https://github.com/opnsense/src/issues/224

dttocs commented 1 month ago

Terribly sorry. I couldn't find the issue I opened yesterday, and thought I had failed to submit it.

fichtner commented 1 month ago

No worries. We can discuss how to debug this in the other thread. It's a bit challenging during boot but it can be done. We also have debug kernels.