opnsense / src

OPNsense operating system on top of FreeBSD
https://opnsense.org/

Netmap kernel crash 24.7_5 running on XCP 8.2.1 (Xen 4.13) #211

Closed: andrew64k closed this 2 months ago

andrew64k commented 2 months ago

Describe the bug
Kernel crash with OPNsense 24.7_5 running on XCP 8.2.1 (Xen). It seems to work until AFTER the generic_netmap_ messages appear.

To Reproduce
1. Boot OPNsense and let it start.
2. Wait until after the generic_netmap messages appear.
3. Access the management web page (and some subpages).
4. Crash, reboot, repeat.

Expected behavior
It does not crash.

Screenshots
(attached: opnsense_netmap, opnsense_crash)

Relevant log files

Dump header from device: /dev/ada1s1
  Architecture: amd64
  Architecture Version: 4
  Dump Length: 73216
  Blocksize: 512
  Compression: none
  Dumptime: 2024-07-28 20:58:52 -0400
  Hostname: OPNsense-Test1.localdomain
  Magic: FreeBSD Text Dump
  Version String: FreeBSD 14.1-RELEASE-p2 stable/24.7-n267758-4ad7ad40bc77 SMP
  Panic String: page fault
  Dump Parity: 1755970049
  Bounds: 0
  Dump Status: good

Crash dump file: textdump.tar.gz

Additional context
Thanks for the hard work! This looks like a FreeBSD problem.

Environment
Works normally with OPNsense 24.1.10_8; crashes with OPNsense 24.7_5.

Running as HVM on XCP 8.2.1 (Xen 4.13.5), Intel E5-2680 v2, Xen Virtual Net.

fichtner commented 2 months ago

Also reported here https://forum.opnsense.org/index.php?topic=41757.msg205214#msg205214

fichtner commented 2 months ago

The problem actually appears to affect xen(4) only.

fichtner commented 2 months ago

This could be related to commit 979bb7ac144.

fichtner commented 2 months ago

Kernel to try:

# opnsense-update -zkr 24.7_7

Don't forget to reboot.

Cheers, Franco

andrew64k commented 2 months ago

It still crashes after the 24.7_7 kernel update.

kotashiratsuka commented 2 months ago

Similarly, even if you update to _7, you will still get a kernel panic with HVM on XCP-ng.

fichtner commented 2 months ago

Please post a stack trace to confirm. I'm relatively certain this is a FreeBSD issue.

andrew64k commented 2 months ago

Dump header from device: /dev/ada1s1
  Architecture: amd64
  Architecture Version: 4
  Dump Length: 72704
  Blocksize: 512
  Compression: none
  Dumptime: 2024-07-29 09:27:55 -0400
  Hostname: OPNsense-Test1.localdomain
  Magic: FreeBSD Text Dump
  Version String: FreeBSD 14.1-RELEASE-p2 stable/24.7-n267765-b269437501d8 SMP
  Panic String: page fault
  Dump Parity: 4225216080
  Bounds: 3
  Dump Status: good

593.140339 [1167] generic_netmap_attach     Emulated adapter for xn0 created (prev was NULL)
593.149593 [1072] generic_netmap_dtor       Emulated netmap adapter for xn0 destroyed
593.158630 [1167] generic_netmap_attach     Emulated adapter for xn0 created (prev was NULL)
593.167763 [1072] generic_netmap_dtor       Emulated netmap adapter for xn0 destroyed
593.176779 [1167] generic_netmap_attach     Emulated adapter for xn0 created (prev was NULL)
593.195082 [1072] generic_netmap_dtor       Emulated netmap adapter for xn0 destroyed
593.206795 [1167] generic_netmap_attach     Emulated adapter for xn0 created (prev was NULL)
593.373322 [ 319] generic_netmap_register   Emulated adapter for xn0 activated

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address   = 0x30
fault code      = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80a0f08f
stack pointer           = 0x28:0xfffffe007a8db8e0
frame pointer           = 0x28:0xfffffe007a8db970
code segment        = base 0x0, limit 0xfffff, type 0x1b
            = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags    = interrupt enabled, resume, IOPL = 0
current process     = 34371 (W#01-xn0^)
rdi: fffff80001fce000 rsi: fffff80001f02e00 rdx: fffff80001f02e00
rcx: fffff8000154b000  r8: 00000000000000e6  r9: 0000000000000800
rax: 00000000000000ff rbx: fffffe0067fab000 rbp: fffffe007a8db970
r10: 0000000000000301 r11: fffff800687f9520 r12: 0000000000000000
r13: fffff800015c3000 r14: fffffe007a8db944 r15: fffff80001f02e00
trap number     = 12
panic: page fault
cpuid = 1
time = 1722259675
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe007a8db5d0
vpanic() at vpanic+0x131/frame 0xfffffe007a8db700
panic() at panic+0x43/frame 0xfffffe007a8db760
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe007a8db7c0
trap_pfault() at trap_pfault+0x46/frame 0xfffffe007a8db810
calltrap() at calltrap+0x8/frame 0xfffffe007a8db810
--- trap 0xc, rip = 0xffffffff80a0f08f, rsp = 0xfffffe007a8db8e0, rbp = 0xfffffe007a8db970 ---
xn_txq_mq_start_locked() at xn_txq_mq_start_locked+0xdf/frame 0xfffffe007a8db970
xn_txq_mq_start() at xn_txq_mq_start+0x76/frame 0xfffffe007a8db9a0
nm_os_generic_xmit_frame() at nm_os_generic_xmit_frame+0xa0/frame 0xfffffe007a8db9f0
generic_netmap_txsync() at generic_netmap_txsync+0x3a2/frame 0xfffffe007a8dbae0
netmap_ioctl() at netmap_ioctl+0x1a7/frame 0xfffffe007a8dbbb0
freebsd_netmap_ioctl() at freebsd_netmap_ioctl+0x79/frame 0xfffffe007a8dbbf0
devfs_ioctl() at devfs_ioctl+0xcb/frame 0xfffffe007a8dbc40
vn_ioctl() at vn_ioctl+0xce/frame 0xfffffe007a8dbcb0
devfs_ioctl_f() at devfs_ioctl_f+0x1e/frame 0xfffffe007a8dbcd0
kern_ioctl() at kern_ioctl+0x255/frame 0xfffffe007a8dbd40
sys_ioctl() at sys_ioctl+0xff/frame 0xfffffe007a8dbe00
amd64_syscall() at amd64_syscall+0x100/frame 0xfffffe007a8dbf30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe007a8dbf30
--- syscall (54, FreeBSD ELF64, ioctl), rip = 0x82e0505fa, rsp = 0x8320d2df8, rbp = 0x8320d2e20 ---
KDB: enter: panic

textdump.tar.gz

fichtner commented 2 months ago

Test patch 3f1850fd8 via @markjdb

Test kernel installs as follows:

REDACTED

Feedback is highly appreciated :)

Cheers, Franco

andrew64k commented 2 months ago

Update installed and rebooted. Way worse: as soon as it finishes booting, it crashes. I can't even log in, and it does not even have time to write a crash dump. (Screenshot: opnsense_crash2)

fichtner commented 2 months ago

Thanks. I revoked the kernel for now and will pass this along.

fichtner commented 2 months ago

Commit 617c782a35a was missing from the previous kernel; new kernel here:

REDACTED

Cheers, Franco

andrew64k commented 2 months ago

Installed 24.7-xen2 and rebooted. It's working for me, with no crashes during an hour of testing.

Thanks!

fichtner commented 2 months ago

One more kernel, now based on the simpler final commit that has already landed in FreeBSD: https://cgit.freebsd.org/src/commit/?id=2e4781cb12a

# opnsense-update -zkr 24.7-xen3

Unless there is bad feedback about this third iteration, I'll ship this particular fix in 24.7.1 and close the issue.

Cheers, Franco

andrew64k commented 2 months ago

Updated and rebooted. Ran traffic for an hour and it is still working.

A2sti commented 2 months ago

I ran opnsense-update -zkr 24.7-xen3. It has now been up for 3 hours and is working with Zenarmor and Suricata without restarting.

fichtner commented 2 months ago

@A2sti awesome, thanks!

kotashiratsuka commented 2 months ago

No kernel panic occurs with 24.7-xen3. Thank you!

fichtner commented 2 months ago

@kotashiratsuka also in 24.7.1 now :)