opnsense / src

OPNsense operating system on top of FreeBSD
https://opnsense.org/

Kernel panic when SCTP goes via an interface and IPsec #227

Open snorlaxrino opened 3 weeks ago

snorlaxrino commented 3 weeks ago


Describe the bug

OPNsense crashes, and it seems to be related to SCTP and a VPN. After some investigation we suspect that an object might be NULL or incorrectly populated -> https://github.com/opnsense/src/blob/stable/24.7/sys/netpfil/pf/pf.c#L7944 The crash seems to occur only when SCTP is combined with a VPN. I ran two tests, IPsec site-to-site and OpenVPN TAP, and with both VPNs the problem occurred at the same place. The error only goes away when the VPN is deactivated. I didn't use this before 24.7.

To Reproduce

The SCTP packets go through an IPsec tunnel. As soon as I activate the tunnel, OPNsense crashes. After a restart, OPNsense runs for about 15 minutes until it crashes again. The VPN is site-to-site.

Expected behavior

No kernel panic in this case.

Relevant log files

--- trap 0xc, rip = 0xffffffff821ab744, rsp = 0xfffffe00625cef50, rbp = 0xfffffe00625cef50 ---
pfi_kkif_match() at pfi_kkif_match+0x24/frame 0xfffffe00625cef50
pf_test_rule() at pf_test_rule+0xe6b/frame 0xfffffe00625cf3a0
pf_sctp_multihome_delayed() at pf_sctp_multihome_delayed+0x30e/frame 0xfffffe00625cf4d0
pf_test() at pf_test+0xd1a/frame 0xfffffe00625cf680
pf_check_in() at pf_check_in+0x27/frame 0xfffffe00625cf6a0
pfil_mbuf_in() at pfil_mbuf_in+0x38/frame 0xfffffe00625cf6d0
enc_hhook() at enc_hhook+0x28a/frame 0xfffffe00625cf710
hhook_run_hooks() at hhook_run_hooks+0x61/frame 0xfffffe00625cf780
ipsec_run_hhooks() at ipsec_run_hhooks+0x6d/frame 0xfffffe00625cf7a0
ipsec4_common_input_cb() at ipsec4_common_input_cb+0x32a/frame 0xfffffe00625cf830
esp_input_cb() at esp_input_cb+0x430/frame 0xfffffe00625cf8e0
swcr_process() at swcr_process+0x25/frame 0xfffffe00625cf900
crypto_dispatch() at crypto_dispatch+0x60/frame 0xfffffe00625cf920
esp_input() at esp_input+0x4d8/frame 0xfffffe00625cf9f0
udp_ipsec_input() at udp_ipsec_input+0x17b/frame 0xfffffe00625cfa50
ipsec_kmod_udp_input() at ipsec_kmod_udp_input+0x2d/frame 0xfffffe00625cfa70
udp_append() at udp_append+0xe4/frame 0xfffffe00625cfae0
udp_input() at udp_input+0x803/frame 0xfffffe00625cfbc0
ip_input() at ip_input+0x268/frame 0xfffffe00625cfc20
netisr_dispatch_src() at netisr_dispatch_src+0x9e/frame 0xfffffe00625cfc70
ether_demux() at ether_demux+0x149/frame 0xfffffe00625cfca0
ether_nh_input() at ether_nh_input+0x36a/frame 0xfffffe00625cfd00
netisr_dispatch_src() at netisr_dispatch_src+0x9e/frame 0xfffffe00625cfd50
ether_input() at ether_input+0x56/frame 0xfffffe00625cfda0
re_rxeof() at re_rxeof+0x547/frame 0xfffffe00625cfe20
re_intr_msi() at re_intr_msi+0xf3/frame 0xfffffe00625cfe60
ithread_loop() at ithread_loop+0x257/frame 0xfffffe00625cfef0
fork_exit() at fork_exit+0x7f/frame 0xfffffe00625cff30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00625cff30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic

panic.txt: page fault
version.txt: FreeBSD 14.1-RELEASE-p6 stable/24.7-n267939-fd5bc7f34e1 SMP

Additional context

Uploaded through crash reporter

Environment

OPNsense 24.7.8-amd64
FreeBSD 14.1-RELEASE-p6
OpenSSL 3.0.15
AMD G-T40E Processor (2 cores, 2 threads)

fichtner commented 3 weeks ago

Feel free to send me a vmcore file from a debug kernel crash:

# opnsense-update -zkr dbg-24.7.8 && opnsense-shell reboot

That being said, SCTP being unreliable is clearly FreeBSD territory. There are no relevant commits on stable/14 to my knowledge.

Cheers, Franco

snorlaxrino commented 3 weeks ago

Hey Franco, the vmcore0 is too big to upload here. I can upload it somewhere if you have something available, otherwise I can share it via OneDrive. Here is an extract that may help. You can also tell me exactly what information you need.

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address = 0x18
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff82388a0d
stack pointer = 0x28:0xfffffe006259ae80
frame pointer = 0x28:0xfffffe006259ae90
code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 12 (irq28: re1)
rdi: ffffffff81ddc480 rsi: 0000000000000000 rdx: fffff80034308078
rcx: 0000000000000000 r8: 00000000ffffffdb r9: 0000000000000010
rax: 0000000000000001 rbx: fffff80005fe2600 rbp: fffffe006259ae90
r10: 0000000000000000 r11: 0000000000000000 r12: fffff80005b1ee00
r13: fffff8009eb4eb10 r14: fffff80005b1ee00 r15: fffff800035b7740
trap number = 12
panic: page fault
cpuid = 1
time = 1731418088
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe006259ab70
vpanic() at vpanic+0x131/frame 0xfffffe006259aca0
panic() at panic+0x43/frame 0xfffffe006259ad00
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe006259ad60
trap_pfault() at trap_pfault+0x57/frame 0xfffffe006259adb0
calltrap() at calltrap+0x8/frame 0xfffffe006259adb0
--- trap 0xc, rip = 0xffffffff82388a0d, rsp = 0xfffffe006259ae80, rbp = 0xfffffe006259ae90 ---
pfi_kkif_match() at pfi_kkif_match+0x3d/frame 0xfffffe006259ae90
pf_test_rule() at pf_test_rule+0xe43/frame 0xfffffe006259b2d0
pf_sctp_multihome_delayed() at pf_sctp_multihome_delayed+0x314/frame 0xfffffe006259b400
pf_test() at pf_test+0x10f9/frame 0xfffffe006259b5b0
pf_check_in() at pf_check_in+0x27/frame 0xfffffe006259b5d0
pfil_mbuf_in() at pfil_mbuf_in+0x58/frame 0xfffffe006259b610
enc_hhook() at enc_hhook+0x28a/frame 0xfffffe006259b650
hhook_run_hooks() at hhook_run_hooks+0x6f/frame 0xfffffe006259b6c0
ipsec_run_hhooks() at ipsec_run_hhooks+0x6d/frame 0xfffffe006259b6e0
ipsec4_common_input_cb() at ipsec4_common_input_cb+0x3e4/frame 0xfffffe006259b770
esp_input_cb() at esp_input_cb+0x5bd/frame 0xfffffe006259b830
swcr_process() at swcr_process+0x25/frame 0xfffffe006259b850
crypto_invoke() at crypto_invoke+0x7c/frame 0xfffffe006259b8c0
crypto_dispatch_one() at crypto_dispatch_one+0xf4/frame 0xfffffe006259b8f0
esp_input() at esp_input+0x57e/frame 0xfffffe006259b9c0
udp_ipsec_input() at udp_ipsec_input+0x197/frame 0xfffffe006259ba20
ipsec_kmod_udp_input() at ipsec_kmod_udp_input+0x2d/frame 0xfffffe006259ba40
udp_append() at udp_append+0x112/frame 0xfffffe006259bab0
udp_input() at udp_input+0x823/frame 0xfffffe006259bba0
ip_input() at ip_input+0x2e0/frame 0xfffffe006259bc00
netisr_dispatch_src() at netisr_dispatch_src+0xae/frame 0xfffffe006259bc60
ether_demux() at ether_demux+0x179/frame 0xfffffe006259bc90
ether_nh_input() at ether_nh_input+0x3e9/frame 0xfffffe006259bce0
netisr_dispatch_src() at netisr_dispatch_src+0xae/frame 0xfffffe006259bd40
ether_input() at ether_input+0x155/frame 0xfffffe006259bda0
re_rxeof() at re_rxeof+0x575/frame 0xfffffe006259be20
re_intr_msi() at re_intr_msi+0xc3/frame 0xfffffe006259be60
ithread_loop() at ithread_loop+0x256/frame 0xfffffe006259bef0
fork_exit() at fork_exit+0x82/frame 0xfffffe006259bf30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe006259bf30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
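
A note on reading the panic output above: a fault virtual address as small as 0x18 is what one would typically see when a struct member is read through a NULL pointer, because the faulting address is then simply the member's offset. The sketch below only illustrates that arithmetic. `struct example_kif`, its fields, and `member_fault_address()` are hypothetical and chosen so that one member lands at offset 0x18; they are not the real pf structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a kernel interface object. The field names and
 * layout are made up so that one member sits at offset 0x18; this is NOT
 * the real pf structure.
 */
struct example_kif {
	char		name[16];	/* offsets 0x00-0x0f */
	uint64_t	flags;		/* offset 0x10 */
	void		*group;		/* offset 0x18 */
};

static_assert(offsetof(struct example_kif, group) == 0x18,
    "illustrative layout: group is expected at offset 0x18");

/*
 * Address the CPU would try to read for base->group, computed without
 * actually dereferencing anything.
 */
static uintptr_t
member_fault_address(uintptr_t base)
{
	return (base + offsetof(struct example_kif, group));
}
```

With a NULL base, `member_fault_address(0)` evaluates to 0x18. The zero value in `rsi` in the register dump would be consistent with a NULL pointer argument, though the register contents at trap time do not prove which argument was NULL.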

Best regards, Richi

fichtner commented 3 weeks ago

Hey Richi,

Yeah, the vmcore is the last resort. Can you share it via OneDrive? Just drop me a line at franco@opnsense.org -- highly appreciated!

Cheers, Franco

snorlaxrino commented 2 weeks ago

Hello Franco,

just to make sure, did you receive my link by email?

Best regards, Richi

fichtner commented 2 weeks ago

Hi Richi,

Thanks for following up. I did not receive an email, actually. Can you try resending it?

Thanks, Franco

fichtner commented 2 weeks ago

Got it now, thanks!

fichtner commented 2 weeks ago

OK, I think this commit is involved in the NULL dereference happening here:

https://github.com/opnsense/src/commit/38663ae5ccc2b83

If you set Firewall: Settings: Advanced: Bind states to interface -- do the crashes still occur?

Cheers, Franco

snorlaxrino commented 2 weeks ago

Hi Franco, it still crashes. I sent you a new dump by email. Best regards, Richi

fichtner commented 2 weeks ago

Hi Richi,

Can you try this kernel? It is an immediate fix to the crash location but I'm not sure if the larger issue appears somewhere else afterwards:

# opnsense-update -zkr 24.7.8-sctp
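
The patch in that test kernel is not shown in this thread. As a rough illustration only, a guard at a crash location like this one usually means checking a pointer before it is dereferenced. The sketch below assumes hypothetical names (`struct example_iface`, `example_iface_match()`) and a made-up convention that a NULL rule interface means "any interface"; it is not the actual pf code or the actual fix.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified interface object; not the real pf type. */
struct example_iface {
	char name[16];
};

/*
 * Return true when the rule's interface matches the packet's interface.
 * Assumption for this sketch: a NULL rule interface means "any interface".
 * A NULL packet interface can then only match such a wildcard rule and is
 * never dereferenced.
 */
static bool
example_iface_match(const struct example_iface *rule_if,
    const struct example_iface *packet_if)
{
	if (rule_if == NULL)
		return (true);
	if (packet_if == NULL)
		return (false);
	return (strncmp(rule_if->name, packet_if->name,
	    sizeof(rule_if->name)) == 0);
}
```

A guard like this stops the immediate page fault but does not explain why the pointer was NULL in the first place, which matches the caveat that a larger issue might still show up elsewhere.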

Cheers, Franco

snorlaxrino commented 2 weeks ago

Hi Franco,

looks like you found a solution. At least it hasn't crashed so far. I will let it run and give an update in a couple of hours.

Best Regards Richi

snorlaxrino commented 2 weeks ago

Hi Franco, no crashes so far. Looks good! Thanks!

Best Regards, Richi

fichtner commented 2 weeks ago

Thanks for reporting back -- we will include the operational fix and discuss options for FreeBSD here https://reviews.freebsd.org/D47658

I'll close the ticket. I don't think 24.7.9 will have a new kernel so the likely release timeframe for this one is 24.7.10.

Cheers, Franco

fichtner commented 7 hours ago

upstream commit https://github.com/freebsd/freebsd-src/commit/c22c987984 needs backport at some point