opnsense / src

OPNsense operating system on top of FreeBSD
https://opnsense.org/

Kernel panic on Protectli FW4C #197

Closed ronin3510 closed 3 months ago

ronin3510 commented 7 months ago


Describe the bug

Just found out this FW started rebooting "5-6 days ago".

A quick look at the logs shows the kernel panics started early in the New Year, when it was running on 23.7.10 K&B.

Unaware of this issue I moved it to 24.1.r_70 on Friday - where it's currently at. Kernel panics continue nevertheless.


It has been running fine on 23.7.10 until January. I'm unsure what suddenly triggers the panics; it appears to be a fragmentation issue on IPv6?

2024-01-09T09:02:31 Notice kernel KDB: enter: panic
2024-01-09T09:02:31 Notice kernel panic() at panic+0x43/frame 0xfffffe00107db320
2024-01-09T09:02:31 Notice kernel vpanic() at vpanic+0x151/frame 0xfffffe00107db2c0
2024-01-09T09:02:31 Notice kernel panic: page fault
2024-01-09T08:44:06 Notice kernel KDB: enter: panic
2024-01-09T08:44:06 Notice kernel panic() at panic+0x43/frame 0xfffffe0010795510
2024-01-09T08:44:06 Notice kernel vpanic() at vpanic+0x151/frame 0xfffffe00107954b0
2024-01-09T08:44:06 Notice kernel panic: page fault
2024-01-09T08:42:28 Notice kernel KDB: enter: panic
2024-01-09T08:42:28 Notice kernel panic() at panic+0x43/frame 0xfffffe001079a320
2024-01-09T08:42:28 Notice kernel vpanic() at vpanic+0x151/frame 0xfffffe001079a2c0
2024-01-09T08:42:28 Notice kernel panic: page fault
2024-01-09T08:37:01 Notice kernel KDB: enter: panic
2024-01-09T08:37:01 Notice kernel panic() at panic+0x43/frame 0xfffffe0010795510
2024-01-09T08:37:01 Notice kernel vpanic() at vpanic+0x151/frame 0xfffffe00107954b0
2024-01-09T08:37:01 Notice kernel panic: page fault
2024-01-09T08:35:17 Notice kernel KDB: enter: panic
2024-01-09T08:35:17 Notice kernel panic() at panic+0x43/frame 0xfffffe0010795510
2024-01-09T08:35:17 Notice kernel vpanic() at vpanic+0x151/frame 0xfffffe00107954b0
2024-01-09T08:35:17 Notice kernel panic: page fault
2024-01-09T08:33:36 Notice kernel KDB: enter: panic
2024-01-09T08:33:36 Notice kernel panic() at panic+0x43/frame 0xfffffe00107db510
2024-01-09T08:33:36 Notice kernel vpanic() at vpanic+0x151/frame 0xfffffe00107db4b0
2024-01-09T08:33:36 Notice kernel panic: page fault
2024-01-09T08:31:56 Notice kernel KDB: enter: panic
2024-01-09T08:31:56 Notice kernel panic() at panic+0x43/frame 0xfffffe00107e0320
2024-01-09T08:31:56 Notice kernel vpanic() at vpanic+0x151/frame 0xfffffe00107e02c0
2024-01-09T08:31:56 Notice kernel panic: page fault
2024-01-09T08:29:17 Notice kernel KDB: enter: panic
2024-01-09T08:29:17 Notice kernel panic() at panic+0x43/frame 0xfffffe001079a320
2024-01-09T08:29:17 Notice kernel vpanic() at vpanic+0x151/frame 0xfffffe001079a2c0
2024-01-09T08:29:17 Notice kernel panic: page fault
2024-01-09T08:23:02 Notice kernel KDB: enter: panic
2024-01-09T08:23:02 Notice kernel panic() at panic+0x43/frame 0xfffffe00107db320
2024-01-09T08:23:02 Notice kernel vpanic() at vpanic+0x151/frame 0xfffffe00107db2c0
2024-01-09T08:23:02 Notice kernel panic: page fault
2024-01-09T08:16:01 Notice kernel KDB: enter: panic
2024-01-09T08:16:01 Notice kernel panic() at panic+0x43/frame 0xfffffe00107db320
2024-01-09T08:16:01 Notice kernel vpanic() at vpanic+0x151/frame 0xfffffe00107db2c0
2024-01-09T08:16:01 Notice kernel panic: page fault
2024-01-09T08:07:49 Notice kernel KDB: enter: panic
2024-01-09T08:07:49 Notice kernel panic() at panic+0x43/frame 0xfffffe001079a510
2024-01-09T08:07:49 Notice kernel vpanic() at vpanic+0x151/frame 0xfffffe001079a4b0
2024-01-09T08:07:49 Notice kernel panic: page fault
2024-01-09T08:06:07 Notice kernel KDB: enter: panic
2024-01-09T08:06:07 Notice kernel panic() at panic+0x43/frame 0xfffffe001079a320
2024-01-09T08:06:07 Notice kernel vpanic() at vpanic+0x151/frame 0xfffffe001079a2c0
2024-01-09T08:06:07 Notice kernel panic: page fault
2024-01-09T08:04:03 Notice kernel KDB: enter: panic
2024-01-09T08:04:03 Notice kernel panic() at panic+0x43/frame 0xfffffe0010795320
2024-01-09T08:04:03 Notice kernel vpanic() at vpanic+0x151/frame 0xfffffe00107952c0
2024-01-09T08:04:03 Notice kernel panic: page fault
2024-01-09T08:01:50 Notice kernel KDB: enter: panic
2024-01-09T08:01:50 Notice kernel panic() at panic+0x43/frame 0xfffffe00107e0320
2024-01-09T08:01:50 Notice kernel vpanic() at vpanic+0x151/frame 0xfffffe00107e02c0
2024-01-09T08:01:50 Notice kernel panic: page fault
2024-01-09T07:34:17 Notice kernel KDB: enter: panic
2024-01-09T07:34:17 Notice kernel panic() at panic+0x43/frame 0xfffffe0010795320
2024-01-09T07:34:17 Notice kernel vpanic() at vpanic+0x151/frame 0xfffffe00107952c0
2024-01-09T07:34:17 Notice kernel panic: page fault
2024-01-09T07:28:48 Notice kernel KDB: enter: panic
2024-01-09T07:28:48 Notice kernel panic() at panic+0x43/frame 0xfffffe00107db320
2024-01-09T07:28:48 Notice kernel vpanic() at vpanic+0x151/frame 0xfffffe00107db2c0
2024-01-09T07:28:48 Notice kernel panic: page fault
2024-01-06T10:45:22 Notice kernel KDB: enter: panic
2024-01-06T10:45:22 Notice kernel panic() at panic+0x43/frame 0xfffffe0010795510
2024-01-06T10:45:22 Notice kernel vpanic() at vpanic+0x151/frame 0xfffffe00107954b0
2024-01-06T10:45:22 Notice kernel panic: page fault
2024-01-06T10:43:43 Notice kernel KDB: enter: panic
2024-01-06T10:43:42 Notice kernel panic() at panic+0x43/frame 0xfffffe00107e0320
2024-01-06T10:43:42 Notice kernel vpanic() at vpanic+0x151/frame 0xfffffe00107e02c0
2024-01-06T10:43:42 Notice kernel panic: page fault
2024-01-05T10:03:56 Notice kernel KDB: enter: panic
2024-01-05T10:03:56 Notice kernel panic() at panic+0x43/frame 0xfffffe00107d6530
2024-01-05T10:03:56 Notice kernel vpanic() at vpanic+0x151/frame 0xfffffe00107d64d0
2024-01-05T10:03:56 Notice kernel panic: page fault
2024-01-05T10:02:20 Notice kernel KDB: enter: panic
2024-01-05T10:02:20 Notice kernel panic() at panic+0x43/frame 0xfffffe00107cc340
2024-01-05T10:02:20 Notice kernel vpanic() at vpanic+0x151/frame 0xfffffe00107cc2e0
2024-01-05T10:02:20 Notice kernel panic: page fault
2024-01-05T09:07:05 Notice kernel KDB: enter: panic
2024-01-05T09:07:05 Notice kernel panic() at panic+0x43/frame 0xfffffe00107cc340
2024-01-05T09:07:05 Notice kernel vpanic() at vpanic+0x151/frame 0xfffffe00107cc2e0
2024-01-05T09:07:05 Notice kernel panic: page fault
2024-01-04T21:52:37 Notice kernel KDB: enter: panic
2024-01-04T21:52:37 Notice kernel panic() at panic+0x43/frame 0xfffffe00107d6340
2024-01-04T21:52:37 Notice kernel vpanic() at vpanic+0x151/frame 0xfffffe00107d62e0
2024-01-04T21:52:37 Notice kernel panic: page fault
2024-01-04T12:56:30 Notice kernel KDB: enter: panic
2024-01-04T12:56:30 Notice kernel panic() at panic+0x43/frame 0xfffffe00107d1340
2024-01-04T12:56:30 Notice kernel vpanic() at vpanic+0x151/frame 0xfffffe00107d12e0
2024-01-04T12:56:30 Notice kernel panic: page fault
2024-01-04T12:20:19 Notice kernel KDB: enter: panic
2024-01-04T12:20:19 Notice kernel panic() at panic+0x43/frame 0xfffffe00107cc340
2024-01-04T12:20:19 Notice kernel vpanic() at vpanic+0x151/frame 0xfffffe00107cc2e0
2024-01-04T12:20:19 Notice kernel panic: page fault
2024-01-04T12:15:26 Notice kernel KDB: enter: panic
2024-01-04T12:15:26 Notice kernel panic() at panic+0x43/frame 0xfffffe00107c7340
2024-01-04T12:15:26 Notice kernel vpanic() at vpanic+0x151/frame 0xfffffe00107c72e0
2024-01-04T12:15:26 Notice kernel panic: page fault
2024-01-04T09:21:39 Notice kernel KDB: enter: panic
2024-01-04T09:21:39 Notice kernel panic() at panic+0x43/frame 0xfffffe00107d1530
2024-01-04T09:21:39 Notice kernel vpanic() at vpanic+0x151/frame 0xfffffe00107d14d0
2024-01-04T09:21:39 Notice kernel panic: page fault
2024-01-04T09:19:59 Notice kernel KDB: enter: panic
2024-01-04T09:19:59 Notice kernel panic() at panic+0x43/frame 0xfffffe00107cc340
2024-01-04T09:19:59 Notice kernel vpanic() at vpanic+0x151/frame 0xfffffe00107cc2e0
2024-01-04T09:19:59 Notice kernel panic: page fault
2024-01-02T08:12:11 Notice kernel KDB: enter: panic
2024-01-02T08:12:11 Notice kernel panic() at panic+0x43/frame 0xfffffe00107d6340
2024-01-02T08:12:11 Notice kernel vpanic() at vpanic+0x151/frame 0xfffffe00107d62e0
2024-01-02T08:12:11 Notice kernel panic: page fault
2024-01-02T08:09:34 Notice kernel KDB: enter: panic
2024-01-02T08:09:34 Notice kernel panic() at panic+0x43/frame 0xfffffe00107c7340
2024-01-02T08:09:34 Notice kernel vpanic() at vpanic+0x151/frame 0xfffffe00107c72e0
2024-01-02T08:09:34 Notice kernel panic: page fault
2024-01-02T07:23:50 Notice kernel KDB: enter: panic
2024-01-02T07:23:50 Notice kernel panic() at panic+0x43/frame 0xfffffe00107d1340
2024-01-02T07:23:50 Notice kernel vpanic() at vpanic+0x151/frame 0xfffffe00107d12e0
2024-01-02T07:23:50 Notice kernel panic: page fault
2024-01-02T07:18:17 Notice kernel KDB: enter: panic
2024-01-02T07:18:17 Notice kernel panic() at panic+0x43/frame 0xfffffe00107d1530
2024-01-02T07:18:17 Notice kernel vpanic() at vpanic+0x151/frame 0xfffffe00107d14d0
2024-01-02T07:18:17 Notice kernel panic: page fault
2024-01-02T07:16:34 Notice kernel KDB: enter: panic
2024-01-02T07:16:34 Notice kernel panic() at panic+0x43/frame 0xfffffe00107cc530
2024-01-02T07:16:34 Notice kernel vpanic() at vpanic+0x151/frame 0xfffffe00107cc4d0
2024-01-02T07:16:34 Notice kernel panic: page fault
2024-01-02T07:14:55 Notice kernel KDB: enter: panic
2024-01-02T07:14:55 Notice kernel panic() at panic+0x43/frame 0xfffffe00107d1340
2024-01-02T07:14:55 Notice kernel vpanic() at vpanic+0x151/frame 0xfffffe00107d12e0
2024-01-02T07:14:55 Notice kernel panic: page fault
2024-01-02T01:46:33 Notice kernel KDB: enter: panic
2024-01-02T01:46:33 Notice kernel panic() at panic+0x43/frame 0xfffffe00107d6340
2024-01-02T01:46:33 Notice kernel vpanic() at vpanic+0x151/frame 0xfffffe00107d62e0
2024-01-02T01:46:33 Notice kernel panic: page fault
2024-01-01T10:44:41 Notice kernel KDB: enter: panic
2024-01-01T10:44:41 Notice kernel panic() at panic+0x43/frame 0xfffffe00107cc340
2024-01-01T10:44:41 Notice kernel vpanic() at vpanic+0x151/frame 0xfffffe00107cc2e0
2024-01-01T10:44:41 Notice kernel panic: page fault
2024-01-01T04:44:07 Notice kernel KDB: enter: panic
2024-01-01T04:44:07 Notice kernel panic() at panic+0x43/frame 0xfffffe00107d1340
2024-01-01T04:44:07 Notice kernel vpanic() at vpanic+0x151/frame 0xfffffe00107d12e0
2024-01-01T04:44:07 Notice kernel panic: page fault
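To see how the panics cluster over time, the log above can be tallied per day. The following is a minimal Python sketch, not an OPNsense tool: the regular expression is an assumption derived only from the line format shown in this report, and `panics_per_day` is a hypothetical helper name.

```python
import re
from collections import Counter

# Pattern assumed from the excerpt in this report: ISO timestamp, severity
# "Notice", process "kernel", then the final "panic: page fault" message.
PANIC_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2})T\d{2}:\d{2}:\d{2} Notice kernel panic: page fault"
)

def panics_per_day(log_text: str) -> Counter:
    """Count 'panic: page fault' entries per calendar day.

    Works on flattened text as well as one-entry-per-line logs, because the
    pattern does not depend on line breaks.
    """
    return Counter(PANIC_RE.findall(log_text))

# A few entries copied from the log above, used as sample input.
sample = (
    "2024-01-09T09:02:31 Notice kernel KDB: enter: panic "
    "2024-01-09T09:02:31 Notice kernel panic: page fault "
    "2024-01-06T10:45:22 Notice kernel panic: page fault "
    "2024-01-01T04:44:07 Notice kernel panic: page fault"
)

for day, count in sorted(panics_per_day(sample).items()):
    print(day, count)
```

Only the `panic: page fault` line is matched, so each panic is counted once even though it produces four log entries.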

To Reproduce

Found this in dmesg

smbus0: on ichsmb0
lo0: link state changed to UP
coretemp0: on cpu0
pflog0: permanently promiscuous mode enabled
igc1: link state changed to DOWN
igc0: link state changed to DOWN
igc0: link state changed to UP
igc1: link state changed to UP
gif0: link state changed to UP
tun1: changing name to 'ovpnc1'
wg0: link state changed to UP
tun2: changing name to 'ovpns2'
ovpns2: link state changed to UP
[fib_algo] inet.0 (bsearch4#27) rebuild_fd_flm: switching algo to radix4_lockless
wg0: link state changed to DOWN
wg0: link state changed to UP

Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 06
fault virtual address = 0x10
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80ea41bc
stack pointer = 0x0:0xfffffe00107db6a0
frame pointer = 0x0:0xfffffe00107db7c0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 (if_io_tqg_3)
trap number = 12
panic: page fault
cpuid = 3
time = 1704781932
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00107db460
vpanic() at vpanic+0x151/frame 0xfffffe00107db4b0
panic() at panic+0x43/frame 0xfffffe00107db510
trap_fatal() at trap_fatal+0x387/frame 0xfffffe00107db570
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00107db5d0
calltrap() at calltrap+0x8/frame 0xfffffe00107db5d0
--- trap 0xc, rip = 0xffffffff80ea41bc, rsp = 0xfffffe00107db6a0, rbp = 0xfffffe00107db7c0 ---
ip6_forward() at ip6_forward+0x60c/frame 0xfffffe00107db7c0
pf_refragment6() at pf_refragment6+0x14f/frame 0xfffffe00107db810
pf_test6() at pf_test6+0x12c0/frame 0xfffffe00107db9a0
pf_check6_out() at pf_check6_out+0x40/frame 0xfffffe00107db9d0
pfil_run_hooks() at pfil_run_hooks+0x97/frame 0xfffffe00107dba10
ip6_tryforward() at ip6_tryforward+0x2ce/frame 0xfffffe00107dba90
ip6_input() at ip6_input+0x5e4/frame 0xfffffe00107dbb70
netisr_dispatch_src() at netisr_dispatch_src+0x295/frame 0xfffffe00107dbbc0
ether_demux() at ether_demux+0x159/frame 0xfffffe00107dbbf0
ether_nh_input() at ether_nh_input+0x36b/frame 0xfffffe00107dbc50
netisr_dispatch_src() at netisr_dispatch_src+0xb9/frame 0xfffffe00107dbca0
ether_input() at ether_input+0x69/frame 0xfffffe00107dbd00
iflib_rxeof() at iflib_rxeof+0xbcb/frame 0xfffffe00107dbe00
_task_fn_rx() at _task_fn_rx+0x72/frame 0xfffffe00107dbe40
gtaskqueue_run_locked() at gtaskqueue_run_locked+0x15d/frame 0xfffffe00107dbec0
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xc3/frame 0xfffffe00107dbef0
fork_exit() at fork_exit+0x7e/frame 0xfffffe00107dbf30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00107dbf30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
---<>---
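The IPv6 fragmentation suspicion comes from the frames right after the trap marker: the fault is in `ip6_forward()`, which was called from `pf_refragment6()`. The Python sketch below shows one way to pull the call chain out of a flattened backtrace like this. The frame pattern is an assumption based on the `func() at func+0xoff/frame 0xaddr` shape seen in this dump, and `parse_frames` is a hypothetical helper.

```python
import re

# One KDB frame as printed in this report: name() at name+0xOFFSET/frame 0xADDR.
# The backreference \1 requires the symbol name to repeat, as it does in the dump.
FRAME_RE = re.compile(r"(\w+)\(\) at \1\+(0x[0-9a-f]+)/frame (0x[0-9a-f]+)")

def parse_frames(text: str) -> list[tuple[str, int]]:
    """Return (function, offset) pairs in the order they appear (innermost first)."""
    return [(m.group(1), int(m.group(2), 16)) for m in FRAME_RE.finditer(text)]

# Three frames copied from the trace above, flattened onto one line.
sample = (
    "ip6_forward() at ip6_forward+0x60c/frame 0xfffffe00107db7c0 "
    "pf_refragment6() at pf_refragment6+0x14f/frame 0xfffffe00107db810 "
    "pf_test6() at pf_test6+0x12c0/frame 0xfffffe00107db9a0"
)

for func, offset in parse_frames(sample):
    print(f"{func} +{offset:#x}")
```

Because the pattern ignores whitespace between frames, it works on the single-line form that syslog exports tend to produce as well as on one-frame-per-line output.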

Expected behavior

No kernel panic

Describe alternatives you considered

I'll have the FW powered off later today to exclude any potential issues caused by an unforeseen power surge, and I'll update the thread if the panics stop.

Checked temps: the average is around 50C, and SMART data passes.

Screenshots

n/a

Additional context

This FW uses HE for IPv6 because the ISP does not offer any type of IPv6. This configuration has been in place for years and was migrated last September from an APU4 to the Protectli.

Environment

Software version used and hardware type if relevant, e.g.:

OPNsense 23.7.10_22 (amd64)
FreeBSD 13.2-RELEASE-p9 stable/24.1-n254949-f4c55a65b83 SMP amd64
Intel® J3710 Quad Core
Network Intel® Ethernet Controller I225-V igc0-3

ronin3510 commented 7 months ago

The FW has been stable since I disabled the IPv6 interface.

I noticed the new R1 kernel and have re-enabled IPv6; keeping an eye on it for now.

FreeBSD OPNsense.localdomain 13.2-RELEASE-p9 FreeBSD 13.2-RELEASE-p9 stable/24.1-n254953-a50a83acb64 SMP amd64

ronin3510 commented 7 months ago

IPv6-Kernel_Panic.txt

OK that didn't take long, the machine is crashing again after reactivating IPv6.

fichtner commented 7 months ago

Let me build a debug kernel for you :)

fichtner commented 7 months ago

Can you try this kernel and let me have the dump file?

# opnsense-update -zkr dbg-24.1.r1

Cheers, Franco

ronin3510 commented 7 months ago

Just noticed the kernel and was wondering who's it for :)

Installed now and IPv6 re-enabled, waiting for the crash

ronin3510 commented 7 months ago

Got these files yesterday but no forced reboot yet.

I had to split the archive, so I created four 7zip files and appended the .dmp suffix, since that extension is allowed for uploads.

-rw-r--r-- 1 root wheel 3B Jan 15 14:03 bounds
-rw------- 1 root wheel 424B Jan 15 14:03 info.72
-rw------- 1 root wheel 845M Jan 15 14:03 vmcore.72
lrwxr-xr-x 1 root wheel 7B Jan 15 14:03 info.last -> info.72
lrwxr-xr-x 1 root wheel 9B Jan 15 14:03 vmcore.last -> vmcore.72
-r-xr-xr-x 1 root wheel 100M Jan 15 14:03 kernel.72
lrwxr-xr-x 1 root wheel 9B Jan 15 14:03 kernel.last -> kernel.72

info.72.dmp

bounds.dmp

Jan15.7z.004.dmp

Jan15.7z.003.dmp

Jan15.7z.002.dmp

Jan15.7z.001.dmp

fichtner commented 7 months ago

This is pretty much related, but I'm trying to find the fix oO

https://github.com/opnsense/src/commit/87b8226c7bb987

fichtner commented 7 months ago

I see this was supposed to be fixed in 05331e07 which was never fixed by upstream in stable/13, but now something changed due to backporting for 24.1 ?!

fichtner commented 7 months ago

This is a bit annoying... another unrelated issue due to invariants being set... gif_transmit() moves a packet from IPv6 to IPv4... is this an IPv4 in IPv6 tunnel?

Unread portion of the kernel message buffer:
panic: ip_set_fwdtag: !AF_INET
cpuid = 2
time = 1705320152
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00107e0350
vpanic() at vpanic+0x151/frame 0xfffffe00107e03a0
panic() at panic+0x43/frame 0xfffffe00107e0400
ip_set_fwdtag() at ip_set_fwdtag+0x11a/frame 0xfffffe00107e0430
pf_route_shared() at pf_route_shared+0x588/frame 0xfffffe00107e04a0
pf_test() at pf_test+0x101e/frame 0xfffffe00107e0630
pf_check_out() at pf_check_out+0x1f/frame 0xfffffe00107e0650
pfil_run_hooks() at pfil_run_hooks+0xb7/frame 0xfffffe00107e0690
ip_output() at ip_output+0xaf2/frame 0xfffffe00107e07b0
gif_transmit() at gif_transmit+0x2e0/frame 0xfffffe00107e07f0
ip6_tryforward() at ip6_tryforward+0x502/frame 0xfffffe00107e0870
ip6_input() at ip6_input+0x7cc/frame 0xfffffe00107e0950
netisr_dispatch_src() at netisr_dispatch_src+0x243/frame 0xfffffe00107e09b0
ether_demux() at ether_demux+0x17a/frame 0xfffffe00107e09e0
ng_ether_rcv_upper() at ng_ether_rcv_upper+0x93/frame 0xfffffe00107e0a00
ng_apply_item() at ng_apply_item+0x166/frame 0xfffffe00107e0a90
ng_snd_item() at ng_snd_item+0x2e1/frame 0xfffffe00107e0ad0
ng_apply_item() at ng_apply_item+0x166/frame 0xfffffe00107e0b60
ng_snd_item() at ng_snd_item+0x2e1/frame 0xfffffe00107e0ba0
ng_ether_input() at ng_ether_input+0x4c/frame 0xfffffe00107e0bd0
ether_nh_input() at ether_nh_input+0x254/frame 0xfffffe00107e0c30
netisr_dispatch_src() at netisr_dispatch_src+0xb1/frame 0xfffffe00107e0c90
ether_input() at ether_input+0x99/frame 0xfffffe00107e0cf0
iflib_rxeof() at iflib_rxeof+0xdf1/frame 0xfffffe00107e0e00
_task_fn_rx() at _task_fn_rx+0x7a/frame 0xfffffe00107e0e40
gtaskqueue_run_locked() at gtaskqueue_run_locked+0xa7/frame 0xfffffe00107e0ec0
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xc2/frame 0xfffffe00107e0ef0
fork_exit() at fork_exit+0x80/frame 0xfffffe00107e0f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00107e0f30

ronin3510 commented 7 months ago

23.7.11 kernel is -p7 and 24.1 is on -p9, so I was wondering if you need more time to test the changes in -p9.

Yes, this happens on the site where there's no IPv6 from the ISP, so Hurricane Electric has been running there for over a year now. Where there's native DHCPv6, everything is good on 24.1, no crashes yet.

In any case I'm glad I caught this prior to GA :)

fichtner commented 7 months ago

I'm not sure how relevant any of this is.. p8 and p9 are just SA-23:18.nfsclient and SA-23:19.openssh and both do not apply to us directly. OTOH there are about 70 handpicked network stack changes... one of them is causing the original report. About the current one in the debug kernel I'm not sure if this is new or hidden for years. gif_transmit() is doing the wrong thing somehow.

ronin3510 commented 7 months ago

Agreed, if there are no other changes p8,9 should not be causing this

So this is on Hurricane Electric only for now as mentioned above when I was editing the comment.

fichtner commented 7 months ago

I'm not sure what to think here... the vmcore.72 complains about a state that is impossible with dst->sin_family != 2 while the frame above says that's what it's passing:

(kgdb) frame 15
#15 0xffffffff823a1b38 in pf_route_shared (m=m@entry=0xfffffe00107e0748, r=0xfffff8004241a800, dir=dir@entry=2, ifp=0xfffff8000302c800, s=<optimized out>, 
    pd=pd@entry=0xfffffe00107e0508, inp=<optimized out>) at /usr/src/sys/netpfil/pf/pf.c:6892
6892                    }
(kgdb) p dst
$1 = {sin_len = 16 '\020', sin_family = 2 '\002', sin_port = 0, sin_addr = {s_addr = 27447890}, sin_zero = "\000\000\000\000\000\000\000"}

And on the next frame:

#14 0xffffffff80e4bfaa in ip_set_fwdtag (m=<optimized out>, dst=<optimized out>, ifp=<optimized out>) at /usr/src/sys/netinet/ip_output.c:1637
1637        KASSERT(dst != NULL, ("%s: !dst", __func__));
(kgdb) list
1632    ip_set_fwdtag(struct mbuf *m, struct sockaddr_in *dst, struct ifnet *ifp)
1633    {
1634        struct ip_fwdtag *fwd_info;
1635        struct m_tag *fwd_tag;
1636    
1637        KASSERT(dst != NULL, ("%s: !dst", __func__));
1638        KASSERT(dst->sin_family == AF_INET, ("%s: !AF_INET", __func__));
1639    
1640        fwd_tag = m_tag_find(m, PACKET_TAG_IPFORWARD, NULL);
1641        if (fwd_tag != NULL) {
(kgdb) p *dst
value has been optimized out

A little inconvenient the value it complains about is not available but looking at it there is no reason why it would be wrong there. 2 is AF_INET.
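The two values kgdb printed for `dst` in frame 15 can be cross-checked outside the debugger. This is a small Python sketch under stated assumptions: the host is little-endian (amd64), kgdb shows `s_addr` as a host-order integer, and `decode_s_addr` is a hypothetical helper, not part of any FreeBSD or OPNsense tooling.

```python
import socket

def decode_s_addr(s_addr: int, byteorder: str = "little") -> str:
    """Convert the integer shown for sin_addr.s_addr into dotted-quad form.

    s_addr is kept in network byte order in memory. The debugger prints those
    four bytes as a host integer, so laying the integer back out in the host's
    byte order (little-endian assumed here) recovers the in-memory bytes.
    """
    raw = s_addr.to_bytes(4, byteorder)
    return socket.inet_ntoa(raw)

# sin_family = 2 in the kgdb output; compare with the platform constant.
print("AF_INET =", int(socket.AF_INET))

# s_addr = 27447890 as printed by kgdb for frame 15.
print("dst =", decode_s_addr(27447890))
```

If `sin_family` matches `AF_INET` here, the caller's view of `dst` is consistent with the point made above: the assertion in `ip_set_fwdtag()` should not fire for the value seen in `pf_route_shared()`.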

Suffice to say the situation is probably created due to using an IPv6 gateway redirect rule. Turning shared forwarding off might make it behave more, but I'm entirely unsure if this is new or circumstantial.

fichtner commented 7 months ago

Is this the igb adapter one? There are a couple of e1000 changes that could be the source of this.. pf and network stack are not the apparent issue here.

ronin3510 commented 7 months ago

Nope all igc interfaces

pci4: <PCI bus> on pcib4
igc3: <Intel(R) Ethernet Controller I225-V> mem 0x91600000-0x916fffff,0x91700000-0x91703fff at device 0.0 on pci4
igc3: Using 1024 TX descriptors and 1024 RX descriptors
igc3: Using 4 RX queues 4 TX queues
igc3: Using MSI-X interrupts with 5 vectors
igc3: Ethernet address: 64:62:66:

ronin3510 commented 7 months ago

And it crashed again multiple times on the RC1. I'm trying to find out what happened in the past few hours.

-rw-r--r-- 1 root wheel 5B Jan 17 10:33 minfree
-rw------- 1 root wheel 393B Jan 17 13:51 info.73
-rw------- 1 root wheel 153K Jan 17 13:51 textdump.tar.73
-rw------- 1 root wheel 406B Jan 17 16:34 info.74
-rw------- 1 root wheel 153K Jan 17 16:34 textdump.tar.74
-rw------- 1 root wheel 405B Jan 17 16:45 info.75
-rw------- 1 root wheel 153K Jan 17 16:45 textdump.tar.75
-rw------- 1 root wheel 406B Jan 17 17:05 info.76
-rw------- 1 root wheel 153K Jan 17 17:05 textdump.tar.76
-rw------- 1 root wheel 406B Jan 17 17:15 info.77
-rw------- 1 root wheel 153K Jan 17 17:15 textdump.tar.77
-rw------- 1 root wheel 405B Jan 17 17:17 info.78
-rw------- 1 root wheel 153K Jan 17 17:17 textdump.tar.78
-rw-r--r-- 1 root wheel 3B Jan 17 17:49 bounds
-rw------- 1 root wheel 406B Jan 17 17:49 info.79
-rw------- 1 root wheel 153K Jan 17 17:49 textdump.tar.79
lrwxr-xr-x 1 root wheel 7B Jan 17 17:49 info.last -> info.79
lrwxr-xr-x 1 root wheel 15B Jan 17 17:49 textdump.tar.last -> textdump.tar.79
lrwxr-xr-x 1 root wheel 9B Jan 17 17:49 kernel.last -> kernel.72
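With several textdumps to compare, it can be quicker to pull the same member out of each archive than to unpack them all. The following Python sketch reads one file from a textdump tarball using only the standard library. The member name `panic.txt` is an assumption based on the usual FreeBSD textdump layout, and `read_textdump_member` is a hypothetical helper.

```python
import tarfile
from typing import BinaryIO, Optional

def read_textdump_member(fileobj: BinaryIO, member: str = "panic.txt") -> Optional[str]:
    """Return the text of one member of a textdump tar archive.

    Returns None when the archive has no such member. The default member
    name is an assumption; check the archive listing if it differs.
    """
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        try:
            handle = tar.extractfile(member)
        except KeyError:
            # tarfile raises KeyError when the name is not in the archive.
            return None
        if handle is None:
            # Not a regular file (for example a directory entry).
            return None
        return handle.read().decode("utf-8", errors="replace")

# Usage sketch; the path is illustrative and depends on the dump directory:
# with open("textdump.tar.79", "rb") as fh:
#     print(read_textdump_member(fh))
```

Running this over `textdump.tar.73` through `textdump.tar.79` would show whether every crash reports the same panic string or whether more than one failure mode is involved.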

dmesg.txt

fichtner commented 3 months ago

I'm going to close this. I'm unsure if this still applies, and there has been no more feedback.