Thanks for reporting. It looks like #19, at least in terms of the root cause. Please try https://github.com/mcusim/freebsd-src/issues/19#issuecomment-1555888989. Panics won't go away entirely, but you shouldn't see crashes so often. In the meantime, I'm trying to solve the root cause itself.
OK Dmitry. I had seen this bug but thought it was another problem. Sorry.
I am going to try it now and will let you know.
I reverted 718bdb6 and it has been working waaaay better since. So you are certainly right, the problem is the same as the other issue.
Do you want me to close this issue?
May I humbly ask what you think about reverting the commit in the source tree for now? It would prevent other people from hitting this problem. This is up to you, of course.
> Do you want me to close this issue?
I'll close it as a duplicate of #19.
> May I humbly ask what you think about reverting the commit in the source tree for now? It would prevent other people from hitting this problem. This is up to you, of course.
It's important to unmask the panic with https://github.com/mcusim/freebsd-src/commit/718bdb6a71ba4ed1f557f89af1482a10f7b1cb74 because it'll help me verify that the root cause is solved by the upcoming patches.
@snail59 Please try https://github.com/mcusim/freebsd-src/tree/dpaa2. A GENERIC kernel ran for ~14 hours under high network load for me, until I stopped the test myself.
Details: https://github.com/mcusim/freebsd-src/issues/19#issuecomment-1651444388
@dsalychev I just saw your email. I will test and let you know.
So, I tried it and quickly got a kernel panic:
Fatal data abort:
x0: 0xffffa0000da24000
x1: 0xffffa00014b58600
x2: 0x000000000000000e
x3: 0xffff0000f853d598 (_DYNAMIC + 0xf6b1e5e0)
x4: 0xffff0000f853d3c6 (_DYNAMIC + 0xf6b1e40e)
x5: 0xffffa00014b58670
x6: 0x0a000cfea4bfa322
x7: 0x0008c12724fa0a00
x8: 0xffffa0000da24000
x9: 0x0000000000000000
x10: 0x000000000000004a
x11: 0xffffa00014b58660
x12: 0x0000000000000000
x13: 0x0000000000000000
x14: 0xffff0000f853d358 (_DYNAMIC + 0xf6b1e3a0)
x15: 0x000000004f3c790b
x16: 0x0000000000000008
x17: 0xffffa0001406eae7
x18: 0xffff0000f853d340 (_DYNAMIC + 0xf6b1e388)
x19: 0xffffa00014b58600
x20: 0xffffa0000da24000
x21: 0x0000000000000000
x22: 0xffff0000f853d578 (_DYNAMIC + 0xf6b1e5c0)
x23: 0x000000000000000e
x24: 0xffff0000f853d3b8 (_DYNAMIC + 0xf6b1e400)
x25: 0x000000000000000e
x26: 0x0000000000000008
x27: 0x0000000000000000
x28: 0x000000003300a8c0
x29: 0xffff0000f853d340 (_DYNAMIC + 0xf6b1e388)
sp: 0xffff0000f853d340
lr: 0xffff000000929020 (dpaa2_ni_transmit + 0x38)
elr: 0xffff0000009290a0 (dpaa2_ni_transmit + 0xb8)
spsr: 0x0000000040000045
far: 0x0000000000002ee8
esr: 0x0000000096000004
panic: vm_fault failed: 0xffff0000009290a0 error 1
cpuid = 5
time = 1690627307
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
vpanic() at vpanic+0x13c
panic() at panic+0x44
data_abort() at data_abort+0x308
handle_el1h_sync() at handle_el1h_sync+0x14
--- exception, esr 0x96000004
dpaa2_ni_transmit() at dpaa2_ni_transmit+0xb8
ether_output_frame() at ether_output_frame+0xd0
ether_output() at ether_output+0x664
ip_output_send() at ip_output_send+0xe8
ip_output() at ip_output+0x1394
ip_forward() at ip_forward+0x474
ip_input() at ip_input+0x924
netisr_dispatch_src() at netisr_dispatch_src+0xf0
ether_demux() at ether_demux+0x158
ether_nh_input() at ether_nh_input+0x39c
netisr_dispatch_src() at netisr_dispatch_src+0xf0
ether_input() at ether_input+0x48
uether_rxflush() at uether_rxflush+0x98
cdce_ncm_bulk_read_callback() at cdce_ncm_bulk_read_callback+0xb0
usbd_callback_wrapper() at usbd_callback_wrapper+0x6cc
usb_command_wrapper() at usb_command_wrapper+0x84
usb_callback_proc() at usb_callback_proc+0x16c
usb_process() at usb_process+0x124
fork_exit() at fork_exit+0x88
fork_trampoline() at fork_trampoline+0x14
KDB: enter: panic
[ thread pid 15 tid 100103 ]
Stopped at kdb_enter+0x44: str xzr, [x19, #1152]
I tried rebuilding everything once again and booting without the USB modem, but it still panics:
Fatal data abort:
x0: 0x0000000000000000
x1: 0xffffa00014bf0600
x2: 0x000000000000000e
x3: 0xffff000102640828
x4: 0xffff0001026406f6
x5: 0xffffa00014bf0670
x6: 0x0a000cfea4bfa322
x7: 0x0008c12724fa0a00
x8: 0x0000000000000000
x9: 0x0000000000000000
x10: 0x0000000000000036
x11: 0xffffa00014bf0660
x12: 0x0000000000000008
x13: 0xffffa00014e217e0
x14: 0x0000000700000000
x15: 0x0000000000000039
x16: 0xffff00010264075f
x17: 0xffffa00014e18267
x18: 0xffff000102640670
x19: 0xffffa00014bf0600
x20: 0x0000000000000000
x21: 0xffffa0000791a800
x22: 0xffff000102640808
x23: 0x000000000000000e
x24: 0xffff0001026406e8
x25: 0x000000000000000e
x26: 0x0000000000000008
x27: 0x0000000000000000
x28: 0x000000003300a8c0
x29: 0xffff000102640670
sp: 0xffff000102640670
lr: 0xffff000000929020 (dpaa2_ni_transmit + 0x38)
elr: 0xffff00000092909c (dpaa2_ni_transmit + 0xb4)
spsr: 0x0000000040000045
far: 0x00000000000002b8
esr: 0x0000000096000004
panic: vm_fault failed: 0xffff00000092909c error 1
cpuid = 0
time = 21
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
vpanic() at vpanic+0x13c
panic() at panic+0x44
data_abort() at data_abort+0x308
handle_el1h_sync() at handle_el1h_sync+0x14
--- exception, esr 0x96000004
dpaa2_ni_transmit() at dpaa2_ni_transmit+0xb4
ether_output_frame() at ether_output_frame+0xd0
ether_output() at ether_output+0x664
ip_output_send() at ip_output_send+0xe8
ip_output() at ip_output+0x1394
pf_intr() at pf_intr+0x240
ithread_loop() at ithread_loop+0x3fc
fork_exit() at fork_exit+0x88
fork_trampoline() at fork_trampoline+0x14
KDB: enter: panic
[ thread pid 12 tid 100268 ]
Stopped at kdb_enter+0x44: str xzr, [x19, #1152]
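For readers following the two dumps above, the esr value can be decoded by hand. The sketch below is a small user-space C helper, not kernel code. The function names are made up for illustration; only the bit positions are taken from the Armv8-A definition of ESR_ELx (EC in bits 31:26, WnR in bit 6 and DFSC in bits 5:0 for data aborts).

```c
#include <stdint.h>

/* Exception Class: bits [31:26] of ESR_ELx. */
static unsigned
esr_ec(uint32_t esr)
{
	return ((esr >> 26) & 0x3fu);
}

/* Data Fault Status Code: bits [5:0] of the ISS for data aborts. */
static unsigned
esr_dfsc(uint32_t esr)
{
	return (esr & 0x3fu);
}

/* WnR: bit 6 of the ISS, set when the faulting access was a write. */
static int
esr_is_write(uint32_t esr)
{
	return ((esr & (1u << 6)) != 0);
}

/* The two low bits of the DFSC carry the translation table level. */
static unsigned
dfsc_level(unsigned dfsc)
{
	return (dfsc & 0x3u);
}

/* Coarse fault class taken from the upper bits of the DFSC. */
static const char *
dfsc_name(unsigned dfsc)
{
	switch (dfsc & 0x3cu) {
	case 0x00:
		return ("address size fault");
	case 0x04:
		return ("translation fault");
	case 0x08:
		return ("access flag fault");
	case 0x0c:
		return ("permission fault");
	default:
		return ("other");
	}
}
```

For esr 0x96000004, which appears in both panics, esr_ec() gives 0x25 (a data abort taken without a change of exception level), esr_dfsc() gives 0x04 (a level 0 translation fault) and esr_is_write() gives 0, so the faulting access was a read. Together with the small far values (0x2ee8 and 0x2b8), this is consistent with a load through a NULL or otherwise bogus pointer plus a field offset inside dpaa2_ni_transmit, although the dumps alone do not show which pointer.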
@dsalychev Out of curiosity, does it make sense? Is there something obvious? Do you need more information?
It definitely does. Could you try a85d6c9ad5fe4de8cb3bc651253a1717fb28505c?
I've probably made a mistake with:
static int
dpaa2_ni_transmit(if_t ifp, struct mbuf *m)
{
	...
	/* Transmit mbuf on the same interface it was received from */
	if (m->m_pkthdr.rcvif != NULL) {
		sc = if_getsoftc(m->m_pkthdr.rcvif);
	}
	...
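One way to read the suspicion above, as a user-space sketch rather than the actual driver code: in the first backtrace the packet arrives through cdce_ncm_bulk_read_callback and is forwarded by ip_forward to dpaa2_ni_transmit, so m->m_pkthdr.rcvif would presumably be the USB interface, whose softc belongs to a different driver. The types and functions below (mock_if, mock_pkt, softc_from_rcvif, softc_from_ifp) are hypothetical stand-ins, the fallback to the transmitting interface's softc is an assumption, and nothing here is claimed to be what a85d6c9 actually changes.

```c
#include <stddef.h>

/* Hypothetical stand-in for an interface: softc is driver-private state. */
struct mock_if {
	const char *driver;	/* name of the driver that owns the interface */
	void *softc;		/* layout is only known to that driver */
};

/* Hypothetical stand-in for an mbuf packet header. */
struct mock_pkt {
	struct mock_if *rcvif;	/* interface the packet arrived on, or NULL */
};

/*
 * Pattern from the snippet: prefer the softc of the receive interface.
 * Assumption: without rcvif, the transmitting interface's softc is used.
 */
static void *
softc_from_rcvif(struct mock_if *ifp, struct mock_pkt *m)
{
	void *sc = ifp->softc;

	if (m->rcvif != NULL)
		sc = m->rcvif->softc;	/* may belong to another driver */
	return (sc);
}

/* Conservative alternative: always use the transmitting interface. */
static void *
softc_from_ifp(struct mock_if *ifp, struct mock_pkt *m)
{
	(void)m;
	return (ifp->softc);
}
```

With a packet whose rcvif points at an interface owned by another driver, softc_from_rcvif() hands back that other driver's private state, which the transmit path would then interpret with the wrong layout. softc_from_ifp() always returns the state of the interface being asked to transmit.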
Just tried it. So far so good, it did not panic on boot and has been running fine for 1 hour.
@snail59 Sounds good :) Please keep it loaded for some time and retry the original scenario in which the kernel panicked.
So, it has been running for some time, and I have had no problem at all. I could do way more than my original scenario. I did not run tests before and after, so I cannot compare performance.
Good work mate :-)
> I could do way more than my original scenario
It'd be really good :) I hope I'll be able to commit those changes before 14.0. Thanks for testing!
I hope so too! Otherwise (unless you revert your last commit), FreeBSD 14.0 won't install on ten64 :-/
For your information (I know this is not your code's fault), you rebased your branch onto main while it currently has a problem that prevents compilation.
The message is:
ld: error: /usr/obj/traverse/sources/git/usr/src/arm64.aarch64/tmp/usr/lib/libcompiler_rt.a(absvdi2.o) is incompatible with /usr/obj/traverse/sources/git/usr/src/arm64.aarch64/tmp/usr/lib32/crti.o
I have not yet figured out which commit is faulty, nor what exactly is happening, as I am not a developer :-D
It looks like you have to recompile the world as well. This worked for me:
$ nice make -s -j30 tinderbox TARGETS="arm64"
> It looks like you have to recompile the world as well. This worked for me:
> $ nice make -s -j30 tinderbox TARGETS="arm64"
For information (this has nothing to do with the current issue): I dug into it, and the build fails when using ccache and succeeds without it. I suspect this started when the LIB32 option was enabled on aarch64. In addition, some binaries from the build machine are copied to a legacy/bin subdirectory of $MAKEOBJDIRPREFIX and then used at installation time, so installation breaks if the build was cross-compiled!
Hello,
Recently, my ten64 started crashing multiple times per day.
dpni9 and dpni8 are 10Gb NICs; in my case, dpni9 faces the internet while dpni8 serves my local network. I also use VLANs, mostly for VMware virtual machines.
Communication between internet and my local network works pretty fine, but
I tried different scenarios:
1) dpni8 for my local network + VLANs. As soon as a VM (so in a VLAN) begins to communicate with any other network, the ten64 crashes:
2) dpni8 for the local network and dpni5 for VLANs. This configuration is less unstable, but it eventually crashes anyway:
3) Using 13.2. It quickly crashes, even without activity.
Tell me if you need more information or tests from me. My build machine is fast so it does not bother me to compile multiple times.