Closed: mcbridematt closed this issue 11 months ago
@mcbridematt Could you try to reproduce it with 34014de9125ef96d4d595f944aa3af441de8537d, for example? And with both GENERIC and GENERIC-NODEBUG kernel configurations?
Hit the same(?) problem, but this time in the tx path:
panic: dpaa2_ni_tx_conf: unexpected frame buffer: fd_addr(0x93a5e000) != txb_paddr(0x8cf27000)
cpuid = 5
time = 1656151823
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
vpanic() at vpanic+0x13c
panic() at panic+0x44
dpaa2_ni_tx_conf() at dpaa2_ni_tx_conf+0x138
dpaa2_ni_poll_task() at dpaa2_ni_poll_task+0x160
taskqueue_run_locked() at taskqueue_run_locked+0x17c
taskqueue_thread_loop() at taskqueue_thread_loop+0xc8
fork_exit() at fork_exit+0x74
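For readers following along, the panic message implies a consistency check in the Tx confirmation path: the address carried by the returned frame descriptor is compared with the physical address recorded for the Tx buffer. Below is a minimal user-space sketch of that kind of check. The names `struct tx_buf` and `tx_conf_addr_matches` are hypothetical and chosen only for illustration; this is not the driver's actual code, which panics instead of returning a status.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-buffer record a driver might keep. */
struct tx_buf {
	uint64_t paddr;	/* physical address recorded when the buffer was mapped */
};

/*
 * Returns 1 when the address reported by the Tx confirmation matches the
 * address recorded for the buffer, 0 otherwise. A mismatch means the
 * confirmation refers to a different buffer than the one being examined.
 */
static int
tx_conf_addr_matches(uint64_t fd_addr, const struct tx_buf *txb)
{
	if (fd_addr != txb->paddr) {
		fprintf(stderr,
		    "unexpected frame buffer: fd_addr(0x%" PRIx64
		    ") != txb_paddr(0x%" PRIx64 ")\n", fd_addr, txb->paddr);
		return (0);
	}
	return (1);
}
```

With the two addresses from the panic above, `tx_conf_addr_matches(0x93a5e000, &txb)` returns 0 for a buffer recorded at `0x8cf27000`. A mismatch like this is the kind of symptom one might expect if several threads consume confirmations without adequate synchronization, though the trace alone does not prove that.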
@mcbridematt Could you test with 1a7aba9f89185b0533b46ecb641560a6be3cb614?
Unfortunately, it still happens. I saw both the RX and TX assertions triggered during testing today.
@mcbridematt Could you try e95fb522c4675fb0f6855d0942d3e3e63ca6d3f4? I've simplified the locking mechanism for software portals there and tested it with several task threads polling frames in dpaa2_ni_poll_task().
Looking good so far: no panics and no warnings/errors in dmesg after more than 9 hours of testing 4 ports with a debug kernel. I will try NODEBUG next.
@mcbridematt btw, I noticed that you were using a network interface with several Rx queues/channels (custom DPL?). Could you try it as well?
@dsalychev It looks like NODEBUG is working fine as well :)
I'm pretty sure the multiple Rx queues come from the new DPL, which has been the default since Ten64 firmware v0.8.10. It was part of the method NXP suggested to me that allows all 10 ports to balance traffic across all CPUs: https://forum.traverse.com.au/t/more-details-on-interrupt-balancing-dpaa2-config-dpio-splitting/114/4?u=mcbridematt
To be honest, I haven't checked whether Linux takes advantage of all the Rx queues, but I might go and check.
@mcbridematt I recently started using multiple threads to receive frames: https://github.com/mcusim/freebsd-src/blob/lx2160acex7-dev/sys/dev/dpaa2/dpaa2_ni.c#L646-L648 That's why I'm interested :) I'll check my Ten64 firmware and try to stress my Ten64, thanks for the info!
I haven't noticed this panic on https://github.com/mcusim/freebsd-src/tree/ten64. @mcbridematt Could you confirm after your test?
@mcbridematt Could you conduct the same stress test again on https://github.com/mcusim/freebsd-src/tree/dpaa2 ? The ten64 branch is stale now and almost all of the changes have found their way into the dpaa2 one.
Sorry, I should have closed this issue long ago. But it has definitely not reappeared in the latest code.
Commit: 173aa2a1fe
I ran into this twice when running my stress test for long periods of time (>1 hour).
First crash trace:
Second time: