mcusim / freebsd-src

sys/dev/dpaa2 drivers work-in-progress

dpaa2_ni_rx panic: dpaa2_ni_rx: unexpected frame buffer fd_addr != buf_paddr #8

Closed: mcbridematt closed 11 months ago

mcbridematt commented 2 years ago

Commit: 173aa2a1fe

I ran into this twice while running my stress test for long periods of time (>1 hour):

panic: dpaa2_ni_rx: unexpected frame buffer: fd_addr(0x305800008c900000) != buf_paddr(0x3058000088ccf000)
cpuid = 5
time = 1652662301
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
kdb_backtrace() at kdb_backtrace+0x38
vpanic() at vpanic+0x17c
panic() at panic+0x44
dpaa2_ni_rx() at dpaa2_ni_rx+0x26c
dpaa2_ni_poll_task() at dpaa2_ni_poll_task+0x1b0
taskqueue_run_locked() at taskqueue_run_locked+0xac
taskqueue_thread_loop() at taskqueue_thread_loop+0xc8
fork_exit() at fork_exit+0x74
fork_trampoline() at fork_trampoline+0x14
KDB: enter: panic
[ thread pid 0 tid 100118 ]
Stopped at      kdb_enter+0x40: undefined       f902027f
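
To compare the two addresses in the panic message, it can help to split each 64-bit value into an upper and a lower part. The following is only a sketch: the 48-bit split and the names `addr_tag`, `addr_low48` and `addr_delta` are assumptions made for illustration, not the layout or helpers used by dpaa2_ni.c.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: treat the low 48 bits as the address, the rest as a tag. */
#define LOW48_MASK	0x0000FFFFFFFFFFFFULL

/* Upper 16 bits of the value. */
static uint16_t
addr_tag(uint64_t v)
{
	return ((uint16_t)(v >> 48));
}

/* Lower 48 bits of the value. */
static uint64_t
addr_low48(uint64_t v)
{
	return (v & LOW48_MASK);
}

/* Distance in bytes between the low parts; a must not be below b. */
static uint64_t
addr_delta(uint64_t a, uint64_t b)
{
	assert(addr_low48(a) >= addr_low48(b));
	return (addr_low48(a) - addr_low48(b));
}
```

Applied to the values above, both addresses carry the same upper 16 bits (0x3058), and their low parts, 0x8c900000 and 0x88ccf000, are 0x3c31000 bytes apart. That suggests the frame descriptor refers to a different buffer than the one the driver looked up, rather than to a bit-flipped copy of the expected address. The fd_addr value is also what kgdb prints in decimal as `paddr` (3483534314129326080).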

First crash trace:

(kgdb) frame 3
#3  0xffff0000007d3394 in dpaa2_ni_rx (chan=0xffff0000fd6f8000, fq=<optimized out>, fd=0xffff0000fda42020) at /usr/src/freebsd-src/sys/dev/dpaa2/dpaa2_ni.c:2630
2630            KASSERT(paddr == buf->paddr, ("%s: unexpected frame buffer: "
(kgdb) info locals
released = {0, 0, 0, 8589934592, 18446462598741807642, 18446462602928396336, 18446462598741044400}
ifp = <optimized out>
sc = <optimized out>
paddr = 3483534314129326080
released_n = 0
buf = <optimized out>
buf_chan = 0xec36f06f7149058a
buf_idx = <optimized out>
m = <optimized out>
buf_len = <optimized out>
buf_data = <optimized out>
error = <optimized out>
bp_dev = <optimized out>
bpsc = <optimized out>
chan_idx = <optimized out>
(kgdb) frame 4
#4  0xffff0000007d2da8 in dpaa2_ni_consume_frames (chan=0xffff0000fd6f8000, src=<optimized out>, consumed=<optimized out>) at /usr/src/freebsd-src/sys/dev/dpaa2/dpaa2_ni.c:2568
2568                                    fq->consume(chan, fq, fd);
(kgdb) info locals
retries = <optimized out>
fq = 0x80
rc = 36
dq = 0xffff0000fda42000
fd = 0xffff0000008c5b8a
frames = <optimized out>
(kgdb) print *dq
$4 = {{common = {verb = 96 '`', _reserved = "\223\000\000\000\000\000̖", '\000' <repeats 16 times>, "\321s\375\000\000\377\377\000P\347\070\202\000\024\000\352\005\000\000\000\000\300\000\000\200\000 \000\000\200\a\000\000\000\000\000\000\000"}, fdr = {desc = {
        verb = 96 '`', stat = 147 '\223', seqnum = 0, oprid = 0, _reserved = 0 '\000', tok = 204 '\314', fqid = 150, _reserved1 = 0, fq_byte_cnt = 0, fq_frm_cnt = 0, fqd_ctx = 18446462602985066752}, fd = {addr = 5630058834644992, data_length = 1514, bpid_ivp_bmt = 0,
        offset_fmt_sl = 192, frame_ctx = 536903680, ctrl = 125829120, flow_ctx = 0}}, scn = {verb = 96 '`', stat = 147 '\223', state = 0 '\000', _reserved = 0 '\000', rid_tok = 3422552064, ctx = 150}}}
(kgdb) print *fd
$5 = {addr = 7165916604720706863, data_length = 1701996079, bpid_ivp_bmt = 25189, offset_fmt_sl = 25715, frame_ctx = 1668444973, ctrl = 1937339183, flow_ctx = 7307986971750918959}
(kgdb) print *fq
Cannot access memory at address 0x80
(kgdb) print fd
$6 = (struct dpaa2_fd *) 0
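
The `print *fd` values in frame 4 look suspiciously like text. A minimal sketch to check this is to lay the fields out byte by byte, least significant byte first, and read the result as a string. The struct name `fd_dump`, the helper names, and the field widths (8/4/2/2/4/4/8 bytes, little-endian) are assumptions inferred from the printed values, not taken from the driver headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the fields shown by kgdb `print *fd`. */
struct fd_dump {
	uint64_t addr;
	uint32_t data_length;
	uint16_t bpid_ivp_bmt;
	uint16_t offset_fmt_sl;
	uint32_t frame_ctx;
	uint32_t ctrl;
	uint64_t flow_ctx;
};

/* Append the low n bytes of v to out, least significant byte first. */
static size_t
put_le(char *out, size_t pos, uint64_t v, size_t n)
{
	assert(n <= 8);
	for (size_t i = 0; i < n; i++)
		out[pos++] = (char)((v >> (8 * i)) & 0xff);
	return (pos);
}

/* Render the 32 bytes of the dump as a NUL-terminated string. */
static void
fd_dump_to_text(const struct fd_dump *d, char out[33])
{
	size_t pos = 0;

	pos = put_le(out, pos, d->addr, 8);
	pos = put_le(out, pos, d->data_length, 4);
	pos = put_le(out, pos, d->bpid_ivp_bmt, 2);
	pos = put_le(out, pos, d->offset_fmt_sl, 2);
	pos = put_le(out, pos, d->frame_ctx, 4);
	pos = put_le(out, pos, d->ctrl, 4);
	pos = put_le(out, pos, d->flow_ctx, 8);
	out[pos] = '\0';
}
```

With the values from `$5`, the 32 bytes read "/usr/src/freebsd-src/sys/kern/ke", the start of a source path. This suggests that the `fd` local kgdb shows in frame 4 (0xffff0000008c5b8a) points into a string constant and is not a real frame descriptor. The `fd` argument in frame 3 (0xffff0000fda42020) lies 0x20 bytes past `dq` (0xffff0000fda42000), so `print *dq` is probably the more reliable view of the frame.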

Second time:

#4  0xffff0000007d2da8 in dpaa2_ni_consume_frames (chan=0xffff0000fc616000, src=<optimized out>, consumed=<optimized out>) at /usr/src/freebsd-src/sys/dev/dpaa2/dpaa2_ni.c:2568
2568                                    fq->consume(chan, fq, fd);
(kgdb) info locals
retries = <optimized out>
fq = 0x80
rc = 36
dq = 0xffff0000fcc58000
fd = 0xffff0000008c5b8a
frames = <optimized out>
(kgdb) print *dq
$1 = {{common = {verb = 96 '`', _reserved = "\022\000\000\000\000\000̵", '\000' <repeats 16 times>, "\215a\374\000\000\377\377\000\000=\214\000\000\270qB\000\000\000\000\000\300@\000\240\000 \000\000\001\000\000\000\000\000\000\000\000"}, fdr = {desc = {verb = 96 '`',
        stat = 18 '\022', seqnum = 0, oprid = 0, _reserved = 0 '\000', tok = 204 '\314', fqid = 181, _reserved1 = 0, fq_byte_cnt = 0, fq_frm_cnt = 0, fqd_ctx = 18446462602967092480}, fd = {addr = 8194299524353425408, data_length = 66, bpid_ivp_bmt = 0,
        offset_fmt_sl = 16576, frame_ctx = 536911872, ctrl = 65536, flow_ctx = 0}}, scn = {verb = 96 '`', stat = 18 '\022', state = 0 '\000', _reserved = 0 '\000', rid_tok = 3422552064, ctx = 181}}}
(kgdb) print *fd
$2 = {addr = 7165916604720706863, data_length = 1701996079, bpid_ivp_bmt = 25189, offset_fmt_sl = 25715, frame_ctx = 1668444973, ctrl = 1937339183, flow_ctx = 7307986971750918959}

dsalychev commented 2 years ago

@mcbridematt Could you try to reproduce it with 34014de9125ef96d4d595f944aa3af441de8537d, for example? And with both GENERIC and GENERIC-NODEBUG kernel configurations?

mcbridematt commented 2 years ago

Hit the same(?) problem, but this time in the tx path:

panic: dpaa2_ni_tx_conf: unexpected frame buffer: fd_addr(0x93a5e000) != txb_paddr(0x8cf27000)
cpuid = 5
time = 1656151823
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
vpanic() at vpanic+0x13c
panic() at panic+0x44
dpaa2_ni_tx_conf() at dpaa2_ni_tx_conf+0x138
dpaa2_ni_poll_task() at dpaa2_ni_poll_task+0x160
taskqueue_run_locked() at taskqueue_run_locked+0x17c
taskqueue_thread_loop() at taskqueue_thread_loop+0xc8
fork_exit() at fork_exit+0x74

dsalychev commented 2 years ago

@mcbridematt Could you test with 1a7aba9f89185b0533b46ecb641560a6be3cb614?

mcbridematt commented 2 years ago

Unfortunately it still happens: I saw both the RX and TX assertions trigger while testing today.

dsalychev commented 1 year ago

@mcbridematt Could you try e95fb522c4675fb0f6855d0942d3e3e63ca6d3f4? I've simplified the locking mechanism for software portals there and tested it with several task threads polling frames in dpaa2_ni_poll_task().
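
The thread does not show the locking change itself. As a rough userspace analogy only, the sketch below has several poller threads draining one shared "portal" under a single mutex, so each frame is consumed exactly once. Every name here (`struct portal`, `poller`, `run_pollers`, `NPOLLERS`, `NFRAMES`) is hypothetical, and it uses POSIX threads rather than the kernel locking primitives the driver would use.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define NPOLLERS	4
#define NFRAMES		10000

/* Hypothetical stand-in for a software portal shared by all pollers. */
struct portal {
	pthread_mutex_t	lock;
	unsigned	next;		/* next frame index to hand out */
	unsigned char	seen[NFRAMES];	/* times each frame was consumed */
};

/* Poller thread: take frames one at a time until the portal is empty. */
static void *
poller(void *arg)
{
	struct portal *p = arg;

	for (;;) {
		pthread_mutex_lock(&p->lock);
		if (p->next == NFRAMES) {
			pthread_mutex_unlock(&p->lock);
			return (NULL);
		}
		/* Dequeue and bookkeeping happen under the same lock. */
		p->seen[p->next++]++;
		pthread_mutex_unlock(&p->lock);
	}
}

/* Run the pollers and return how many frames were consumed exactly once. */
static unsigned
run_pollers(void)
{
	struct portal p;
	pthread_t tid[NPOLLERS];
	unsigned once = 0;

	memset(&p, 0, sizeof(p));
	pthread_mutex_init(&p.lock, NULL);
	for (int i = 0; i < NPOLLERS; i++) {
		int rc = pthread_create(&tid[i], NULL, poller, &p);
		assert(rc == 0);
		(void)rc;
	}
	for (int i = 0; i < NPOLLERS; i++)
		pthread_join(tid[i], NULL);
	pthread_mutex_destroy(&p.lock);

	for (unsigned i = 0; i < NFRAMES; i++)
		once += (p.seen[i] == 1);
	return (once);
}
```

With the lock held across both the dequeue and the bookkeeping, `run_pollers()` should report that all `NFRAMES` frames were consumed exactly once. Without it, two pollers could pick up the same index, which is loosely the kind of mismatch the RX and TX assertions guard against.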

mcbridematt commented 1 year ago

Looking good so far: no panics and no warnings/errors in dmesg when testing 4 ports with a debug kernel for over 9 hours. I will try NODEBUG next.

dsalychev commented 1 year ago

@mcbridematt btw, I noticed that you were using a network interface with several Rx queues/channels (custom DPL?). Could you try it as well?

mcbridematt commented 1 year ago

@dsalychev It looks like NODEBUG is working fine as well :)

I'm pretty sure the multiple Rx queues come from the new DPL, which has been the default since Ten64 firmware v0.8.10. It was part of the method NXP suggested to me for letting all 10 ports balance traffic across all CPUs: https://forum.traverse.com.au/t/more-details-on-interrupt-balancing-dpaa2-config-dpio-splitting/114/4?u=mcbridematt

To be honest, I haven't checked whether Linux takes advantage of all the Rx queues, but I might go and check.

dsalychev commented 1 year ago

@mcbridematt I recently started using multiple threads to receive frames: https://github.com/mcusim/freebsd-src/blob/lx2160acex7-dev/sys/dev/dpaa2/dpaa2_ni.c#L646-L648 That's why I'm interested :) I'll check my Ten64 firmware and try to stress my Ten64. Thanks for the info!

dsalychev commented 1 year ago

I haven't noticed this panic on https://github.com/mcusim/freebsd-src/tree/ten64. @mcbridematt Could you confirm after your test?

dsalychev commented 12 months ago

@mcbridematt Could you conduct the same stress test again on https://github.com/mcusim/freebsd-src/tree/dpaa2 ? The ten64 branch is stale now and almost all of the changes have found their way into the dpaa2 one.

mcbridematt commented 11 months ago

Sorry, I should have closed this issue long ago. But it has definitely not reappeared in the latest code.