mcusim / freebsd-src

sys/dev/dpaa2 drivers work-in-progress

"Failed to pull frames" when using multiple DPNIs #1

Closed by mcbridematt 11 months ago

mcbridematt commented 2 years ago

Hardware: Ten64
MC firmware: 10.20
Commit: 6efa7d1a0320f7716b12e5e5b0592f0e04dcee80

When more than one interface / DPNI is transferring data, the following errors appear in the system console / dmesg:

dpaa2_ni0: failed to pull frames: chan_id=15, error=16
dpaa2_ni1: failed to pull frames: chan_id=23, error=16
dpaa2_ni1: failed to pull frames: chan_id=23, error=16
dpaa2_ni1: failed to pull frames: chan_id=23, error=16
dpaa2_ni0: failed to pull frames: chan_id=15, error=16
dpaa2_ni1: failed to pull frames: chan_id=23, error=16
dpaa2_ni0: failed to pull frames: chan_id=15, error=16
dpaa2_ni1: failed to pull frames: chan_id=23, error=16
dpaa2_ni0: failed to pull frames: chan_id=15, error=16
dpaa2_ni1: failed to pull frames: chan_id=23, error=16
dpaa2_ni1: failed to pull frames: chan_id=23, error=16
dpaa2_ni0: failed to pull frames: chan_id=15, error=16
dpaa2_ni1: failed to pull frames: chan_id=23, error=16
dpaa2_ni1: failed to pull frames: chan_id=23, error=16

An example use case is a system acting as a router between two network interfaces. I don't see any evidence of packet loss, which is good.

This message is printed by dpaa2_ni_poll_task around line 2367: https://github.com/mcusim/freebsd-src/blob/6efa7d1a0320f7716b12e5e5b0592f0e04dcee80/sys/dev/dpaa2/dpaa2_ni.c#L2364-L2370
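For reference, the value 16 in these messages is EBUSY in FreeBSD's errno numbering, assuming the driver reports a standard errno value here. The helper below is only an illustrative sketch (errno_name is a hypothetical name, not part of the driver) that maps a logged code to its symbolic name:

```c
#include <errno.h>
#include <string.h>

/*
 * Map an error code from the log to its symbolic errno name.
 * Hypothetical helper for reading the messages; not driver code.
 */
static const char *
errno_name(int error)
{
	switch (error) {
	case EBUSY:
		return ("EBUSY");
	case EAGAIN:
		return ("EAGAIN");
	case ETIMEDOUT:
		return ("ETIMEDOUT");
	default:
		return ("?");
	}
}
```

Calling errno_name(16) yields "EBUSY", and strerror(16) from string.h gives the matching human-readable text on the system where it runs.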

dsalychev commented 2 years ago

It seems those errors appear when different DPNIs share the same DPIO (struct dpaa2_swp). The driver keeps the software portal busy executing a Volatile Dequeue (VDQ) command for too long, i.e.

            /* Make VDQ command available again. */
            atomic_xchg(&swp->vdq.avail, 1);

is set too late, I think. EDIT: This is only a guess, though. I'll check it and prepare a patch.
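The pattern being described can be sketched in portable C11 as follows. This is a minimal illustration under assumptions, not the driver's code: struct portal, portal_init, portal_pull_begin and portal_pull_end are hypothetical names, and stdatomic.h stands in for the kernel's atomic(9) primitives. A single "available" flag guards the VDQ command, so a second caller that arrives before the flag is restored gets EBUSY:

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the shared software portal state. */
struct portal {
	atomic_int vdq_avail;	/* 1: VDQ may be issued, 0: one is in flight */
};

static void
portal_init(struct portal *p)
{
	assert(p != NULL);
	atomic_init(&p->vdq_avail, 1);
}

/* Try to claim the single VDQ slot; EBUSY if another caller holds it. */
static int
portal_pull_begin(struct portal *p)
{
	int expected = 1;

	if (!atomic_compare_exchange_strong(&p->vdq_avail, &expected, 0))
		return (EBUSY);
	return (0);
}

/* Make the VDQ slot available again once the dequeue has completed. */
static void
portal_pull_end(struct portal *p)
{
	atomic_store(&p->vdq_avail, 1);
}
```

In this sketch, the longer the gap between portal_pull_begin() and portal_pull_end(), the more likely a second interface sharing the portal is to see EBUSY, which matches the guess above.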

dsalychev commented 2 years ago

@mcbridematt Could you try the latest commit? I no longer see this error as of efe105cc1332c96aa48088371559e2b2c8b7c71c.

mcbridematt commented 2 years ago

@dsalychev Yes, no more errors after updating the kernel. I'll move some devices behind this machine and see how it goes.

mcbridematt commented 2 years ago

FYI, for a system that did 4.9TB of traffic over 14 hours I still got a few warnings in dmesg:

dmesg | grep 'failed to pull frames'  | wc -l
     590

590 / 5TB is a very small rate, but I don't know enough to judge how important the warning message is.
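To put the figures above in perspective, the count can be normalised per hour and per terabyte. The snippet is a small sketch with hypothetical helper names; the inputs (590 warnings, 14 hours, 4.9TB) are the ones quoted in this comment:

```c
#include <assert.h>

/* Average number of warnings logged per hour of runtime. */
static double
warnings_per_hour(unsigned int warnings, double hours)
{
	assert(hours > 0.0);
	return (warnings / hours);
}

/* Average number of warnings logged per terabyte of traffic. */
static double
warnings_per_tb(unsigned int warnings, double terabytes)
{
	assert(terabytes > 0.0);
	return (warnings / terabytes);
}
```

With these inputs, warnings_per_hour(590, 14.0) is roughly 42 and warnings_per_tb(590, 4.9) is roughly 120.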

dsalychev commented 2 years ago

Could you show the output of sysctl dev.dpaa2_ni.0 (for dpni0), and likewise for every other interface that reported the errors? I'm particularly interested in

dev.dpaa2_ni.0.stats.in_discarded_frames: 18
dev.dpaa2_ni.0.stats.in_nobuf_discards: 0

mcbridematt commented 2 years ago

After a 1-hour iperf run that logged around 70 "failed to pull frames" messages, none of the interfaces had discards:

dev.dpaa2_ni.1.stats.in_all_frames: 75934381
dev.dpaa2_ni.1.stats.in_all_bytes: 5011725630
dev.dpaa2_ni.1.stats.in_multi_frames: 0
dev.dpaa2_ni.1.stats.eg_all_frames: 157009142
dev.dpaa2_ni.1.stats.eg_all_bytes: 237634857392
dev.dpaa2_ni.1.stats.eg_multi_frames: 0
dev.dpaa2_ni.1.stats.in_filtered_frames: 0
dev.dpaa2_ni.1.stats.in_discarded_frames: 0
dev.dpaa2_ni.1.stats.in_nobuf_discards: 0
dev.dpaa2_ni.1.stats.tx_sg_frames: 157009292
dev.dpaa2_ni.1.stats.tx_single_buf_frames: 0
dev.dpaa2_ni.1.stats.rx_ieoi_err_frames: 0
dev.dpaa2_ni.1.stats.rx_enq_rej_frames: 0
dev.dpaa2_ni.1.stats.rx_sg_buf_frames: 0
dev.dpaa2_ni.1.stats.rx_single_buf_frames: 75934511
dev.dpaa2_ni.1.stats.rx_anomaly_frames: 0
dev.dpaa2_ni.1.channels.7.tx_dropped: 0
dev.dpaa2_ni.1.channels.7.tx_frames: 0
dev.dpaa2_ni.1.channels.6.tx_dropped: 0
dev.dpaa2_ni.1.channels.6.tx_frames: 0
dev.dpaa2_ni.1.channels.5.tx_dropped: 0
dev.dpaa2_ni.1.channels.5.tx_frames: 0
dev.dpaa2_ni.1.channels.4.tx_dropped: 0
dev.dpaa2_ni.1.channels.4.tx_frames: 0
dev.dpaa2_ni.1.channels.3.tx_dropped: 0
dev.dpaa2_ni.1.channels.3.tx_frames: 0
dev.dpaa2_ni.1.channels.2.tx_dropped: 0
dev.dpaa2_ni.1.channels.2.tx_frames: 0
dev.dpaa2_ni.1.channels.1.tx_dropped: 0
dev.dpaa2_ni.1.channels.1.tx_frames: 0
dev.dpaa2_ni.1.channels.0.tx_dropped: 0
dev.dpaa2_ni.1.channels.0.tx_frames: 157009437
dev.dpaa2_ni.1.%parent: dpaa2_rc0
dev.dpaa2_ni.1.%pnpinfo:
dev.dpaa2_ni.1.%location:
dev.dpaa2_ni.1.%driver: dpaa2_ni
dev.dpaa2_ni.1.%desc: DPAA2 Network Interface
dev.dpaa2_ni.2.stats.in_all_frames: 165393160
dev.dpaa2_ni.2.stats.in_all_bytes: 250312613260
dev.dpaa2_ni.2.stats.in_multi_frames: 0
dev.dpaa2_ni.2.stats.eg_all_frames: 48486702
dev.dpaa2_ni.2.stats.eg_all_bytes: 3200223070
dev.dpaa2_ni.2.stats.eg_multi_frames: 0
dev.dpaa2_ni.2.stats.in_filtered_frames: 0
dev.dpaa2_ni.2.stats.in_discarded_frames: 0
dev.dpaa2_ni.2.stats.in_nobuf_discards: 0
dev.dpaa2_ni.2.stats.tx_sg_frames: 48486702
dev.dpaa2_ni.2.stats.tx_single_buf_frames: 0
dev.dpaa2_ni.2.stats.rx_ieoi_err_frames: 0
dev.dpaa2_ni.2.stats.rx_enq_rej_frames: 0
dev.dpaa2_ni.2.stats.rx_sg_buf_frames: 0
dev.dpaa2_ni.2.stats.rx_single_buf_frames: 165392672
dev.dpaa2_ni.2.stats.rx_anomaly_frames: 0
dev.dpaa2_ni.2.channels.7.tx_dropped: 0
dev.dpaa2_ni.2.channels.7.tx_frames: 0
dev.dpaa2_ni.2.channels.6.tx_dropped: 0
dev.dpaa2_ni.2.channels.6.tx_frames: 0
dev.dpaa2_ni.2.channels.5.tx_dropped: 0
dev.dpaa2_ni.2.channels.5.tx_frames: 0
dev.dpaa2_ni.2.channels.4.tx_dropped: 0
dev.dpaa2_ni.2.channels.4.tx_frames: 0
dev.dpaa2_ni.2.channels.3.tx_dropped: 0
dev.dpaa2_ni.2.channels.3.tx_frames: 0
dev.dpaa2_ni.2.channels.2.tx_dropped: 0
dev.dpaa2_ni.2.channels.2.tx_frames: 0
dev.dpaa2_ni.2.channels.1.tx_dropped: 0
dev.dpaa2_ni.2.channels.1.tx_frames: 0
dev.dpaa2_ni.2.channels.0.tx_dropped: 0
dev.dpaa2_ni.2.channels.0.tx_frames: 48486702

(This is with the buffer commits reverted: 19d82451a68fcd710b3c0a03b2da843b8238a407, 846462f2d7ec1ae10fbe90da9278a06103af61a2, 48d302a8d08268380292e73bed34beb9408b2aa4)
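When scanning long sysctl dumps like the one above, the two discard counters of interest can be picked out mechanically. The function below is a rough sketch that assumes the "name: value" line format shown in this thread; parse_discard_counter is a hypothetical helper, not part of any FreeBSD tool:

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one "name: value" line of sysctl output. Returns 1 and stores the
 * counter in *value when the line names one of the two ingress discard
 * counters (in_discarded_frames, in_nobuf_discards); returns 0 otherwise.
 */
static int
parse_discard_counter(const char *line, unsigned long long *value)
{
	const char *colon = strrchr(line, ':');

	if (colon == NULL)
		return (0);
	if (strstr(line, ".stats.in_discarded_frames:") == NULL &&
	    strstr(line, ".stats.in_nobuf_discards:") == NULL)
		return (0);
	return (sscanf(colon + 1, "%llu", value) == 1);
}
```

Feeding each line of the dump through this function and flagging any non-zero value gives a quick check for dropped ingress frames.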

dsalychev commented 2 years ago

This is good news. I'll try to prepare some debug code to check whether those frames were processed at all, rather than dropped silently after dpaa2_swp_pull() returned an error.

dsalychev commented 2 years ago

@mcbridematt Could you test with 1a7aba9f89185b0533b46ecb641560a6be3cb614?

mcbridematt commented 2 years ago

@dsalychev I now see a few 'timeout to consume frames' errors as well. Is that expected?

dpaa2_ni0: dpaa2_ni_poll_task: failed to pull frames: chan_id=16, error=16
dpaa2_ni0: dpaa2_ni_poll_task: failed to pull frames: chan_id=23, error=16
dpaa2_ni0: dpaa2_ni_poll_task: failed to pull frames: chan_id=23, error=16
dpaa2_ni0: dpaa2_ni_poll_task: timeout to consume frames: chan_id=23
dpaa2_ni1: dpaa2_ni_poll_task: failed to pull frames: chan_id=4, error=16
dpaa2_ni0: dpaa2_ni_poll_task: failed to pull frames: chan_id=16, error=16
dpaa2_ni0: dpaa2_ni_poll_task: failed to pull frames: chan_id=23, error=16
dpaa2_ni1: dpaa2_ni_poll_task: timeout to consume frames: chan_id=24
dpaa2_ni0: dpaa2_ni_poll_task: timeout to consume frames: chan_id=23
dpaa2_ni1: dpaa2_ni_poll_task: failed to pull frames: chan_id=4, error=16
dpaa2_ni0: dpaa2_ni_poll_task: failed to pull frames: chan_id=16, error=16

sysctls:

dev.dpaa2_ni.0.stats.in_all_frames: 33739237
dev.dpaa2_ni.0.stats.in_all_bytes: 2227163082
dev.dpaa2_ni.0.stats.in_multi_frames: 0
dev.dpaa2_ni.0.stats.eg_all_frames: 76976026
dev.dpaa2_ni.0.stats.eg_all_bytes: 116515198666
dev.dpaa2_ni.0.stats.eg_multi_frames: 0
dev.dpaa2_ni.0.stats.in_filtered_frames: 0
dev.dpaa2_ni.0.stats.in_discarded_frames: 0
dev.dpaa2_ni.0.stats.in_nobuf_discards: 0
dev.dpaa2_ni.0.stats.buf_free: 0
dev.dpaa2_ni.0.stats.buf_num: 2800
dev.dpaa2_ni.0.stats.tx_sg_frames: 76976026
dev.dpaa2_ni.0.stats.tx_single_buf_frames: 0
dev.dpaa2_ni.0.stats.rx_ieoi_err_frames: 0
dev.dpaa2_ni.0.stats.rx_enq_rej_frames: 0
dev.dpaa2_ni.0.stats.rx_sg_buf_frames: 0
dev.dpaa2_ni.0.stats.rx_single_buf_frames: 33739234
dev.dpaa2_ni.0.stats.rx_anomaly_frames: 0
...
dev.dpaa2_ni.0.channels.0.tx_frames: 76976026
dev.dpaa2_ni.1.stats.in_all_frames: 32743170
dev.dpaa2_ni.1.stats.in_all_bytes: 2161390320
dev.dpaa2_ni.1.stats.in_multi_frames: 0
dev.dpaa2_ni.1.stats.eg_all_frames: 75728550
dev.dpaa2_ni.1.stats.eg_all_bytes: 114619322702
dev.dpaa2_ni.1.stats.eg_multi_frames: 0
dev.dpaa2_ni.1.stats.in_filtered_frames: 0
dev.dpaa2_ni.1.stats.in_discarded_frames: 0
dev.dpaa2_ni.1.stats.in_nobuf_discards: 0
dev.dpaa2_ni.1.stats.buf_free: 0
dev.dpaa2_ni.1.stats.buf_num: 2800
dev.dpaa2_ni.1.stats.tx_sg_frames: 75728550
dev.dpaa2_ni.1.stats.tx_single_buf_frames: 0
dev.dpaa2_ni.1.stats.rx_ieoi_err_frames: 0
dev.dpaa2_ni.1.stats.rx_enq_rej_frames: 0
dev.dpaa2_ni.1.stats.rx_sg_buf_frames: 0
dev.dpaa2_ni.1.stats.rx_single_buf_frames: 32743170
dev.dpaa2_ni.1.stats.rx_anomaly_frames: 0
...
dev.dpaa2_ni.1.channels.0.tx_frames: 75728550

dsalychev commented 1 year ago

@mcbridematt I've an experimental branch: https://github.com/mcusim/freebsd-src/tree/ten64

Could you try running a stress test? I've been fighting another panic (Undefined instruction: ..., panic: Unknown kernel exception 0 esr_el1 2000000), and my Ten64 survived last night's stress test. I wonder whether it also helps with the frame consumption issues.

mcbridematt commented 11 months ago

Not seen on commit a85d6c9ad5fe4de8cb3bc651253a1717fb28505c