Open vdh-robothania opened 1 day ago
The issue is not present when using kernel 6.6.31. I haven't tried higher versions as I am experiencing the issue described in #6406
Try after sudo rpi-update
(but back up any important data first).
The issue is still present after doing sudo rpi-update
(it's using hash 73e104f32e69e17450c443ebdb8bc493c849c38d) and rebooting
Thanks - it's probably another aspect of the recent batch of changes that led to neighbour issue #6406.
Do any messages appear in the kernel log at the same time as that delay?
Using sudo dmesg -w
, no messages appear at the same time as the delay
OK, so it's probably not the same as this related problem: https://github.com/raspberrypi/linux/issues/6406#issuecomment-2402149659
Please compile https://github.com/linux-can/can-utils and run sudo mcp251xfd-dump spi0.0
after each cansend
and send me the output. Also include the output of grep spi /proc/interrupts
after each cansend.
Compilation is really easy:
git clone https://github.com/linux-can/can-utils.git
make -C can-utils
sudo make -C can-utils install
Here you go
Have you configured RX or TX IRQ coalescing? Show me the output of ethtool -g can0; ethtool -c can0;
.
Ring parameters for can0:
Pre-set maximums:
RX: 24
RX Mini: n/a
RX Jumbo: n/a
TX: 8
Current hardware settings:
RX: 20
RX Mini: n/a
RX Jumbo: n/a
TX: 4
RX Buf Len: n/a
CQE Size: n/a
TX Push: off
TCP data split: n/a
Coalesce parameters for can0:
Adaptive RX: n/a TX: n/a
stats-block-usecs: n/a
sample-interval: n/a
pkt-rate-low: n/a
pkt-rate-high: n/a
rx-usecs: n/a
rx-frames: n/a
rx-usecs-irq: 0
rx-frames-irq: 8
tx-usecs: n/a
tx-frames: n/a
tx-usecs-irq: 0
tx-frames-irq: 2
rx-usecs-low: n/a
rx-frame-low: n/a
tx-usecs-low: n/a
tx-frame-low: n/a
rx-usecs-high: n/a
rx-frame-high: n/a
tx-usecs-high: n/a
tx-frame-high: n/a
CQE mode RX: n/a TX: n/a
The system behaves as configured. You get an RX-IRQ after 8 RX'ed frames:
rx-frames-irq: 8
You probably want to configure a timeout for the RX- and TX-IRQ, e.g. 10ms: ethtool -C can0 rx-usecs-irq 10000 rx-frames-irq 8 tx-usecs-irq 10000 tx-frames-irq 2
.
Thanks, that seems to be the issue. The values of tx-frames-irq
and rx-frames-irq
is set to 1 at boot, but automatically changes to 2 and 8 when executing sudo ip link set can0 up type can bitrate 1000000 dbitrate 2000000 berr-reporting on fd on
.
I can set them back to 1 by setting the interface down, followed by sudo ethtool -C can0 rx-frames-irq 1 tx-frames-irq 1
and sudo ip link set can0 up type can bitrate 1000000 dbitrate 2000000 berr-reporting on fd on
.
I just find it strange that the values stay as they are the second time I enable the interface, while they change the first time.
There is probably some 3rd program setting these values. You can add the following to the driver to generate a dump when someone sets the coalescing parameters. The stack strace should contain the program who called this.
--- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-ethtool.c
+++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-ethtool.c
@@ -96,6 +96,8 @@ static int mcp251xfd_ring_set_coalesce(struct net_device *ndev,
};
struct can_ram_layout layout;
+ dump_stack();
+
can_ram_get_layout(&layout, &mcp251xfd_ram_config, &ring, ec, fd_mode);
if ((layout.rx_coalesce != priv->rx_obj_num_coalesce_irq ||
The issue is not present when using kernel 6.6.31
Is that consistent with it being a configuration issue? Do older kernels ignore the configuration?
@pelwell Thanks for pointing this out. The coalescing configuration is supported since v5.18. There were some changes to the coalescing configuration code in the v6.6.x stable tree. I have to take a deeper look.
There is probably some 3rd program setting these values. You can add the following to the driver to generate a dump when someone sets the coalescing parameters. The stack strace should contain the program who called this.
--- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-ethtool.c +++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-ethtool.c @@ -96,6 +96,8 @@ static int mcp251xfd_ring_set_coalesce(struct net_device *ndev, }; struct can_ram_layout layout; + dump_stack(); + can_ram_get_layout(&layout, &mcp251xfd_ram_config, &ring, ec, fd_mode); if ((layout.rx_coalesce != priv->rx_obj_num_coalesce_irq ||
I have added the dump_stack()
to the driver as you said and reinstalled the kernel. This time using kernel 6.12, where the issue is still present.
$ uname -a
Linux rp5 6.12.0-rc2-v8-16k+ #2 SMP PREEMPT Wed Oct 9 17:56:11 CEST 2024 aarch64 GNU/Linux
$ sudo ethtool -c can0 | grep frames-irq
rx-frames-irq: 1
tx-frames-irq: 1
$ sudo ip link set can0 up type can bitrate 1000000 dbitrate 2000000 berr-reporting on fd on
$ sudo ethtool -c can0 | grep frames-irq
rx-frames-irq: 8
tx-frames-irq: 2
$ sudo ip link set can0 down
$ sudo ethtool -C can0 rx-frames-irq 1 tx-frames-irq 1
After executing the ethtool
command I get the following output in dmesg
:
[ 330.983658] CPU: 1 UID: 0 PID: 2186 Comm: ethtool Tainted: G C 6.12.0-rc2-v8-16k+ #2
[ 330.983671] Tainted: [C]=CRAP
[ 330.983673] Hardware name: Raspberry Pi 5 Model B Rev 1.0 (DT)
[ 330.983675] Call trace:
[ 330.983677] dump_backtrace.part.0+0xe0/0x100
[ 330.983686] show_stack+0x20/0x40
[ 330.983689] dump_stack_lvl+0x60/0x80
[ 330.983696] dump_stack+0x18/0x28
[ 330.983700] mcp251xfd_ring_set_coalesce+0x7c/0x17a8 [mcp251xfd]
[ 330.983712] __ethnl_set_coalesce.isra.0+0x490/0x568
[ 330.983718] ethnl_set_coalesce+0x44/0xb8
[ 330.983721] ethnl_default_set_doit+0xc8/0x1b0
[ 330.983726] genl_family_rcv_msg_doit+0xe0/0x160
[ 330.983732] genl_rcv_msg+0x228/0x2b8
[ 330.983735] netlink_rcv_skb+0x68/0x140
[ 330.983740] genl_rcv+0x40/0x60
[ 330.983746] netlink_unicast+0x320/0x390
[ 330.983751] netlink_sendmsg+0x198/0x400
[ 330.983756] __sock_sendmsg+0x64/0xc0
[ 330.983762] __sys_sendto+0x10c/0x178
[ 330.983767] __arm64_sys_sendto+0x30/0x48
[ 330.983771] invoke_syscall+0x50/0x120
[ 330.983777] el0_svc_common.constprop.0+0x48/0xf0
[ 330.983782] do_el0_svc+0x24/0x38
[ 330.983787] el0_svc+0x34/0xe0
[ 330.983792] el0t_64_sync_handler+0x120/0x138
[ 330.983797] el0t_64_sync+0x190/0x198
$ sudo ip link set can0 up type can bitrate 1000000 dbitrate 2000000 berr-reporting on fd on
$ sudo ethtool -c can0 | grep frames-irq
rx-frames-irq: 1
tx-frames-irq: 1
So it seems that when calling sudo ip link set can0 up ...
initially the tx-frames-irq
and rx-frames-irq
values get set without calling the mcp251xfd_ring_set_coalesce()
function. Is it expected behavior that calling sudo ip link ...
should set the tx-frames-irq
and rx-frames-irq
values on newer kernels?
Thanks for testing, especially mainline.
If you see a single backtrace in the demsg
output and only after the manual ethtool
then there seems to be a problem with the kernel. I'll look into it, hopefully I find some time on Friday.
Thanks for helping with this, Marc.
Describe the bug
I am using a Waveshare 2-CH CAN FD HAT Rev2.1 with a Raspberry Pi 5 running Raspberry Pi OS. I have connected the CAN hat to a microcontroller that echoes anything back that is sent on the CAN bus.
The issue I am experiencing, is that messages aren't being received by candump (and socket
read()
in C) immediately. They are only received after several minutes or after sending multiple messages repeatedly.I believe the messages might be stuck in a buffer/queue somewhere, as the hardware timestamps (see
candump
output below) indicates that messages are received at the expected time (but still not output until later).Steps to reproduce the behaviour
Add the following to
/boot/firmware/config.txt
:Enable can0 interface:
sudo ip link set can0 up type can bitrate 1000000 dbitrate 2000000 berr-reporting on fd on
Send FDCAN messages using
cansend
:cansend can0 234##31122334455667788
The following is the output of
candump -xHt z can0
when executing cansend 8 times:The device with ID 0x234 is the Raspberry Pi, while the device with ID 0x123 is the microcontroller. I would expect to receive one line with TX followed by one line with RX for each time
cansend
is executed, but this is not case.Device (s)
Raspberry Pi 5
System
Logs
Additional context
No response