Open LucileQ opened 5 years ago
The program is about testing full duplex, with sending and receiving frames interleaved. The program uses sleeps for this. Use cangen
instead.
What part of CAN exactly is full duplex?
Hey @vladimiroltean,
the CAN bus itself is half duplex, only one CAN controller can send at a time. However the driver can pass a CAN frame for sending into the controller at any time (even if the bus is currently used by another controller). The CAN frame is delayed and tried to send when the bus is free automatically by the controller. This means from the drivers perspective and thus from the user space point of view the CAN bus is as it were full duplex.
The mentioned test (canfdtest
) sends and receives CAN frames interleaved with various delays between rx and tx, testing that the controller and driver doesn't reorder frames.
The CAN frame is delayed and tried to send when the bus is free automatically by the controller.
All other half-duplex buses (including Ethernet) work in the same way. That doesn't make them full duplex though.
The mentioned test (canfdtest) sends and receives CAN frames interleaved with various delays between rx and tx, testing that the controller and driver doesn't reorder frames.
By the way, if canfdtest
fails like this (on the flexcan driver), where should I start looking to debug this issue?
interface = can1, family = 29, type = 3, proto = 1
Databyte 0 mismatch !
expected: 0078: [8] 4b 4c 4d 4e 4f 50 51 52
received: 0078: [8] 4c 4d 4e 4f 50 51 52 53
Databyte 1 mismatch !
expected: 0078: [8] 4b 4c 4d 4e 4f 50 51 52
received: 0078: [8] 4c 4d 4e 4f 50 51 52 53
Databyte 2 mismatch !
expected: 0078: [8] 4b 4c 4d 4e 4f 50 51 52
received: 0078: [8] 4c 4d 4e 4f 50 51 52 53
Databyte 3 mismatch !
expected: 0078: [8] 4b 4c 4d 4e 4f 50 51 52
received: 0078: [8] 4c 4d 4e 4f 50 51 52 53
Databyte 4 mismatch !
expected: 0078: [8] 4b 4c 4d 4e 4f 50 51 52
received: 0078: [8] 4c 4d 4e 4f 50 51 52 53
Databyte 5 mismatch !
expected: 0078: [8] 4b 4c 4d 4e 4f 50 51 52
received: 0078: [8] 4c 4d 4e 4f 50 51 52 53
Databyte 6 mismatch !
expected: 0078: [8] 4b 4c 4d 4e 4f 50 51 52
received: 0078: [8] 4c 4d 4e 4f 50 51 52 53
Databyte 7 mismatch !
expected: 0078: [8] 4b 4c 4d 4e 4f 50 51 52
received: 0078: [8] 4c 4d 4e 4f 50 51 52 53
Test messages sent and received: 3147
When this happens, the driver statistics look to me like they're pretty "clean":
Statistics for run 1365:
9: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UP mode DEFAULT group default qlen 10
link/can promiscuity 0
can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
bitrate 1000000 sample-point 0.750
tq 10 prop-seg 42 phase-seg1 32 phase-seg2 25 sjw 1
flexcan: tseg1 2..96 tseg2 2..32 sjw 1..16 brp 1..1024 brp-inc 1
flexcan: dtseg1 2..39 dtseg2 2..8 dsjw 1..4 dbrp 1..1024 dbrp-inc 1
clock 200000000
re-started bus-errors arbit-lost error-warn error-pass bus-off
0 0 0 0 0 0 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
RX: bytes packets errors dropped overrun mcast
635354784 79419348 0 0 0 0
TX: bytes packets errors dropped carrier collsns
317677392 39709674 0 0 0 0
10: can1: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UP mode DEFAULT group default qlen 10
link/can promiscuity 0
can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
bitrate 1000000 sample-point 0.750
tq 10 prop-seg 42 phase-seg1 32 phase-seg2 25 sjw 1
flexcan: tseg1 2..96 tseg2 2..32 sjw 1..16 brp 1..1024 brp-inc 1
flexcan: dtseg1 2..39 dtseg2 2..8 dsjw 1..4 dbrp 1..1024 dbrp-inc 1
clock 200000000
re-started bus-errors arbit-lost error-warn error-pass bus-off
0 0 0 0 0 0 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
RX: bytes packets errors dropped overrun mcast
635354784 79419348 0 0 0 0
TX: bytes packets errors dropped carrier collsns
317677392 39709674 0 0 0 0
The CAN frame is delayed and tried to send when the bus is free automatically by the controller.
All other half-duplex buses (including Ethernet) work in the same way. That doesn't make them full duplex though.
Not quite, modern Ethernet (using twisted pair and switches or direct connections) is usually full duplex. Old Ethernet with (with BNC cables) is half duplex and the CSMA/CD while the CAN bus is CSMA/CA.
By the way, if
canfdtest
fails like this (on the flexcan driver), where should I start looking to debug this issue?
One CAN controller is flexcan, what is the other controller?
interface = can1, family = 29, type = 3, proto = 1 Databyte 0 mismatch ! expected: 0078: [8] 4b 4c 4d 4e 4f 50 51 52 received: 0078: [8] 4c 4d 4e 4f 50 51 52 53 Databyte 1 mismatch ! expected: 0078: [8] 4b 4c 4d 4e 4f 50 51 52 received: 0078: [8] 4c 4d 4e 4f 50 51 52 53 Databyte 2 mismatch ! expected: 0078: [8] 4b 4c 4d 4e 4f 50 51 52 received: 0078: [8] 4c 4d 4e 4f 50 51 52 53 Databyte 3 mismatch ! expected: 0078: [8] 4b 4c 4d 4e 4f 50 51 52 received: 0078: [8] 4c 4d 4e 4f 50 51 52 53 Databyte 4 mismatch ! expected: 0078: [8] 4b 4c 4d 4e 4f 50 51 52 received: 0078: [8] 4c 4d 4e 4f 50 51 52 53 Databyte 5 mismatch ! expected: 0078: [8] 4b 4c 4d 4e 4f 50 51 52 received: 0078: [8] 4c 4d 4e 4f 50 51 52 53 Databyte 6 mismatch ! expected: 0078: [8] 4b 4c 4d 4e 4f 50 51 52 received: 0078: [8] 4c 4d 4e 4f 50 51 52 53 Databyte 7 mismatch ! expected: 0078: [8] 4b 4c 4d 4e 4f 50 51 52 received: 0078: [8] 4c 4d 4e 4f 50 51 52 53 Test messages sent and received: 3147
This means one CAN frame was lost or the order was changed. You have to figure out if it happened on the generator or the reflector.
The CAN frame is delayed and tried to send when the bus is free automatically by the controller.
All other half-duplex buses (including Ethernet) work in the same way. That doesn't make them full duplex though.
Not quite, modern Ethernet (using twisted pair and switches or direct connections) is usually full duplex. Old Ethernet with (with BNC cables) is half duplex and the CSMA/CD while the CAN bus is CSMA/CA.
Of course I was comparing CAN with half-duplex Ethernet and not with full-duplex... The point about CSMA/CD vs CSMA/CA is valid though.
One CAN controller is flexcan, what is the other controller?
FlexCAN on both sides.
This means one CAN frame was lost or the order was changed. You have to figure out if it happened on the generator or the reflector.
How can I do that?
FlexCAN on both sides.
Which Soc? Which kernel?
How can I do that?
Try running candump any,0:0,#FFFFFFFF -extA
on both sides and have a look at the order.
Which Soc?
NXP LS1028A
Which kernel?
Currently something called LSDK 20.04: https://source.codeaurora.org/external/qoriq/qoriq-components/linux/tree/?h=LSDK-20.04-V5.4
Try running candump any,0:0,#FFFFFFFF -extA on both sides and have a look at the order.
I'll test again and have that running in the background, thanks. It takes a while to reproduce though.
Currently something called LSDK 20.04:
Okay, some NXP vendor kernel.
As the flexcan driver only uses one mailbox for TX, there'll be no reordering on the TX side. For RX it uses timestamps on the mailboxes, which is also fine. What load do you have on the system, while running the test? Maybe a dropped frame on the socket.
What load do you have on the system, while running the test? Maybe a dropped frame on the socket.
A stress --cpu 2 --hdd 2
, actually. Could you please expand a bit more about the ways in which CPU load affects this application?
If canfdtest
isn't scheduled the RX-queue in the socket will grow, because it cannot receive CAN frames as fast as they are received by the flexcan core. The socket queue will grow and eventually the kernel will drop CAN frames from that queue. That might explain your problem here.
Thing is, stress --cpu only does this, if you look at the source code:
int
hogcpu (void)
{
while (1)
sqrt (rand ());
return 0;
}
That shouldn't pose much of an issue to the scheduler, and neither should stress --hdd
:
int
hoghdd (long long bytes)
{
long long i, j;
int fd;
int chunk = (1024 * 1024) - 1; /* Minimize slow writing. */
char buff[chunk];
/* Initialize buffer with some random ASCII data. */
dbg (stdout, "seeding %d byte buffer with random data\n", chunk);
for (i = 0; i < chunk - 1; i++)
{
j = rand ();
j = (j < 0) ? -j : j;
j %= 95;
j += 32;
buff[i] = j;
}
buff[i] = '\n';
while (1)
{
char name[] = "./stress.XXXXXX";
if ((fd = mkstemp (name)) == -1)
{
err (stderr, "mkstemp failed: %s\n", strerror (errno));
return 1;
}
dbg (stdout, "opened %s for writing %lli bytes\n", name, bytes);
dbg (stdout, "unlinking %s\n", name);
if (unlink (name) == -1)
{
err (stderr, "unlink of %s failed: %s\n", name, strerror (errno));
return 1;
}
dbg (stdout, "fast writing to %s\n", name);
for (j = 0; bytes == 0 || j + chunk < bytes; j += chunk)
{
if (write (fd, buff, chunk) == -1)
{
err (stderr, "write failed: %s\n", strerror (errno));
return 1;
}
}
dbg (stdout, "slow writing to %s\n", name);
for (; bytes == 0 || j < bytes - 1; j++)
{
if (write (fd, &buff[j % chunk], 1) == -1)
{
err (stderr, "write failed: %s\n", strerror (errno));
return 1;
}
}
if (write (fd, "\n", 1) == -1)
{
err (stderr, "write failed: %s\n", strerror (errno));
return 1;
}
++j;
dbg (stdout, "closing %s after %lli bytes\n", name, j);
close (fd);
}
return 0;
}
I would like to test your hypothesis though. What are some CAN socket statistics counters I can check?
Hello,
When setting high speed bitrate (such as 500000 or 1Mbps), the actual bitrate between the dut and the generator caps around 208000 bps.
I don't understand why the canfdtest utility inject a millisecond both the generator and the dut. Could I remove this hardcoded behavior ?
On the dut run: canfdtest -v can0 On the Workstation run: time -p canfdtest -l 20000 -g -v can0
# canfdtest exchanges each message twice (in both direction) # messages are 8-bytes long (120 bits per CAN message) Then the actual bitrate is = 20000 120 2 / ${time}