hartkopp / can-isotp

Linux Kernel Module for ISO 15765-2:2016 CAN transport protocol. PLEASE NOTE: This module is part of the mainline Linux kernel since version 5.10.

When trying to read 8K messages on ISO-TP socket using Classic CAN receive back -1 with errno 110 ETIMEDOUT #54

Closed derek-will closed 2 years ago

derek-will commented 2 years ago

Interestingly, I can write this data successfully on the sender socket, but I cannot receive the same data on the receiver socket.

It appears that the container_of function is returning an isotp_sock struct with rx.state == ISOTP_WAIT_DATA here in the source.

When I run candump I don't see all frames. Please note that this is all happening on a vcanX interface.

derek-will commented 2 years ago

I think the candump issue has more to do with the rate at which the CAN frames are sent on the vcan interface. I experienced similar dropouts when monitoring the bus with the cangen gap set to 0 ms. My homegrown "proof of concept" bus logger performed about the same during full-load testing. What I think is happening here is that the vcan interface transmits the CFs at a crazily fast rate with the default configuration. Either way, the problem isn't related, or at least I currently don't believe it to be.

The problem doesn't manifest itself with ISO-TP messages that use Classic CAN frames when the payload is only slightly north of 4095 bytes, but go much higher than that and you start to run into problems.

For ISO-TP messages that use CAN FD frames this problem is not present up to the old limit of 8200 bytes.
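To put rough numbers on the difference between the two cases, here is a small sketch that counts the Consecutive Frames (CFs) a segmented message needs. It assumes normal addressing (no extended address byte), 8-byte Classic CAN frames and 64-byte CAN FD frames; the function name `consecutive_frames` is made up for this illustration and is not part of the module.

```python
def consecutive_frames(payload_len: int, frame_len: int = 8) -> int:
    """Count the CFs needed for a segmented ISO-TP message.

    Assumes normal addressing and a payload large enough to need a First Frame.
    """
    # First Frame PCI: 2 bytes up to 4095 bytes of payload,
    # 6 bytes when the escape sequence (32-bit length) is used.
    ff_pci = 2 if payload_len <= 4095 else 6
    ff_data = frame_len - ff_pci
    cf_data = frame_len - 1  # every CF spends one byte on its PCI
    remaining = payload_len - ff_data
    return -(-remaining // cf_data)  # integer ceiling division


print(consecutive_frames(4095, 8))   # Classic CAN, old limit: 585
print(consecutive_frames(8200, 8))   # Classic CAN, 8200 bytes: 1172
print(consecutive_frames(8200, 64))  # CAN FD (64-byte frames): 130
```

So under these assumptions an 8200-byte message is roughly nine times as many back-to-back frames on Classic CAN as on CAN FD.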

derek-will commented 2 years ago

Figured it out! By default, frame_txtime is set to 0 ns. On vcan interfaces this causes problems when receiving lots of CFs, as is the case with larger ISO-TP messages coupled with smaller frame sizes. After setting frame_txtime to a non-zero value (even 1 ns will do), receiving an 8200-byte ISO-TP message with Classic CAN works! :-)
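For anyone who wants to try the workaround from user space, this is a minimal sketch of setting frame_txtime through the `CAN_ISOTP_OPTS` socket option. The struct layout and constants are taken from `<linux/can/isotp.h>` as I understand them; the helper names, the interface name and the CAN IDs in the usage note are assumptions for illustration only.

```python
import socket
import struct

# Constants as defined in <linux/can.h> and <linux/can/isotp.h>.
SOL_CAN_BASE = 100
CAN_ISOTP = 6
SOL_CAN_ISOTP = SOL_CAN_BASE + CAN_ISOTP
CAN_ISOTP_OPTS = 1


def pack_isotp_options(frame_txtime_ns, flags=0, ext_address=0,
                       txpad=0xCC, rxpad=0xCC, rx_ext_address=0):
    """Pack struct can_isotp_options: two u32 fields followed by four u8 fields."""
    return struct.pack("=IIBBBB", flags, frame_txtime_ns,
                       ext_address, txpad, rxpad, rx_ext_address)


def open_isotp_socket(ifname, rx_id, tx_id, frame_txtime_ns=1):
    """Hypothetical helper: open an ISO-TP socket with a non-zero frame_txtime."""
    sock = socket.socket(socket.AF_CAN, socket.SOCK_DGRAM, CAN_ISOTP)
    # The options have to be set before bind().
    sock.setsockopt(SOL_CAN_ISOTP, CAN_ISOTP_OPTS,
                    pack_isotp_options(frame_txtime_ns))
    sock.bind((ifname, rx_id, tx_id))
    return sock
```

A call such as `open_isotp_socket("vcan0", 0x7E8, 0x7E0)` (placeholder interface and IDs) needs a Linux system with the isotp module loaded. Note that the option replaces the whole options struct, so any flags or padding bytes already in use have to be passed again.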

As an added bonus, candump sees the entire message as well without dropping any frames.

I suspect there is some relationship between the two issues then. The receiver socket must have been dropping frames in the background and therefore timing out while waiting for the next expected CF sequence number.
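To illustrate what I mean by the expected CF sequence number, here is a toy check over the 4-bit SN values of a CF stream. It is only a sketch of the idea, not the module's receive path, and `first_sequence_gap` is a hypothetical name.

```python
def first_sequence_gap(sequence_numbers):
    """Return the index of the first CF with an unexpected SN, or None."""
    expected = 1  # the first CF after a First Frame carries SN 1
    for index, sn in enumerate(sequence_numbers):
        if sn != expected:
            return index
        expected = (expected + 1) % 16  # the 4-bit SN wraps from 15 back to 0
    return None


# 40 CFs as the sender would number them.
sent = [(i + 1) % 16 for i in range(40)]
# The same stream with the CF at index 20 lost on the way.
received = sent[:20] + sent[21:]

print(first_sequence_gap(sent))      # None
print(first_sequence_gap(received))  # 20
```

A single lost CF is enough to leave the receiver without the frame it is waiting for, no matter how many frames arrived correctly before it.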

Should we consider defaulting frame_txtime to 1 ns on vcan interfaces? Or otherwise compensate for the fact that we are in a virtualized space with zero friction? From an application programming perspective, I can always configure my apps to set frame_txtime to a non-zero value when vcan interfaces are being used, but maybe we should remove this tripping hazard for the application programmer.

hartkopp commented 2 years ago

Here we are :-D

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/net/can/isotp.c?id=530e0d46c61314c59ecfdb8d3bcb87edbc0f85d3

This was the second feature I did not apply to this out-of-tree repo. I only applied the fixes so far ... will add the above patch soon.

This out-of-tree repo also makes no use of soft hrtimers to maintain backward compatibility. Which kernel version are you working with, so that you still need the out-of-tree implementation?

hartkopp commented 2 years ago

Added the patch "can: isotp: set default value for N_As to 50 micro seconds": https://github.com/hartkopp/can-isotp/commit/801354ff06132b946eb9e5f44b17819ca8a81347

derek-will commented 2 years ago

> This was the second feature I did not apply to this out-of-tree repo. I only applied the fixes so far ... will add the above patch soon.

Ah! Seems like we were on two different paths and ended up at the same destination. I found this issue last fall when I was testing the Escape Sequence support offered by the isotp module. However, I was newer to the project at that point and so I wasn't confident enough to report anything. I am happy to see you have already fixed it! I really like the solution you implemented. :-)

> This out-of-tree repo also makes no use of soft hrtimers to maintain backward compatibility. Which kernel version are you working with, so that you still need the out-of-tree implementation?

I am using 5.16.11, so I use the in-tree version by default. For making changes and testing them, I have just been working on the out-of-tree version and swapping between the two. I am new to kernel module development, so I viewed the out-of-tree repo as an easier way to make changes, and I (wrongly) assumed feature parity.

I did some research and found out that I can pull down the Linux kernel source tree and build just the module that I want, as described in this StackOverflow question. I will do that in the future.