Closed Ayush1325 closed 3 months ago
You are running out of network buffers according to the log output. The CONFIG_NET_TCP_MAX_SEND_WINDOW_SIZE
option tuning could help the situation. See the TCP configuration guide for help
https://docs.zephyrproject.org/latest/connectivity/networking/net_config_guide.html#tcp-options
It would be useful to see sniffer data. Any TI Dev Board can be transformed into a sniffer via their PacketSniffer 2 firmware. Analyzing the wireshark protocols could help us identify the root cause.
It would be useful to see sniffer data. Any TI Dev Board can be transformed into a sniffer via their PacketSniffer 2 firmware. Analyzing the wireshark protocols could help us identify the root cause.
Hi. Is there a way to use PacketSniffer 2 in Linux? I can only see stuff for windows.
You are running out of network buffers according to the log output. The
CONFIG_NET_TCP_MAX_SEND_WINDOW_SIZE
option tuning could help the situation. See the TCP configuration guide for help https://docs.zephyrproject.org/latest/connectivity/networking/net_config_guide.html#tcp-options
Currently, it is set to 0, which means it will be calculated automatically.
Also, I did try increasing both CONFIG_NET_TCP_MAX_RECV_WINDOW_SIZE
and CONFIG_NET_TCP_MAX_TX_WINDOW_SIZE
to 20480 but no improvement.
Hi. Is there a way to use PacketSniffer 2 in Linux? I can only see stuff for windows.
I'm afraid that's true - several of TI's proprietary tools do not exist for the Linux platform. I use them in a Windows VM running on Linux which works well. As you're in the SubG range and TI's SUN SubG PHY settings are not 100% standard conforming, you're pretty much depending on their tools currently AFAIK.
I think there's a way to capture packages in Zephyr and forward them on another interface as alternative to using a dedicated sniffer, but I've never worked with that arrangement. And you'll only always see one side of the equation.
Re your TCP problems - it might just be that you don't get responses reliably, especially if you use ALOHA without ACK switched off. CSMA/CA and ACK are there for a reason... You'll see traces of collisions and missing packages when sniffing or debugging the TCP protocol details directly.
Zephyr has packet capture support, see this document https://docs.zephyrproject.org/latest/connectivity/networking/network_monitoring.html There is a sample that can show what can be done in your own application, https://docs.zephyrproject.org/latest/samples/net/capture/README.html I recently created support for cooked mode capture in this PR #70926. It allows user to capture low level non-IP based packets. The PR has example for capturing HDLC frames in PPP traffic and send them for analysis to external systems, see this commit 7fdebb5a917685b88208675a31271d0bceab6f69 for details. For 802.15.4 L2 packet capturing, some extra work would need to be done, basically adding packet capture calls to the proper places in 802.15.4 stack (similarly what I did for PPP), if one wants to see 802.15.4 L2 traffic. If normal IP traffic is enough, then that IP traffic can be captured already. If the generic packet capture is setup and enabed (either from net-shell or directly from capture API), the packets are then automatically captured for analysis. One caveat in the capturing packets inside a Zephyr device, is that it is not storing the packets locally but send them to outside system where Wireshark or similar could be used to show the data. This requires extra IP based network connection in the device. For normal IP traffic capturing, the same interface can be used for capturing and transferring the captured packets, but for cooked capture mode, the captured packets cannot be sent via the same interface (as that would cause recursion atm, there might be a way to avoid that but it requires some extra coding).
Re your TCP problems - it might just be that you don't get responses reliably, especially if you use ALOHA without ACK switched off. CSMA/CA and ACK are there for a reason... You'll see traces of collisions and missing packages when sniffing or debugging the TCP protocol details directly.
The initial testing is being done with CSMA/CA and ACK since I was not really testing the performance in this issue (it has to work reliably before worrying about performance).
I will try sniffing and a git bisect to pin point the problem.
The initial testing is being done with CSMA/CA and ACK since I was not really testing the performance in this issue (it has to work reliably before worrying about performance).
Totally agree.
I will try sniffing and a git bisect to pin point the problem.
Nice. You may of course contact me any time on Discord as well if there's something you think I could help you with. I'd really love to see this hardware to be better supported.
So did a git bisect and found that the commit responsible is most likely 128354ae17fb95ec6c61d73b9060f1a274715464 .
However, it seems that the problem is a bit different from what I initially considered. What seems to be happening is that for some reason the driver does not recover if it runs out of available rx and/or tx packets. That is to say, the samples does not recover after op->status == PROP_ERROR_RXBUF
. The TCP Socket closes with error 116 (timeout).
I seem to be able to recover if I reset the rx buffers each time on drv_rx_start
as was present before the commit (https://github.com/zephyrproject-rtos/zephyr/commit/128354ae17fb95ec6c61d73b9060f1a274715464#diff-8278a3f5641432256702d96822ad636a501df47e873791c58ccdd79b77cad1bcL429), but not completely sure why that is.
To speed up testing, I had to set CONFIG_NET_L2_IEEE802154_FRAGMENT_REASS_CACHE_SIZE=8
. This causes the Could not get a cache
error to not come up after the above mentioned commit (seems it runs out of packets before running out of cache).
So #72977 does seem to prevent timeout errors, but it is not a complete fix. While it does not time out now, it still stops in under 2 mins. The error is as follows:
[00:01:30.354,797] <err> net_echo_server_sample: TCP (IPv6): Failed to send, closing socket
On further investigation, it seems send
fails with error 105, i.e. ENOBUFS
@Ayush1325 Puh - sorry to hear that. Sounds like we had to go back to digging deeper in the code. Debugging with a breakpoint on the non-zero error code path would give you a good indication where this comes from as you see the stack then. Probably you'll have to do several rounds to track it across threads. Can I help something to make this easier for you? (And thanks again for investing all that debugging time here, very much appreciated!)
I am reopening the issue since as mentioned in the previous comment, while I am no longer geting receive timeout errors, TCP is still not usable due errors while sending.
I have found the reliability for long running tasks has recently improved https://github.com/zephyrproject-rtos/zephyr/pull/73488 Will close this once that is merged.
@Ayush1325 Can you check please, whether this issue is still required and close otherwise?
Describe the bug After investigating UDP over subg on beagleconnect freedom (https://github.com/zephyrproject-rtos/zephyr/issues/68674), I have moved to testing TCP. However, I cannot seem to get TCP reliably working on the
echo_server
andecho_client
.Note: The subg overlays present in
echo_server
andecho_client
disable TCP by default, so to reproduce these results, it should be enabled.Here are the logs from
echo_server
:And here are the logs from
echo_client
:Some additional config options I tried adding to fix this are as follows:
To Reproduce Steps to reproduce the behavior:
Expected behavior All packets to be received by the server.
Environment (please complete the following information):