Closed by firesurfer 1 year ago
@firesurfer Can you confirm that this problem happens only if the Raspberry Pi is involved and not between x86 machines (same network configuration)?
@richiware Did you ever encounter similar problems when testing on Raspberry Pi?
@mikaelarguedas I can confirm that this doesn't happen on x86 if both sender and receiver are running on the same machine. For the case of two x86 machines I will do another test on Friday.
I will prepare a raspberry pi 3 environment and test it.
I was testing with two scenarios:
In both cases the subscriber starts to print data after a delay of 4 seconds. Then it receives data without any problem. I was investigating the delay. Using Wireshark I saw that the first RTPS packet (participant discovery message) is sent 4 seconds after the application started. Right now I don't know why: whether the application takes a long time to boot or there is some other reason.
On x86 with our network setup it takes one message until the first message is received. It depends on whether the message data has been changed or not. If the message data has been changed, for example by additionally setting a field that hasn't been set before, there is a delay of one message. If the data hasn't been changed, or a field that already had data is changed, there is in most cases no delay.
Edit: Another interesting thing I found while debugging: if I put a small delay after each publish call in my own program, the messages seem to be transmitted fine.
this->store_data_publisher->publish(msg);
std::this_thread::sleep_for(std::chrono::milliseconds(10));
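The workaround above can be wrapped in a small helper so the pacing is applied consistently to every publish call. This is only a sketch: `publish_paced` is a hypothetical name, and the `publish` callable stands in for something like `store_data_publisher->publish(msg)`. Note that a 10 ms gap caps the rate at roughly 100 messages per second.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not part of rclcpp): calls `publish` once per message
// and sleeps for `gap` after each call, mirroring the workaround above.
// Returns the number of messages handed to `publish`.
template <typename Msg, typename PublishFn>
std::size_t publish_paced(const std::vector<Msg>& msgs, PublishFn publish,
                          std::chrono::milliseconds gap) {
  assert(gap.count() >= 0);  // a negative gap makes no sense here
  std::size_t sent = 0;
  for (const auto& msg : msgs) {
    publish(msg);                      // e.g. publisher->publish(msg)
    ++sent;
    std::this_thread::sleep_for(gap);  // give the transport time to drain
  }
  return sent;
}
```

Whether the delay hides a discovery race or a send-buffer problem is not established in this thread; the helper only makes the experiment repeatable.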
Hi.
We are having similar issues with a ROS2 network where the nodes stop working after some time. We noticed that some of the RTPS messages are encapsulated inside an ICMP message (with a Destination unreachable (Port unreachable) error), as commented here: https://github.com/ros2/rmw_fastrtps/issues/157#issue-265790753.
INFO_DST, INFO_TS and DATA:
Also HEARTBEAT:
@richiprosima Any news on this? This issue practically renders our Raspberry Pi mesh network unusable with ROS2.
I'm updating my raspberry ros2 environment and will try again to reproduce the issue.
Sorry for the delay. I was travelling. I managed to prepare a Raspberry Pi environment. It is simple: two Raspberry Pis communicating through a switch over Ethernet. The Raspberry Pis have been communicating with each other for 6 hours without problems.
@abilbaotm What differences are there between your scenario and mine? Are you using Ethernet or wireless? What is the info returned by ifconfig -a? All info is appreciated.
Hi. Our setup is as follows.
The output of ifconfig -a is:
enxb827eb945e61: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.0.0.28 netmask 255.255.255.0 broadcast 10.0.0.255
inet6 fe80::ba27:ebff:fe94:5e61 prefixlen 64 scopeid 0x20<link>
ether b8:27:eb:94:5e:61 txqueuelen 1000 (Ethernet)
RX packets 60045 bytes 13982058 (13.3 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 81003 bytes 17415879 (16.6 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10
wlan0: flags=4098<BROADCAST,MULTICAST> mtu 1500
ether b8:27:eb:c1:0b:34 txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
At what rate are you publishing? I think that the issue occurs when data is published too fast, for example at more than 50-100 Hz.
Thanks @richiware for your time!
Hi, we noticed this issue when starting to publish. It takes around 4 publishing cycles until a message is received. Afterwards all messages are received. If you cancel the publishing node, wait about 10-20 seconds and then start it again, the delay is there again. If you cancel the publishing node and restart it immediately, there is no delay.
When using Wireshark we also get a lot of ICMP Destination unreachable (Port unreachable) messages when using ROS2. Nevertheless, we can communicate via ROS2 with the Raspberry Pis that are listed in the Destination field of the corresponding Wireshark message. But there is the above-mentioned delay, or the messages are dropped (I can't say whether the messages are just delayed or the first 2 or 3 messages are dropped).
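One way to tell delayed messages apart from dropped ones is to number every published message and compare the numbers that arrive. The sketch below assumes the publisher writes a counter starting at 0 into some field of the message (which field is an assumption, nothing in this thread specifies one) and the subscriber records the counters in order of arrival. `DeliveryReport` and `analyze_delivery` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Summary of what the subscriber saw.
struct DeliveryReport {
  std::size_t missing = 0;           // sequence numbers that never arrived
  std::size_t out_of_order = 0;      // arrived after a higher number
  std::uint64_t first_received = 0;  // meaningful only if any_received
  bool any_received = false;
};

// `sent_count`: number of messages published, numbered 0..sent_count-1.
// `received`: sequence numbers in the order they reached the subscriber.
DeliveryReport analyze_delivery(std::uint64_t sent_count,
                                const std::vector<std::uint64_t>& received) {
  DeliveryReport report;
  std::set<std::uint64_t> seen;  // distinct numbers, so duplicates count once
  std::uint64_t highest = 0;
  for (std::uint64_t seq : received) {
    assert(seq < sent_count);  // a number we never sent indicates a test bug
    if (!report.any_received) {
      report.any_received = true;
      report.first_received = seq;
      highest = seq;
    } else if (seq < highest) {
      ++report.out_of_order;  // late arrival: delayed rather than dropped
    } else {
      highest = seq;
    }
    seen.insert(seq);
  }
  report.missing = sent_count - seen.size();
  return report;
}
```

If `first_received` is 3 and `missing` is 3 after the publisher has stopped, the first three messages were dropped; if `missing` is 0 but `out_of_order` is nonzero, they were only delayed.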
I just did some new tests. Apparently it depends on whether there is any other ROS2 communication on the network. For testing purposes I connected only two Raspberry Pis and ran our own software together with ros2 topic, and only ros2 topic for comparison. In the second case all messages are received. What I noticed is that ros2 topic pub apparently has a long delay at startup. We noticed ourselves that having a delay at startup resolves some message transport problems in our software. @mikaelarguedas could you perhaps explain why there is such a long delay at startup?
And here is the network configuration of one Pi.
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.180.55.31 netmask 255.255.255.0 broadcast 10.180.55.255
ether b8:27:eb:8a:92:3b txqueuelen 1000 (Ethernet)
RX packets 127525 bytes 13552241 (12.9 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 61305 bytes 9454530 (9.0 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
loop txqueuelen 1 (Local Loopback)
RX packets 10490 bytes 1520412 (1.4 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 10490 bytes 1520412 (1.4 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
Hi, the problem with ros2 echo / pub was that apparently you have to wait long enough for echo to be set up properly. During my testing last Friday it seems that I sometimes just waited a bit longer.
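Instead of guessing how long to wait, a publisher can poll a readiness condition with a timeout before sending its first message. The helper below is a generic sketch with a hypothetical name, `wait_until`; it knows nothing about ROS2. In rclcpp the predicate could be something along the lines of `publisher->get_subscription_count() > 0`, assuming the installed version provides that method.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: polls `ready` every `poll` interval until it returns
// true or `timeout` has elapsed. Returns true if the condition was met.
bool wait_until(const std::function<bool()>& ready,
                std::chrono::milliseconds timeout,
                std::chrono::milliseconds poll = std::chrono::milliseconds(50)) {
  assert(poll.count() > 0);  // a zero interval would busy-spin
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (!ready()) {
    if (std::chrono::steady_clock::now() >= deadline) {
      return false;  // gave up: the peer was not discovered in time
    }
    std::this_thread::sleep_for(poll);
  }
  return true;
}
```

A matched subscription only indicates that discovery has completed on the publisher side, so a short additional delay may still be needed in practice.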
Nevertheless, I could track down the ICMP Destination unreachable error, but I'll open a separate issue for that.
I think that the question why there is such a long delay at startup is still valid.
Please see https://github.com/ros2/ros2/issues/480 regarding this issue. A colleague of mine created a container environment in which the error comes up.
In our real setup the issue improved (but was not completely solved) after the deadlock fix was committed to Fast-RTPS last week: https://github.com/eProsima/Fast-RTPS/commit/17e717c0740dab99c353b07c4e76237bcc7a32ba
We noticed that FastRTPS seems to drop messages when used on a raspberry pi with an x86 computer as sender but also with another raspberry pi as sender.
Message used: msgs/StorageData
Commands used for sending and receiving:
ros2 topic echo /storage_data_topic
ros2 topic pub /storage_data_topic msgs/StorageData '{"uuid": "test", "sendernode": "mynode", "data": [1,2,3,4,5,6,7,8,9,10]}'
Start the listener first, then publish. What can be observed: it takes at least 4 messages until one message is received when publishing from an x86 computer. No messages are received at all if publishing is done from another Pi. Restarting the listener helps; messages are received afterwards. In general, there are sometimes long delays between messages and/or messages get dropped.
The phenomenon is even more extreme when used in our own application with the parameters qos profile. The subscription running on the Pi won't receive any messages of this type (other messages work more or less fine, sometimes delayed by 10 s). This also happens when listening with ros2 topic echo but sending with our own application.
Used version of ROS2: Current master branches.
x86 Computer: Debian Testing
Raspberry Pi 3: Raspbian Testing
Network topology: Multiple switches configured with Spanning Tree Protocol (STP). Multiple Raspberry Pis in the network.
Edit: Using Wireshark I could determine that there is often an ICMP Destination unreachable (Port unreachable) message with a destination of either the x86 computer or the Raspberry Pi.
Edit 2: I could determine that this issue depends on the data in the message. Example: Set "sendernode" to any data, then send it. It takes at least three sending cycles until a message is received. Stopping the sending process and restarting it results in the message being received immediately. Stopping it and changing the data a bit results in one or two sending cycles until a message is received. Changing the data a lot, like setting another field, results in at least three sending cycles.