neobotix / neo_kinematics_omnidrive2

ROS 2 packages for kinematics of MPO-700

DDS and SocketCan #17

Closed: padhupradheep closed this issue 1 year ago

padhupradheep commented 1 year ago

There has been a long-standing issue with DDS and our SocketCAN node implementation. It would be good if we could get to the root of this issue. Doing so will probably take some effort and require diving deeper into the DDS implementations, namely CycloneDDS and FastRTPS.

Over the past year, our clients have raised issues reporting that the SocketCAN node randomly reports the following error:

[Screenshot: terminal output in which the SocketCAN node reports motor status timeouts]

As seen in the terminal on the right side, the SocketCAN node complains that the motor status has timed out. This happens only when the user places different machines on different subnets. Currently, for PCs on different subnets to work within a single ROS domain ID, the DDS implementation the user wants to work with has to be configured accordingly. In our local setup, this issue has never occurred, at least as far as I can recall. Note, however, that all our robots are connected to the same subnet. Our usual recommendation to customers is to connect all robots and host machines to the same subnet. Unfortunately, for undisclosed reasons, some clients do not want to work on the same subnet.

There are also other random issues, such as failing to get odom readings or being unable to send command velocities. All of these happen only when the robots are connected to a different subnet.

A few months back, while I was working remotely, I also tried to connect and communicate over ROS across different subnets, but this was not fully achievable. When I spoke with folks in the ROS community, they also felt that there is no complete solution in this regard (yet).

The starting point for debugging this issue would be to connect the robots to different subnets, configure a DDS implementation (preferably the reliable CycloneDDS), and try to reproduce the issue.
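As a possible starting point for such a reproduction setup, the sketch below generates a minimal CycloneDDS configuration that disables multicast and lists the machines on the other subnet explicitly as unicast discovery peers. This is an assumption, not a confirmed fix for the issue: the helper name `build_cyclonedds_config` and the peer addresses are hypothetical, and the element names used (`General/AllowMulticast`, `Discovery/Peers/Peer`, `Discovery/ParticipantIndex`) should be checked against the configuration reference for the CycloneDDS version actually installed.

```python
import xml.etree.ElementTree as ET

# Default namespace used by CycloneDDS XML configuration files.
CYCLONE_NS = "https://cdds.io/config"


def build_cyclonedds_config(peer_addresses, allow_multicast=False):
    """Return a CycloneDDS XML config string (hypothetical helper).

    Each address in peer_addresses becomes a unicast discovery peer, which is
    needed when multicast discovery cannot cross subnet boundaries.
    """
    root = ET.Element("CycloneDDS", {"xmlns": CYCLONE_NS})
    domain = ET.SubElement(root, "Domain", {"id": "any"})

    # Turn multicast off so discovery relies only on the listed peers.
    general = ET.SubElement(domain, "General")
    ET.SubElement(general, "AllowMulticast").text = (
        "true" if allow_multicast else "false"
    )

    discovery = ET.SubElement(domain, "Discovery")
    peers = ET.SubElement(discovery, "Peers")
    for address in peer_addresses:
        ET.SubElement(peers, "Peer", {"address": address})

    # With unicast-only discovery, participants need an index so that
    # peers can find each other's ports.
    ET.SubElement(discovery, "ParticipantIndex").text = "auto"

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical addresses of machines on two different subnets.
    print(build_cyclonedds_config(["10.0.1.20", "10.0.2.30"]))
```

The output could then be saved to a file on each machine, with ROS 2 pointed at it via `export RMW_IMPLEMENTATION=rmw_cyclonedds_cpp` and `export CYCLONEDDS_URI=file:///path/to/cyclonedds.xml`, where each machine lists the addresses of the others.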

Related:
https://answers.ros.org/question/406653/lost-large-messages-across-subnets/
https://github.com/eclipse-cyclonedds/cyclonedds/issues/688
https://github.com/ros2/rmw_cyclonedds/issues/284

padhupradheep commented 1 year ago

A further report from the customer indicates that the issue is related to the network in their facility. More details soon.

padhupradheep commented 1 year ago

The issue was with the user's local setup. Closing; no action required.