micro-ROS / micro_ros_setup

Support macros for building micro-ROS-based firmware.
Apache License 2.0

ensuring agent is acting as agent for all that it should #560

Open cmraaron opened 2 years ago

cmraaron commented 2 years ago

How can we guarantee that the agent is representing all publishers, nodes, etc. as requested?

Related to the following issues and PRs:

https://github.com/micro-ROS/micro_ros_arduino/issues/40
https://github.com/micro-ROS/micro_ros_setup/issues/299
https://github.com/micro-ROS/micro_ros_setup/issues/256
https://github.com/micro-ROS/rmw_microxrcedds/pull/86/files
https://github.com/micro-ROS/micro_ros_setup/issues/347
https://github.com/micro-ROS/rmw_microxrcedds/pull/143/files
https://github.com/micro-ROS/micro_ros_arduino/issues/912

It appears the consensus is to use rmw_uros_ping_agent to detect the loss of agent communications, then fini* and re-init everything, as exemplified here: https://github.com/micro-ROS/micro_ros_arduino/blob/galactic/examples/micro-ros_reconnection_example/micro-ros_reconnection_example.ino
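The ping-then-recreate pattern described above can be sketched as a small state machine. This is a simplified, self-contained illustration and not the linked example itself: `agent_link_t`, `link_step`, and the `sim_*` helpers are hypothetical names, and the agent is simulated so the code compiles on a host without micro-ROS headers. On real firmware, the `ping` hook would wrap `rmw_uros_ping_agent`, and `create`/`destroy` would call the usual init and fini functions.

```c
#include <stdbool.h>

/* Hypothetical connection states for the link to the agent. */
typedef enum {
    LINK_WAITING_AGENT,   /* no agent seen yet, keep pinging */
    LINK_CONNECTED,       /* entities exist, ping periodically */
    LINK_DISCONNECTED     /* ping failed, entities must be torn down */
} link_state_t;

/* Hypothetical hooks so the sketch builds without micro-ROS. */
typedef struct {
    bool (*ping)(void);     /* real code: wrap rmw_uros_ping_agent */
    bool (*create)(void);   /* real code: create node, publishers, ... */
    void (*destroy)(void);  /* real code: the matching fini calls */
    link_state_t state;
} agent_link_t;

/* Advance the state machine by one step and return the new state. */
link_state_t link_step(agent_link_t *link)
{
    switch (link->state) {
    case LINK_WAITING_AGENT:
        if (link->ping()) {
            if (link->create()) {
                link->state = LINK_CONNECTED;
            } else {
                link->destroy();  /* partial creation: clean up, retry later */
            }
        }
        break;
    case LINK_CONNECTED:
        if (!link->ping()) {
            link->state = LINK_DISCONNECTED;
        }
        break;
    case LINK_DISCONNECTED:
        link->destroy();
        link->state = LINK_WAITING_AGENT;
        break;
    }
    return link->state;
}

/* Simulated agent and entities, for trying the state machine on a host. */
static bool sim_agent_up = false;
static bool sim_entities_alive = false;

static bool sim_ping(void)    { return sim_agent_up; }
static bool sim_create(void)  { sim_entities_alive = true; return true; }
static void sim_destroy(void) { sim_entities_alive = false; }
```

Note that this pattern only reacts to a failed ping. If the agent answers pings but never received one of the creation requests, the state machine stays in `LINK_CONNECTED`, which is exactly the gap this issue is about.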

Communication with the agent is often over UDP, so is it possible to simply miss one of several init packets? In that case the agent would be operating normally, and we might not notice our missing service for a while.

Is it possible to periodically get the XRCE protocol to re-announce to the agent the set of nodes, topics, etc., so that it would self-recover? Or is it feasible to implement some kind of "Tell me how you are configured, and I'll compare it to my idea of what I think you should be doing" handshake in the protocol between the agent and micro-ROS?

pablogs9 commented 2 years ago

The ping feature can be configured to make N attempts with a configurable timeout, so that the loss of a single UDP packet is tolerated: https://github.com/micro-ROS/rmw_microxrcedds/blob/895763c817c7c07ce0be08411924d7747c2acd2d/rmw_microxrcedds_c/include/rmw_microros/ping.h#L52

XRCE offers the possibility to create the entities with a REUSE flag. In the case of micro-ROS, we are using REUSE, or REPLACE if something has been modified: https://github.com/micro-ROS/rmw_microxrcedds/blob/895763c817c7c07ce0be08411924d7747c2acd2d/rmw_microxrcedds_c/src/rmw_publisher.c#L160

Maybe we could add an RMW API with something like refresh_entities() that just tries to recreate the entities with REUSE.

More info here: https://micro-xrce-dds.docs.eprosima.com/en/latest/client.html#creation-policy-table
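One reading of the linked creation policy table can be modeled as a pure function. This is a sketch for discussion only: the flag names, values, and result names below are illustrative and are not the identifiers or values used by the Micro XRCE-DDS client, and the table in the linked documentation remains the authority on the actual behavior.

```c
#include <stdbool.h>

/* Illustrative flags; the real client defines its own names and values. */
enum { POLICY_REPLACE = 1 << 0, POLICY_REUSE = 1 << 1 };

/* Illustrative outcomes of a create request. */
typedef enum {
    RESULT_CREATED,         /* entity did not exist and was created */
    RESULT_MATCHED,         /* existing entity matches, nothing is done */
    RESULT_REPLACED,        /* existing entity deleted and created again */
    RESULT_ALREADY_EXISTS,  /* error: entity exists and no flag was given */
    RESULT_MISMATCH         /* error: REUSE only, but the entity differs */
} create_result_t;

/* Decide what a create request does, given the flags and agent state. */
create_result_t apply_creation_policy(unsigned flags, bool exists, bool matches)
{
    if (!exists) {
        return RESULT_CREATED;  /* flags are irrelevant for a new entity */
    }
    if (flags & POLICY_REUSE) {
        if (matches) {
            return RESULT_MATCHED;
        }
        return (flags & POLICY_REPLACE) ? RESULT_REPLACED : RESULT_MISMATCH;
    }
    return (flags & POLICY_REPLACE) ? RESULT_REPLACED : RESULT_ALREADY_EXISTS;
}
```

Under this model, re-sending every create request with REUSE is harmless when the agent already holds a matching entity, and recreates it when the agent has lost it, which is what makes a refresh call plausible.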

Could you please tell us if this would be ok for you?

cmraaron commented 2 years ago

Yes! That sounds ideal. Either with REUSE, which could then indicate if something was missing, leaving it to the caller to tear everything down and try again; or with REUSE | REPLACE, which would return which resources were transparently created (replaced?), if my understanding is correct.

E.g., this could be necessary in the unlikely case that the agent restarted so quickly that the ping didn't notice the outage.
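To make the requested return value concrete, here is a minimal sketch of what a refresh call could report. Everything in it is hypothetical: no `refresh_entities()` exists in the RMW according to this thread, the `ENTITY_*` bits and the simulated agent are stand-ins, and a real implementation would send one create request per entity and inspect each status instead of reading agent state directly.

```c
#include <stdint.h>

/* Hypothetical: one bit per entity the client expects on the agent. */
#define ENTITY_NODE      (1u << 0)
#define ENTITY_PUBLISHER (1u << 1)
#define ENTITY_SERVICE   (1u << 2)

/* Simulated agent: the set of entities it currently holds. */
typedef struct { uint32_t present; } sim_agent_t;

/* What the caller learns from one refresh pass. */
typedef struct {
    uint32_t matched;    /* entities the agent already had */
    uint32_t recreated;  /* entities that were missing and created again */
} refresh_report_t;

/* Re-request every expected entity and report what had to be recreated. */
refresh_report_t refresh_entities(sim_agent_t *agent, uint32_t expected)
{
    refresh_report_t report = { 0u, 0u };
    for (uint32_t bit = 1u; bit != 0u; bit <<= 1) {
        if (!(expected & bit)) {
            continue;  /* not an entity this client owns */
        }
        if (agent->present & bit) {
            report.matched |= bit;    /* would be a "matched" status */
        } else {
            agent->present |= bit;    /* would be a fresh creation */
            report.recreated |= bit;
        }
    }
    return report;
}
```

A non-zero `recreated` mask tells the caller that the agent had silently lost state, even though pings were succeeding.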

pablogs9 commented 2 years ago

Currently, we have quite limited bandwidth for this kind of feature. It will certainly be on our roadmap, but do not expect it soon.

If you want to contribute this approach as a PR to the micro-ROS RMW, we can take a look.

Thanks!

aditya2592 commented 1 year ago

I found this post while looking into why rmw_uros_ping_agent and other init functions can sometimes fail. Currently, we retry these in a loop, but the method suggested here definitely sounds like a good idea!

aditya2592 commented 1 year ago

@pablogs9 I was going through the tutorial here, which mentions that setting the client key should allow for entity reuse. Does this already cover the case discussed here?