Multirobot support - Githubissues

ShawnSchaerer commented 4 years ago

Not sure if this is the best place to ask. Have there been any design discussions on how to support multi-robot systems in ROS2?

I see that this topic is on the ros2 roadmap. We are at the point where this is an important feature for us and we would like to partition our systems such that individual robot topics are not discoverable across the system.

ruffsl commented 4 years ago

I'd just like to suggest that for ROS2 we avoid relying solely on namespace prefixing, as resorted to in ROS1, to partition robots. A first-class abstraction of a robot could be much more powerful than attempting to push every namespace into a prefix, given the limited length of ROS2 namespaces and the number of DDS topics per robot (just running the nav2 turtlebot demo adds up to ~366 topics).

As for the transport level, I think DDS partitions could be a nice axis to project the entity of a robot onto, as DDS domains may be too extreme, preventing topic participants from easily communicating across robots, e.g. pose or map. This could be kept optional, defaulting the partition to empty as it is currently, but if desired, the user could specify an expression for which robots to connect with.

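from rclpy.robot import Robot  # hypothetical module: the API proposed here, not part of rclpy
from std_msgs.msg import String

class MinimalBot(Robot):
  def __init__(self):
    super().__init__('minimal_robot')
    # Publish this robot's pose under its own robot/partition
    self.robot_pub = self.create_publisher(String, 'pose', 10, robot="/bot1")
    # Subscribe to the pose of every robot matching the wildcard expression
    self.swarm_sub = self.create_subscription(String, 'pose', self.on_pose, 10, robot="/bot*")
  def on_pose(self, msg):
    print(msg.data)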

Not sure how to bubble up to subscribers listening across robots which robot/partition a message was received from, but perhaps we could add a 'robot' handle to extend the return type to callbacks.

ShawnSchaerer commented 4 years ago

My assumption was that namespaces mapped to DDS partitions.

wjwwood commented 4 years ago

> My assumption was that namespaces mapped to DDS partitions.

That's not the case.

I placed the "backlog" label on this issue as part of our triage process, but interested team members will comment on it. Also, this might be something that could be better discussed on discourse as part of WG or the "next gen" category, at least until there's concrete work to be done.

wjwwood commented 4 years ago

> We are at the point where this is an important feature for us and we would like to partition our systems such that individual robot topics are not discoverable across the system.

If you want to avoid discovery entirely, then DDS domains are probably the right solution. The domain id can be selected with the ROS_DOMAIN_ID environment variable and is the moral equivalent to having more than one master in ROS 1.
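As a rough illustration of the domain-based approach, the sketch below builds a separate process environment per robot, each with its own ROS_DOMAIN_ID. The helper name `env_for_robot` and the robot names are hypothetical, and actually launching a process with this environment (for example via `subprocess`) is left out.

```python
import os


def env_for_robot(domain_id, base_env=None):
    """Return a copy of the environment with ROS_DOMAIN_ID set.

    `env_for_robot` is a hypothetical helper for illustration only.
    """
    if domain_id < 0:
        raise ValueError("domain id must be non-negative")
    env = dict(os.environ if base_env is None else base_env)
    env["ROS_DOMAIN_ID"] = str(domain_id)
    return env


# One environment per robot; processes started with different domain ids
# should not discover each other.
robot_envs = {name: env_for_robot(i + 1, {}) for i, name in enumerate(["bot1", "bot2"])}
```

Each dictionary could then be passed as the `env` argument when starting that robot's nodes.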

> I'd just like to suggest that for ROS2 we avoid relying solely on namespace prefixing, as resorted to in ROS1, to partition robots. A first-class abstraction of a robot could be much more powerful than attempting to push every namespace into a prefix, given the limited length of ROS2 namespaces and the number of DDS topics per robot (just running the nav2 turtlebot demo adds up to ~366 topics).

I'm not sure I agree with this. I don't think we should have a "robot" concept, instead it would be better, in my opinion, to have an array of communication features and let people decide how to use them. One team might prefer to separate their robots one way and another in a different way.

My opinion on this has always been that there's no silver bullet for multi-robot, so ROS should just give tools to people so they can accomplish it the best way they know how. It's fine to have some recommendations, but by having a built-in "robot selection" mechanism we'd be selecting one of them, perhaps steering people away from better solutions for their use case.

Adding a mechanism equivalent to DDS Partitions to ROS would be a big undertaking. The only reasons would be to gain more characters for namespacing (partitions are also limited in length) or to subscribe across multiple "topics" at once (but this is where a lot of the complexity comes in).

The number of topics isn't really the constraint here; it's the depth of the namespace. You can have as many topics as you want as long as you don't have extremely long topic names. That's a resource that system integrators will have to manage.

A lot of these trade-offs are talked about here:

https://design.ros2.org/articles/topic_and_service_names.html#alternative-using-dds-partitions

spiderkeys commented 4 years ago

I know it has been a while since this issue was created, but I wanted to add a little bit to the conversation about using partitions for multi-robot/entity systems. At OpenROV, we used DDS partitions for this purpose with great success, which I will describe below.

- All robots published to a topic called Beacon, which contained a unique identifier for that robot.
  - This datawriter had KeepLastReliable.TransientLocal QoS
  - The datatype for the Beacon message marked the "uuid" field as a key, so that there would be separate last-value caches for each unique robot
- Applications that intended to interact with one or more robots would subscribe to this Beacon topic.
  - The datareaders for the Beacon message would also use KeepLastReliable.TransientLocal QoS
  - The datareaders were also created under Subscribers that belonged to the { "*", "" } wildcard partition, such that they could receive the Beacon message from any robot and discover its UUID.
- All datawriters/datareaders created on a robot would be created under Pubs/Subs with Partition set to a minimum of { "$UUID" }
  - The robots could also list additional partitions, for tasks such as creating swarms/groups of entities.
- On the client side (usually a GUI application for controlling or monitoring one or more robots), the application could use the discovered UUID information from the Beacon message to create Publishers/Subscribers that used the desired partition IDs to control what data they wanted to "connect" on.
- In the case of the Trident drone + Android app:
  - The app would learn about the presence of multiple robots through the Beacon message via a wildcard subscriber
  - Buttons would be presented, allowing you to click and pilot any particular robot
  - Once one of these buttons was clicked, a new Subscriber+Publisher pair would be created using that robot's UUID, and then all telemetry/control/video datareaders/datawriters for that robot would be created under the partitioned Subscriber+Publisher
  - "Connecting" to multiple robots was as simple as instantiating a new instance of a class which housed the pub/sub & reader/writers, e.g. robot = std::make_shared( uuid );
  - "Disconnecting" from a robot was as simple as destroying an object which encapsulated the Pub/Sub or Writer/Readers.
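The matching behaviour this scheme relies on can be sketched in plain Python, without any DDS library. This is only a simplified model: it assumes partition names are compared with fnmatch-style wildcards, that two wildcard expressions never match each other, and that an empty partition list means the default partition "". The function names and the example UUIDs are made up for illustration, and real DDS implementations may differ in details.

```python
from fnmatch import fnmatchcase

WILDCARD_CHARS = set("*?[")


def is_pattern(name):
    """True if the partition name contains fnmatch-style wildcard characters."""
    return any(ch in WILDCARD_CHARS for ch in name)


def partitions_match(a, b):
    """Return True if two partition lists share at least one matching name."""
    a = list(a) or [""]  # empty list is treated as the default partition
    b = list(b) or [""]
    for x in a:
        for y in b:
            if is_pattern(x) and is_pattern(y):
                continue  # assumed: two patterns never match each other
            if is_pattern(x) and fnmatchcase(y, x):
                return True
            if is_pattern(y) and fnmatchcase(x, y):
                return True
            if not is_pattern(x) and not is_pattern(y) and x == y:
                return True
    return False


# Hypothetical partition lists following the scheme described above.
beacon_reader = ["*", ""]            # wildcard subscriber used for discovery
robot_a = ["uuid-a"]                 # robot in its own partition only
robot_b = ["uuid-b", "swarm-1"]      # robot that also belongs to a group
pilot_of_a = ["uuid-a"]              # client "connected" to robot A

print(partitions_match(beacon_reader, robot_a))  # True: wildcard reaches any robot
print(partitions_match(pilot_of_a, robot_b))     # False: different UUID partitions
print(partitions_match(["swarm-1"], robot_b))    # True: shared group partition
```

In this model, "connecting" to a robot amounts to creating endpoints whose partition list contains that robot's UUID, and "disconnecting" amounts to dropping them.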

There are some additional advantages to using partitions, rather than using Domains:

- By default, domain IDs are tightly coupled to the generation of UDP port numbers and other locator/GUID generation tasks. Using high-value domains can sometimes lead to unexpected behaviours and issues at the DDS level.
- In my experience, I find it easier to create a unique identifier via a 16 character string (recommended partition length), rather than a uint16_t (domain ID).
- Though domains are the best way to prevent discovery between applications in DDS, stopping all discovery beyond the participant discovery messages, partitions are the next best thing, preventing the exchange of reader/writer info between pubs/subs that don't match partitions. This helps to cut down on discovery overhead. Without using partitions, and instead relying on topic name prefixes, the publishers and subscribers will still trade information about their readers/writers to check for matches (of which there would be none).
- Partitions are dynamic. You can add and remove them from a pub/sub's partition list during runtime to allow entities to "join/leave" groups as desired. I like to think of this with the analogy of IRC chat servers, where the server is a Domain, and individual channels are partitions that you can join and leave as you wish, and you can belong to multiple simultaneously. This property, combined with being able to belong to multiple partitions, is very useful for robots operating within swarms to publish information only to members of their group and for applications (like GUI apps or swarm coordinators) to communicate only with the robots that it needs to.
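To make the coupling between domain IDs and UDP ports concrete, here is a small sketch of the default port mapping from the RTPS specification (PB = 7400, DG = 250, PG = 2, d0 = 0, d1 = 10, d2 = 1, d3 = 11). It assumes the vendor uses these defaults unchanged, which a given DDS implementation or configuration may override; the function names are illustrative only.

```python
# Default RTPS port parameters (assumed unmodified by the DDS vendor).
PB, DG, PG = 7400, 250, 2
D0, D1, D2, D3 = 0, 10, 1, 11
MAX_UDP_PORT = 65535


def rtps_ports(domain_id, participant_id=0):
    """Compute the default RTPS UDP ports for a domain and participant."""
    base = PB + DG * domain_id
    return {
        "discovery_multicast": base + D0,
        "discovery_unicast": base + D1 + PG * participant_id,
        "user_multicast": base + D2,
        "user_unicast": base + D3 + PG * participant_id,
    }


def fits_udp_range(domain_id, participant_id=0):
    """True if every computed port is a valid UDP port number."""
    return max(rtps_ports(domain_id, participant_id).values()) <= MAX_UDP_PORT


print(rtps_ports(0))
print(fits_udp_range(232), fits_udp_range(233))
```

Under these defaults the port numbers grow by 250 per domain, so large domain IDs run out of valid UDP ports, which is one way high-value domains can cause trouble.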

I know it may be difficult to reintroduce the concept of using partitions to ROS2, but I think it would be a valuable discussion to have. Would love to hear your and others' thoughts, @wjwwood. Also happy to clarify and expand upon any of the above if there are questions.

spiderkeys commented 4 years ago

For those who are visual, like myself, I created the following diagram:

[Image: Screenshot from 2020-07-10 00-02-56]

Edit: Fixed a couple of copy paste errors.

ros-discourse commented 4 years ago

This issue has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/transport-priority-qos-policy-to-solve-ip-flow-ambiguity-while-requesting-5g-network-qos/15332/4

ros-discourse commented 3 years ago

This issue has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/restricting-communication-between-robots/2931/16

ros-discourse commented 3 years ago

This issue has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/restricting-communication-between-robots/2931/30

ros2 / design

Multirobot support #261