[ROS 2] Separate robot description publishing from robot state publishing

sloretz commented 4 years ago

I've run into a situation where I want to use robot_state_publisher to publish tf transforms for a robot description, but I also want to provide the robot description via code. I'd like to take SDFormat XML describing a robot in simulation and publish joint states from the simulation, but let robot_state_publisher handle turning those joint states into tf transforms.

Currently that means starting robot_state_publisher with command line arguments or a parameter yaml file set to that robot_description. I'd much rather publish the robot description myself and have an already running robot_state_publisher consume it from a topic.

clalancette commented 4 years ago

Currently that means starting robot_state_publisher with command line arguments or a parameter yaml file set to that robot_description. I'd much rather publish the robot description myself and have an already running robot_state_publisher consume it from a topic.

This was kind of a design decision when I revamped robot_state_publisher for Foxy.

In ROS 1, the XML text of robot_description is a property of the global parameter server. Since we don't have that in ROS 2, "something" has to own it [1]. I made the determination at the time that robot_state_publisher was the owner, though that is not formally documented anywhere. That's how we've ended up where we currently are.

To be totally clear, that idea of "ownership" wasn't meant to preclude using it in another way. But it was meant to make people think about how to design their system for dynamic changes to the description. So here we are :).

I guess I'll ask the obvious question: would it be possible in your system to turn things around so that robot_state_publisher is the owner of the SDF, and the simulation component consumed it via the robot_description parameter? That more closely matches what I envisioned here.

[1] Previously in ROS 2 we had nothing that "owned" the robot_description. That meant that each program that wanted the description had to be passed it during startup. The problem with that model is that it becomes a lot harder to do updates to the description, which is why I wanted to find a place for the description to live.

sloretz commented 4 years ago

In ROS 1, the XML text of robot_description is a property of the global parameter server. Since we don't have that in ROS 2, "something" has to own it.

Yup, If I understand correctly: the parameter server owned the robot description in ROS 1. Right now in ROS 2 robot_state_publisher owns it, where owning it means publishing on a topic.

that idea of "ownership" wasn't meant to preclude using it in another way. But it was meant to make people think about how to design their system for dynamic changes to the description

If I understand correctly the ownershipp idea isn't new; it's the change to a topic that forces thinking about dynamic changes - like what should the system do if a new robot description is published.

I guess I'll ask the obvious question: would it be possible in your system to turn things around so that robot_state_publisher is the owner of the SDF, and the simulation component consumed it via the robot_description parameter?

This question seems to be assuming a different use case - I should have given more detail in the issue description. I see it making sense for the simulation component to consume the robot description if the robot being simulated runs ROS 2 and one wants to simulate it. They would start the robot software and spawn it in the simulation when that's ready.

The use case I'm looking at is different. There is a working simulation already and someone wants to interact with it using ROS tools. The simulation has a world which includes a description of a robot it's simulating, but the robot may not run ROS 2 at all. I don't think it makes sense for the simulation to consume a robot description because it already works fine even if no ROS 2 code is running - it doesn't need or want a dependency on ROS, it just wants to occasionally use it when convenient.

This issue isn't about changing the idea of something owning the robot description or about changing the mechanism that deliveres it to consumers. The result I'd like to see is a dedicated node publishes the robot description with nodes that do useful things with a robot description (like joint_state_publisher, robot_state_publisher, and the RViz robot model display) only being subscribers. That allows someone to take ownership of the description without losing access to any of those tools.

clalancette commented 4 years ago

The use case I'm looking at is different. There is a working simulation already and someone wants to interact with it using ROS tools. The simulation has a world which includes a description of a robot it's simulating, but the robot may not run ROS 2 at all. I don't think it makes sense for the simulation to consume a robot description because it already works fine even if no ROS 2 code is running - it doesn't need or want a dependency on ROS, it just wants to occasionally use it when convenient.

I see, thanks for the clarification.

This issue isn't about changing the idea of something owning the robot description or about changing the mechanism that deliveres it to consumers. The result I'd like to see is a dedicated node publishes the robot description with nodes that do useful things with a robot description (like joint_state_publisher, robot_state_publisher, and the RViz robot model display) only being subscribers. That allows someone to take ownership of the description without losing access to any of those tools.

So just to make sure I understand you properly. In a fully ROS 2 robot, you would run robot_description_publisher (publishes /robot_description XML), robot_state_publisher (subscribes to /robot_description), joint_state_publisher (optional in many cases, but also subscribes to /robot_description), and whatever else that wants to consume the description.

In the non-ROS 2 use case you outline above, you would not run robot_description_publisher; instead, you would arrange for the already working simulation to somehow publish to the /robot_description topic (let's say via a simulation plugin), and then everything downstream (including robot_state_publisher) would consume it from there.

Is that an accurate description of what you'd like to achieve?

sloretz commented 4 years ago

Is that an accurate description of what you'd like to achieve?

Yup! How's that sound?

clalancette commented 4 years ago

Yup! How's that sound?

We should be able to make it work, but there are some details to work out on how to do it. I see a few different ways:

We add a new parameter to the existing robot_state_publisher which is essentially "get my description from elsewhere". If this is false (the default), then it operates exactly as today (requires a robot_description parameter to contain the description at launch, opens up a publisher to the /robot_description topic). If this is true, then it operates in "subscription mode" (requires robot_description parameter to be NOT_SET/empty, and opens a subscription to the /robot_description topic). The main benefit here is that it is a fairly minor change to the code, and existing users of robot_state_publisher in a ROS 2 robot (most of them) won't have to do anything. The main downside to this solution is that it can be hard to explain that robot_state_publisher can either be the provider of, or the consumer of, the robot_description.
We add in a new node called robot_description_publisher. The only job of this node is to take a file, parse it using the URDF parser plugin infrastructure (to ensure that it is valid), and then publish the text on the /robot_description topic. We would then modify robot_state_publisher to always subscribe to the /robot_description topic. The main benefit to this is separation of concerns; it is easy to describe what each node is doing. The downsides of this are: breaks existing users of robot_state_publisher, hard to run robot_state_publisher standalone/on the command-line.
Leave robot_state_publisher completely untouched. Create a new node robot_state_publisher2 which subscribes to the /robot_description topic, but otherwise acts exactly the same as robot_state_publisher. If we went this way, it would probably be a good idea to refactor the existing code so that we could share most of it between the two nodes. This is essentially a hybrid between the two ideas above. The benefit to this is that it keeps existing users of robot_state_publisher working as before. The downside to this is that it is another piece of code to maintain, and we'll somewhat complicate the existing robot_state_publisher code.

@sloretz Any other ideas come to mind?

sloretz commented 4 years ago

I was thinking something along the lines of option 1. Say robot_state_publisher had a parameter publish_robot_description that defaulted to true. If the flag is true then it behaves as it does now by requiring the robot_description parameter and not subscribing to a robot_description topic. If the flag is false then robot_state_publisher maybe tries to read robot_description from a parameter first if present, but also subscribes to the robot_description topic for updates.

clalancette commented 4 years ago

I was thinking something along the lines of option 1. Say robot_state_publisher had a parameter publish_robot_description that defaulted to true. If the flag is true then it behaves as it does now by requiring the robot_description parameter and not subscribing to a robot_description topic.

I sort of prefer option 3, since it is easy to document them separately. But if you like option 1, that's OK with me too.

If the flag is false then robot_state_publisher maybe tries to read robot_description from a parameter first if present, but also subscribes to the robot_description topic for updates.

I will say that I don't particularly like this. It feels like it is easy for the user to shoot themselves in the foot. The user starts up robot_state_publisher with some URDF in a parameter, but it can immediately get overwritten by something else that might be publishing on the /robot_description topic. It just sounds messy to me. I'd be more in favor of requiring this to be empty in the case that you are in "subscription" mode, that way there is no ambiguity.

chancecardona commented 3 years ago

I too think this would be helpful. An issue I've faced previously with robot_state_publisher is that for complicated robots you need joint_state_publisher (or joint_state_controller) to be working before robot_state_publisher will work correctly (in my experience on ROS1, I haven't had any luck with this on ROS2 yet but it could be a different error that's blocking the robot_description itself). What compounds this issue in ROS2 is that it's much harder to troubleshoot and start the simulation for a new robot since everything relies on robot_state_publisher publishing the description to spawn the robot, but that can only be done once the joint_states are published (correct me if I'm wrong). This is intensified by the ongoing work in ros_control where it's tough to determine if the latest commit has broken things or if one is simply doing something wrong for the controllers or anywhere else in the chain (including the urdf definition, the launch file, etc). Further, I think this would be really helpful for doing dynamic changes to robot_description, which would be HUGELY helpful, even if just small things like mass, friction, or limit changes (even if the limits can only decrease in magnitude or something).

clalancette commented 3 years ago

to be working before robot_state_publisher will work correctly

It's not that you need joint_state_publisher running, but more that you need something that is publishing to /joint_states. After all, this is what robot_state_publisher does; takes in joint states, and publishes tf2 transforms based on that.

but that can only be done once the joint_states are published

There was a bug back in Dashing where the robot_description wouldn't get published until the first joint_state was received. But that was changed (in July 2019) so that the robot_description is always published at startup. So this should no longer be an issue.

Further, I think this would be really helpful for doing dynamic changes to robot_description, which would be HUGELY helpful, even if just small things like mass, friction, or limit changes (even if the limits can only decrease in magnitude or something).

Right, this is one of the things I wanted to support with having robot_description be a topic, rather than a parameter. In my original thinking, robot_state_publisher would be the "owner" of the robot_description, and all downstream components would get their description from that topic. For updates, then, you would just need to send a new URDF to robot_state_publisher, and it would republish the robot_description to all downstream components.

The idea here is to make it so that another entity in the system is the "owner" of the URDF, and robot_state_publisher is one of the consumers. That's a fine use-case as well, it just needs someone to implement it.

jlack1987 commented 2 years ago

i'm a bit late to the game here, but my group has had lots of troubles handling robot_description as well and I thought i'd throw in my 2 cents. We ultimately decided to have a node whose only job was to hold the robot_description parameter and everybody else goes through the ParameterClient to get it. It's ultimately the same approach as a topic except with the ParameterClient you have more control over when a client receives the robot description contents bc the parameter clients get_parameters call returns a future.

clalancette commented 2 years ago

It's ultimately the same approach as a topic except with the ParameterClient you have more control over when a client receives the robot description contents bc the parameter clients get_parameters call returns a future.

I thought about that when I originally implemented the topic, but the problem is that it isn't dynamic. If the description changes, then the downstreams have no way of knowing they have to fetch it again, unless robot_state_publisher publishes another topic which says "refetch". At that point, you may as well just publish the actual data over the topic, which is what robot_state_publisher currently does.

I'd be interested to hear why you struggled with this, and why it matters "when" the client receives the robot description. Since the robot_description is a transient_local topic, all subscribers should ultimately get it, even late-joining ones.

jlack1987 commented 2 years ago

why it matters "when" the client receives the robot description

I'd say just on principle retrieving a required parameter should be done as a blocking call. Otherwise I have to use some sort of looping sleep for an arbitrary duration and hold up execution until the robot description callback has been hit. With a service you can spin until future complete to guarantee you get it or error out otherwise.

Also can't you use the on_parameter_event call to add a callback when a parameter of a remote node changes? I think that should solve the problem; however, in my 10 or so years of doing this i've never seen the URDF change at runtime and I can certainly say i've never written any code myself that's capable of handling a runtime change to the URDF :laughing: I know that's no reason to not support it though. Let me know what you think

clalancette commented 2 years ago

I'd say just on principle retrieving a required parameter should be done as a blocking call. Otherwise I have to use some sort of looping sleep for an arbitrary duration and hold up execution until the robot description callback has been hit. With a service you can spin until future complete to guarantee you get it or error out otherwise.

That's true. Though I'll mention that the spin_until_future complete is really just a convenience; you could do the same with a local promise/future that doesn't get set until the callback from the robot_description topic comes in, and spin while waiting on that.

Also can't you use the on_parameter_event call to add a callback when a parameter of a remote node changes?

Oh yeah, I guess that is true. You could do that instead.

I think that should solve the problem; however, in my 10 or so years of doing this i've never seen the URDF change at runtime and I can certainly say i've never written any code myself that's capable of handling a runtime change to the URDF laughing I know that's no reason to not support it though. Let me know what you think

I know for sure that there are use-cases out there where this is useful; just think of a manipulator changing an end-effector tool. That said, this isn't a feature for everyone; if you have a robot that never changes itself, this feature may not be useful to you. I'll also say that this is a bit of a chicken-and-egg problem, in that we've never had support for changing URDF in the core before, so this is the first (baby) steps towards that.

Dagu12s commented 2 years ago

Right, this is one of the things I wanted to support with having robot_description be a topic, rather than a parameter. In my original thinking, robot_state_publisher would be the "owner" of the robot_description, and all downstream components would get their description from that topic. For updates, then, you would just need to send a new URDF to robot_state_publisher, and it would republish the robot_description to all downstream components.

Is this supported right now? Im developing modular robots and it would be a perfect fit to change the URDF on-line

clalancette commented 2 years ago

Is this supported right now? Im developing modular robots and it would be a perfect fit to change the URDF on-line

I would say that it is implemented in robot_state_publisher, but not particularly well tested. The rest of the ecosystem may or may not support it. If you do happen to try it out, I'd be interested to hear the results and/or review changes to make it work better.

jdlangs commented 2 years ago

I don't think I've seen mentioned that there's often more information to "describe" a robot than just the URDF. Our applications typically also involved things like an SRDF for planning information and other various configuration data. Right now what usually happens is the application launch file "owns" everything by reading the input files and distributing them to whatever nodes need them via parameters. However, this is not dynamic, often ends up duplicating information, and can't easily scale to multiple robots. It would be very nice to have a dedicated node that was the source of truth for this information and then you could easily have a single node per robot with fully dynamic behavior.

clalancette commented 2 years ago

It would be very nice to have a dedicated node that was the source of truth for this information

Right, that was actually one of the pieces of thinking behind this proposal initially. Instead of having robot_state_publisher be the source of truth for the URDF, have it be a consumer from some other entity. That entity, in turn, could easily have all of the "description" of the robot.

That said, this kind of thing is probably very application-specific. I'm not sure how much we can reasonably generalize the concept, but if someone has time, it would be interesting to try to create a node that can do this. Someone would also need to take up this issue and create a PR for it (I don't think it will be particularly challenging, but we have to keep the existing functionality working as well so there is a bit of trickiness to doing it).

Dagu12s commented 2 years ago

Is this supported right now? Im developing modular robots and it would be a perfect fit to change the URDF on-line

I would say that it is implemented in robot_state_publisher, but not particularly well tested. The rest of the ecosystem may or may not support it. If you do happen to try it out, I'd be interested to hear the results and/or review changes to make it work better.

I have been trying to change the URDF dinamically and i have found some problems. What i did was to launch robot_state_publisher with a URDF containing 2 links.

If i try to add new links, it works correctly and the the topics /tf_static and /tf updates themself with the new URDF. The problem comes when you launch the robot_state_publisher with some links and try to remove some of them. The tfs still contains the links and joints that you removed when updating the URDF. Looks that removing the links is not fully suported yet

clalancette commented 2 years ago

If i try to add new links, it works correctly and the the topics /tf_static and /tf updates themself with the new URDF. The problem comes when you launch the robot_state_publisher with some links and try to remove some of them. The tfs still contains the links and joints that you removed when updating the URDF. Looks that removing the links is not fully suported yet

I took a quick look at the robot_state_publisher code, and it seems to do the correct thing. When a new URDF comes in, it completely removes all internal knowledge of segments and mimic joints, then parses the URDF and rebuilds it. From then on out, it should only ever publish tf frames corresponding to those new joints.

That said, there are several possibilities of what is going wrong in your testing:

You are using an older version of robot_state_publisher than what I looked at (master branch), and there is a bug in that older version. What version of robot_state_publisher are you using?
My quick look was too quick, and there is indeed a bug in robot_state_publisher.
The underlying tf2 library is somehow caching things and republishing.
Something else.

I probably won't have time to debug this anytime soon, but that's the order in which I would debug things. If you have time to look at it, I'd be happy to review pull requests that fixes it.

Dagu12s commented 2 years ago

You are using an older version of robot_state_publisher than what I looked at (master branch), and there is a bug in that older version. What version of robot_state_publisher are you using?

My quick look was too quick, and there is indeed a bug in robot_state_publisher.

The underlying tf2 library is somehow caching things and republishing.

Something else.

Than you for the quick response. Yeah, i update again version and checked the code and looks that the robot_state_publisher is not the problem here. I think the problem is related with the TF2 library as you said because looking the TFs published in the topic, they are beeing republished but with a static time stamp.

The tfs that should be published has a correct timestamp but the ones that were removed from the URDF are beeing republished with the last time stamp calculated (before changing the URDF).

clalancette commented 2 years ago

The tfs that should be published has a correct timestamp but the ones that were removed from the URDF are beeing republished with the last time stamp calculated (before changing the URDF).

OK, that's good to know. Are these old transforms being published on /tf or /tf_static?

Dagu12s commented 2 years ago

Yes, at least for the static ones, the transforms that were removed from the URDF are being published in /tf_static with the time stamp from the last time they were calculated

clalancette commented 2 years ago

Thanks for the testing.

Here's the problem: https://github.com/ros2/geometry2/blob/f72c9d31a95b1863d7fd0fdbac77102b674f3317/tf2_ros/src/static_transform_broadcaster.cpp#L52-L72 . In particular, notice that all transforms in the incoming message are added to a class variable called net_message_, and then the entire net_message_ is published. That means this can only ever be additive.

I guess the thinking here is that all static transforms that are known by this node are published all at once, and no duplicates are sent. And that makes some sense with the current QoS being set to transient local with a depth of 1.

There's two ways we could fix this:

Don't store netmessages, and instead just publish what is being passed in.
Add a clearTransforms method to StaticTransformBroadcaster.

The first option runs the risk of changing the semantics, since some programs may be relying on the "additive" behavior of StaticTransformBroadcaster (robot_state_publisher doesn't rely on it, but others might). The second option is a more "opt-in" option, so robot_state_publisher could just opt in to this new behavior.

So with all of that said, see https://github.com/ros2/geometry2/pull/523 and https://github.com/ros/robot_state_publisher/pull/197 , which should fix this particular problem. Note that those PRs are still in draft, and will remain so until I get confirmation that they actually fix the issue.

Dagu12s commented 2 years ago

I don't know if this makes sense but a cleaner/filter that removes from the net_message all the tf's that are old, maybe the ones that have not being calculated in the last few seconds or something similar. Maybe this won`t be a good idea if the robot has a very big tf tree

clalancette commented 2 years ago

I don't know if this makes sense but a cleaner/filter that removes from the net_message all the tf's that are old, maybe the ones that have not being calculated in the last few seconds or something similar. Maybe this won`t be a good idea if the robot has a very big tf tree

The issue is that for /tf_static topics, most of them will be old. That's because we only publish them once, and they may exist for the lifetime of the robot. So I'm not sure how we would know to age them out.

That said, another way we could do this would be to add a new method to StaticTransformBroadcaster to remove an individual link. Then robot_state_publisher could iterate over its static transforms, and remove them one-by-one. That's a more generic solution, but also unnecessary for this use case, since robot_state_publisher is going to re-add everything anyway.

ros / robot_state_publisher

[ROS 2] Separate robot description publishing from robot state publishing #144