Explicitly ensure that all ROS messages are sent after a step

rhaschke commented 1 year ago

@lbergmann1, please check whether this improves/resolves your ROS message issue. @DavidPL1, you mentioned a segfault when introducing this, right? I didn't observe anything so far.

However, I was sure I could remove the 100ms wait time on reset. But I can't. Actually, debugging showed that no ROS messages are processed by the processAllRosMessages() calls there. I think the reason is that an old joint_states message needs to "arrive" at the remote robot_state_publisher, which then sends back (old) TF messages. And this simply takes time, we cannot influence.

DavidPL1 commented 1 year ago

@DavidPL1, you mentioned a segfault when introducing this, right? I didn't observe anything so far. I only had a naive implementation and probably had a stupid mistake that I had no time to debug.

Your implementation looks good, I only have one remark: should we not remove this AsyncSpinner here? https://github.com/ubi-agni/mujoco_ros_pkgs/blob/819b5082b8b143e76d8e78121bd2009c846f6479/mujoco_ros/src/main.cpp#L43-L44

However, I was sure I could remove the 100ms wait time on reset. But I can't. Actually, debugging showed that no ROS messages are processed by the processAllRosMessages() calls there.

I expected as much. Since the reset is handled in the event loop where sim_mtx is locked, no new message should be created and the spinner threads should have enough time to process all outgoing messages until the actual reset is issued.

I think the reason is that an old joint_states message needs to "arrive" at the remote robot_state_publisher, which then sends back (old) TF messages. And this simply takes time, we cannot influence.

I agree. The time it takes is also heavily influenced by the hardware the sim runs on and might still happen with the current timeout of 0.1 seconds. We could make this timeout configurable with a rosparam.

We could also listen for TF messages "from the future" and trigger a reset automatically, but I don't know if this is a practical solution.

rhaschke commented 1 year ago

Should we not remove this AsyncSpinner here?

I tried, but this didn't work (deadlock). Probably, because ROS time is not correctly published anymore.

We could also listen for TF messages "from the future" and trigger a reset automatically

That's what I implemented in a TF pull request. But it was kind of rejected - being too brittle.

lbergmann1 commented 1 year ago

Using this implementation doesn't really change anything for me. But calling rospy.rostime.wallsleep(0.1) after performing simulation steps seems to solve this issue.

ubi-agni / mujoco_ros_pkgs

Explicitly ensure that all ROS messages are sent after a step #15