Closed SteveMacenski closed 3 weeks ago
There's a weird issue where the docking smoke system test is failing by timing out with no logging in CI but I cannot reproduce locally. Working on it... sorry for testing in CI
How do you keep your ros2 launch-based tests from conflicting with each other? (like, this test publishes and subscribes stuff - is it maybe conflicting with some other test that is running? Is there some magic in ros2 launch file tests that magically assigns different ROS_DOMAIN_IDs? I haven't built a huge set of ros2 launch tests yet to see what issues arise...
How do you keep your ros2 launch-based tests from conflicting with each other? (like, this test publishes and subscribes stuff - is it maybe conflicting with some other test that is running? Is there some magic in ros2 launch file tests that magically assigns different ROS_DOMAIN_IDs? I haven't built a huge set of ros2 launch tests yet to see what issues arise...
The tests are run sequentially within a given package, but could be run in parallel w.r.t. other packages. In our CI settings, we have parallel-workers
set to 1 (https://github.com/ros-navigation/navigation2/blob/main/.circleci/defaults.yaml#L11-L14) so that only one package is tested at a time making sure they don't conflict.
By the way, I found at least the first cause of the issue, the while loops are never exiting because the action_result
s aren't being populated for some reason, which implies that the result_future.add_done_callback(self.action_result_callback)
or future.add_done_callback(self.action_goal_callback)
are never called. Its really hard to tell which since CI isn't giving me back any logs so I'm kind of stabbing in the dark narrowing in on it:
while len(self.action_result) == 0:
rclpy.spin_once(self.node, timeout_sec=0.1)
time.sleep(0.1)
I still cannot reproduce this locally but is apparently happening in Rolling on 24.04 or something peculiar in CI. I assumed at the start that may have been the issue (since it timed out rather than crashed or something), but I made sure it wasn't a regression in launch testing first - which are some of the fumbling commits you see above. I'm pretty sure now its either something we're doing or rclpy
's spinning / action lib is busted (which I wouldn't put behind it since we did find some issues with Jazzy already with spinning repeating needed to make work the first time and historically we haven't relied on the result/accepted callbacks because they weren't reliable)
@mikeferguson in case you're curious about the outcome, basically I just swapped out all the done callbacks for the goal / result with spinning until complete and that resolved it. There seems to be some problem in spinning in Jazzy/Rolling I think, this isn't the first time I've seen something like this since working on the 24.04 migration. Putting all the action work inline and blocking seems to resolve the issue instead of using callback functions. Ex.
future = self.undock_action_client.send_goal_async(goal)
rclpy.spin_until_future_complete(self.node, future)
self.goal_handle = future.result()
assert self.goal_handle.accepted
result_future = self.goal_handle.get_result_async()
rclpy.spin_until_future_complete(self.node, result_future)
self.action_result.append(result_future.result())
Now it all works as expected https://github.com/ros-navigation/navigation2/pull/4406/commits/95c547bb6e52c75eca270357c359f1a9e976ee7e
Thanks to NVIDIA for their support to make this possible!
Original location: https://github.com/open-navigation/opennav_docking Tutorials: https://docs.nav2.org/tutorials/docs/using_docking.html Configuration: https://docs.nav2.org/configuration/packages/configuring-docking-server.html Commander API: https://docs.nav2.org/commander_api/index.html
A few known enhancements that would be great to have over time: https://github.com/ros-navigation/navigation2/issues/4405 https://github.com/ros-navigation/navigation2/issues/4404 https://github.com/ros-navigation/navigation2/issues/4403