Closed dkuenster closed 4 years ago
Is this the same rviz window or a new one on a new navigation launch? Can you verify that if you toggle the rviz display types for the costmap (or relaunch rviz) that it appears? I think what you're seeing has nothing to do with navigation but rather a failure in the visualization tools.
Even 0s would show up here with a boundary because of the changes in transparency between the 2 costmap settings. I think you would see 0s on the costmap in your pictures if the costmap were actually being shown.
Not sure it relates, but Buster is also not a Tier 1 supported OS, so it may be that the DDS vendors / RMW layers don't do discovery properly on it. Not sure it's related, but it certainly could be.
I was 4/4 in launching them just now - so might want to take a second look and make sure what's happening is what you think is happening.
> Not sure it relates, but Buster is also not a Tier 1 supported OS
I face this issue too on Ubuntu 18.04 so it shouldn't be related to Buster.
I looked into this a bit more, the local costmap is set to all 0s for some reason.
As @dkuenster said, this happens in more than 50% of simulation startups. When it happens, only the static layer seems to be working for the global costmap as well. If a static layer is added to the local costmap, the local costmap also has non-zero values and shows up every time.
I'm not completely sure what the error with the rest of the layers is exactly, trying to look into it.
Your image makes it seem like the laser is up and running given that I see some red in the center pole that is off the map (from robot localization quality and laserscans). But I get your point if that happens but this isn't a good example of that.
Let me know what you find out.
> Is this the same rviz window or a new one on a new navigation launch? Can you verify that if you toggle the rviz display types for the costmap (or relaunch rviz) that it appears? I think what you're seeing has nothing to do with navigation but rather a failure in the visualization tools.
It was a new window with a new navigation launch. When switching the visualization, I get the same result that can be seen in the screenshot of @naiveHobo.
I also found that each time the Local Costmap doesn't appear, the Controller only gets 0 as initial velocity in the twist message of the computeVelocityCommand, despite the robot moving and the odom topic containing the correct velocities. On a start where the Local Costmap starts correctly on the other hand the actual current velocity gets passed to the controller.
The pose parameter however works correctly in both cases.
While echo constantly shows msgs on the "odom" topic in both cases, the OdomSubscriber in the Controller gets messages on some starts and on others the callback method never gets called. Each time it doesn't get messages, we also get the problem with the local costmap plugins, as soon as we set the initial pose. I don't know how it is related, but something seems to go wrong before we even set an initial pose.
> While echo constantly shows msgs on the "odom" topic in both cases, the OdomSubscriber in the Controller gets messages on some starts and on others the callback method never gets called. Each time it doesn't get messages, we also get the problem with the local costmap plugins, as soon as we set the initial pose. I don't know how it is related, but something seems to go wrong before we even set an initial pose.
Same problem with the LaserScanSubscriber in the Obstacle Layer. On the starts where the OdomSubscriber callback never gets called, the callback in the LaserScan subscriber also doesn't get called despite echo showing messages on "scan".
Just to verify, what you're describing are specific instances of topics that are being published that have not yet connected to the costmaps, correct?
Can you try seeing if switching DDS vendors to Cyclone DDS resolves those issues? I'm wondering if there was a regression or an issue with the local discovery with Fast-RTPS. What version of ROS2 are you on right now (eloquent, master, foxy, etc)?
> Just to verify, what you're describing are specific instances of topics that are being published that have not yet connected to the costmaps, correct?
Yes.
> Can you try seeing if switching DDS vendors to Cyclone DDS resolves those issues? I'm wondering if there was a regression or an issue with the local discovery with Fast-RTPS. What version of ROS2 are you on right now (eloquent, master, foxy, etc)?
Switching to Cyclone DDS indeed solves this problem. Switching back to version v1.10.0 of Fast-RTPS, as suggested in #1788, also solves the problem. So it seems to be an issue introduced in newer versions of Fast-RTPS.
Ah ok, yeah that appears to be the same issue at #1788 and https://github.com/ros2/ros2/issues/931. Can you quickly verify that the commit https://github.com/eProsima/Fast-DDS/commit/a9bd1a9003adb7ca80c0f6854de58e181059de94 is the offender? If so, we can merge these 2 tickets together and track them.
Yes, it works right until commit https://github.com/eProsima/Fast-DDS/commit/d5c9d6bcd4fdfe7edadb137c6203a2db8d01154f (the commit right before https://github.com/eProsima/Fast-DDS/commit/a9bd1a9003adb7ca80c0f6854de58e181059de94) and then breaks on https://github.com/eProsima/Fast-DDS/commit/a9bd1a9003adb7ca80c0f6854de58e181059de94
I'm rolling the scope of https://github.com/ros-planning/navigation2/issues/1788 into this one so we have 1 ticket per issue, and renaming this issue to "Fast-RTPS services and network discovery regression". We should track that upstream issue but also potentially move to Cyclone DDS for development since that doesn't exhibit the issue.
I checked this using commit 69977cd83d9040df3422d8a2e564715b6002f3fb + current ros2 master, and running several experiments. For each experiment I followed this procedure:
RMW_IMPLEMENTATION=<impl> ros2 launch nav2_bringup tb3_simulation_launch.py 2>&1 | tee console.txt
As I work with Windows, I ran the experiments using VirtualBox to run Ubuntu Focal on a virtual machine.
I have checked with rmw_cyclonedds_cpp and rmw_fastrtps_cpp. For the latter, I have checked with eProsima/Fast-DDS@b710b1f53a4ecf6c92f87661347a93c46e5f4854 (current head of 2.0.x branch) as well as with eProsima/Fast-DDS@d5c9d6bcd4fdfe7edadb137c6203a2db8d01154f
I have never been able to see the expected image. Sometimes rviz crashed. Other times I could correctly navigate, but the local costmap was not shown. A summary of the results so far...
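For anyone repeating these runs, the per-implementation launch could be scripted roughly as follows. This is only a sketch: the launch command and the two RMW names come from this thread, while `env_for_rmw` and `describe_run` are hypothetical helper names, and the actual `subprocess` call is left commented out so the snippet can be read and run without a ROS 2 installation.

```python
import os
import shlex

# Launch command taken from the procedure described in this thread.
LAUNCH_CMD = ["ros2", "launch", "nav2_bringup", "tb3_simulation_launch.py"]

# RMW implementations compared in this thread.
RMW_IMPLEMENTATIONS = ["rmw_cyclonedds_cpp", "rmw_fastrtps_cpp"]


def env_for_rmw(rmw, base_env=None):
    """Return a copy of the environment with RMW_IMPLEMENTATION set.

    Hypothetical helper: the caller's environment is never modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["RMW_IMPLEMENTATION"] = rmw
    return env


def describe_run(rmw):
    """Return the equivalent shell line for one run (hypothetical helper)."""
    return "RMW_IMPLEMENTATION={} {}".format(rmw, shlex.join(LAUNCH_CMD))


if __name__ == "__main__":
    for rmw in RMW_IMPLEMENTATIONS:
        print(describe_run(rmw))
        # To actually start a run, one option would be:
        #   import subprocess
        #   subprocess.run(LAUNCH_CMD, env=env_for_rmw(rmw))
```

Keeping the environment per run, rather than exporting the variable in the shell, avoids one experiment silently inheriting the RMW selection of the previous one.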
ROS 2 repos file | rmw implementation | result | result files |
---|---|---|---|
master | rmw_cyclonedds_cpp | rviz crashed after step 4 | here |
master | rmw_cyclonedds_cpp | navigation complete. local costmap not shown | here |
master | rmw_fastrtps_cpp | rviz crashed after step 4 | here |
master | rmw_fastrtps_cpp | navigation complete. local costmap not shown | here |
Fast-DDS-d5c9d6bcd | rmw_fastrtps_cpp | navigation complete. local costmap not shown | here |
Fast-DDS-d5c9d6bcd | rmw_fastrtps_cpp | navigation complete. local costmap not shown | here |
My impression is that now that both implementations have workarounds to make services more reliable, this issue is always reproduced, so maybe there is something wrong in navigation2 that is now reproducibly failing.
NB: It would be nice if someone could check this with RTI Connext.
[rviz2-4] what(): InternalErrorException: Cannot create GL vertex buffer in GLHardwareVertexBuffer::GLHardwareVertexBuffer at /home/miguel/ros2_master/build/rviz_ogre_vendor/ogre-v1.12.1-prefix/src/ogre-v1.12.1/RenderSystems/GL/src/OgreGLHardwareVertexBuffer.cpp (line 46)
For rviz crashing, I can't help you on that unless it's a result of the navigation2 plugins, but I don't think that's the case. If you run with debug symbols and it's our fault, I'll look into it, but I think that's rviz.
Keep in mind it's not just about the costmap showing up; the issue we're talking about is services, which those experiments don't do anything to measure. Services can be trivially tested without the navigation stack with some simple call-response nodes.
@daisukes thoughts? I'm not read up on or tracking fast-rtps commits, so those hashes or the specific changes don't mean much to me (I'm an expert in robotics, not DDS/networking). Have you reproduced the service problem at all from the reports? That's the best starting point; I have also experienced it and we still see it in the navigation2 CI. Once you've reproduced the problem, I think it will be clearer to show that those changes actually fix the underlying problem.
@SteveMacenski
As I investigated the commits of Fast-DDS, it worked fine until this commit. I tested with this simple service test code https://github.com/ros2/ros2/issues/931#issuecomment-639489955
terminal 1 $ ros2 launch nav2_bringup tb3_simulation_launch.py # and give an initial position
terminal 2 $ ros2 run service_test service_test
RMW_IMPLEMENTATION=rmw_cyclonedds_cpp
[INFO] [1594423870.718509112] [rclcpp]: 0 Successed
[INFO] [1594423872.919848288] [rclcpp]: 0 Successed
[INFO] [1594423874.913237309] [rclcpp]: 0 Successed
...
unset RMW_IMPLEMENTATION (default Fast-RTPS)
[INFO] [1594423963.004730778] [rclcpp]: 0 service not available.
[INFO] [1594423968.228786391] [rclcpp]: 0 service not available.
[ERROR] [1594423974.496116010] [rclcpp]: 0 Failed
[INFO] [1594423979.727908774] [rclcpp]: 0 service not available
...
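To compare the two runs above more systematically, the `service_test` output could be tallied with a few lines of Python. This is a sketch that assumes the log format shown in this comment (`[LEVEL] [timestamp] [rclcpp]: <n> <outcome>`); `tally_outcomes` is a hypothetical helper name and is not part of the test code linked above.

```python
import re
from collections import Counter

# Matches lines such as:
#   [INFO] [1594423870.718509112] [rclcpp]: 0 Successed
#   [ERROR] [1594423974.496116010] [rclcpp]: 0 Failed
# The optional trailing period is dropped so that "service not available."
# and "service not available" are counted together.
LINE_RE = re.compile(r"^\[(INFO|ERROR)\] \[[\d.]+\] \[rclcpp\]: \d+ (.+?)\.?$")


def tally_outcomes(lines):
    """Count each outcome string in service_test output (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # lines that do not fit the format (e.g. "...") are skipped
            counts[match.group(2)] += 1
    return counts


if __name__ == "__main__":
    fastrtps_log = [
        "[INFO] [1594423963.004730778] [rclcpp]: 0 service not available.",
        "[INFO] [1594423968.228786391] [rclcpp]: 0 service not available.",
        "[ERROR] [1594423974.496116010] [rclcpp]: 0 Failed",
        "[INFO] [1594423979.727908774] [rclcpp]: 0 service not available",
        "...",
    ]
    for outcome, count in tally_outcomes(fastrtps_log).items():
        print(outcome, count)
```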
We also had rviz2 crash if we use the latest binary (after June 25th), so we use the source build with rviz2 v8.1.1, not v8.2.0. I'm not sure if it is a v8.2.0 problem or a binary problem. https://github.com/ros2/ros2/commit/fc010c9a297eceaedb398213dea14d5ad5d67844#diff-215a2eb6c7ad8b20796a9fceb48f8cc7
Can you file a ticket on rviz2 for that if one doesn't exist? Make sure someone knows there's a problem.
Thanks for the experiment and specification. That will definitely help clear things up.
FYI: I made a ticket https://github.com/ros2/rviz/issues/574
@SteveMacenski @daisukes It seems we found the issue. Could you give eProsima/Fast-DDS#1295 a try?
@MiguelCompany I have built the branch and confirmed that the service_test works well and also that my own simulation works well with RMW_IMPLEMENTATION=rmw_fastrtps_cpp. Thank you!
@SteveMacenski As eProsima/Fast-DDS#1295 has been merged, and @daisukes checked correct behavior, I think this issue can be closed?
@MiguelCompany has it been released into foxy?
> @MiguelCompany has it been released into foxy?
I don't think so, but I think we should ask @jacobperron about it.
@naiveHobo there's been a foxy sync so this might be OK now
Fast-DDS 2.0.0 is currently the version in Foxy. Once a 2.0.1 tag exists, we can make a new release containing eProsima/Fast-DDS#1295.
@jacobperron v2.0.1 has been released, please go ahead 😉
@SteveMacenski @daisukes v2.0.1 was released into foxy long ago. This and related issues should have been solved.
I confirmed it's been released now - closing.
Bug report
Required Info:
Steps to reproduce issue
Use tb3_simulation_launch.py to start the gazebo simulation and nav2 stack. Then use "2D Pose Estimate" to localize the robot.
Expected behavior
Local Costmap should show every time, e.g:
Actual behavior
In more than 50% of all initializations the Local Costmap doesn't show, e.g.:
Echoing /local_costmap/costmap shows the costmap contains all zeroes despite the robot being in the same position as in the working case, where it contains actual values. Rviz doesn't report any issues with the topics.
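The "all zeroes" check can be made less error-prone than reading echoed output by eye. The sketch below works on a plain sequence of cell values; it assumes the topic carries a `nav_msgs/msg/OccupancyGrid` message, in which case the sequence would be the message's flat `data` array. `costmap_summary` is a hypothetical helper name, and subscribing to the topic is deliberately left out.

```python
def costmap_summary(data):
    """Summarise a flat sequence of costmap cell values (hypothetical helper).

    Assumption: `data` is the flat cell array of the costmap message,
    e.g. `msg.data` if the topic carries nav_msgs/msg/OccupancyGrid.
    """
    cells = list(data)
    nonzero = sum(1 for cell in cells if cell != 0)
    return {
        "cells": len(cells),
        "nonzero": nonzero,
        # An empty array is reported as not all-zero, since nothing was received.
        "all_zero": len(cells) > 0 and nonzero == 0,
    }


if __name__ == "__main__":
    print(costmap_summary([0, 0, 0, 0]))     # the failing case described above
    print(costmap_summary([0, 99, 100, 0]))  # a costmap with actual values
```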
Additional information
console_output_empty_costmap.txt console_output_working_costmap.txt
I can't find any differences or errors in the console output. Can anyone reproduce the issue or has any idea what is happening?