iRobotEducation / create3_examples

Example nodes to drive the iRobot® Create® 3 Educational Robot
BSD 3-Clause "New" or "Revised" License

Create3 Lidar Not Working #44

Closed (bluepra closed this 11 months ago)

bluepra commented 1 year ago

Trying to follow this demo: https://github.com/iRobotEducation/create3_examples/tree/humble/create3_lidar

Setup

Roomba:

  - FastRTPS middleware
  - H.1.0 firmware

Raspberry Pi

  - Raspberry Pi 4
  - Ubuntu 22.04 Desktop
  - ROS2 Humble installed
  - .bashrc is set up for FastRTPS
  - Pi and Roomba communicate through Ethernet over USB

VM on my MacBook

  - Ubuntu 22.04 Desktop
  - ROS2 Humble installed
  - .bashrc is set up for FastRTPS

Lidar connected to Pi

https://user-images.githubusercontent.com/71447892/227058924-a07a73c1-a43b-4d3d-9344-819e47bef95c.MOV

I worked with Professor Briana Bouchard to get the 3 devices to communicate via the wireless network, so I don't believe the problem is there. The Roomba, Pi, and my VM are all connected to the same wifi network, and all three use FastRTPS. We also disabled multicast by following the Fast DDS instructions at https://iroboteducation.github.io/create3_docs/setup/xml-config/#fast-dds.

From my VM, I can see the Roomba's topics when I do ros2 topic list and I can access the Roomba's webserver from my VM's browser.

Running the nodes

On my pi:

On my VM:

Not sure where the issue is. My devices can all communicate with each other, but somewhere the Lidar's readings are not being captured or sent the way they should be.

Any help is appreciated.

carlsondc-ceva commented 1 year ago

A few things to check:

  1. I don't see the create3's odometry node connected to the /tf topic, so I think that something in the create3 <-> host PC connection is not working correctly. Where did you run rqt_graph? If you run it on the same machine that is running slam_toolbox_launch.py, you should see one of the C3's nodes publishing transforms.
  2. Make sure that your transform tree is connected: run ros2 run tf2_tools view_frames.py and check the resulting .pdf, which should show a tree that includes a path from laser_frame to odom.
  3. If the timestamps from the C3's publications are sufficiently out of step with the timestamps on the machine generating lidar output, then slam_toolbox will not be able to properly map the lidar samples, so it's worth doing a sanity check on that (e.g. from the pi do wget http://192.168.186.2/logs-raw and verify that it's doing NTP stuff and the log timestamps are in the right ballpark).

carlsondc-ceva commented 1 year ago

btw: if you add the user account that is running the lidar sensors to the dialout group, you should avoid that permission problem. I made some notes here https://github.com/iRobotEducation/create3_examples/issues/42 that describe how to set up udev rules that will ensure the lidar is always available at a fixed filesystem location

bluepra commented 1 year ago

To address the 3 points from @carlsondc-ceva

  1. I ran the original rqt_graph on my VM. Here is the rqt_graph on my pi: Screenshot from 2023-03-23 11-24-29
  2. When I run ros2 run tf2_tools view_frames.py I get a message saying "No executable found"
  3. The logs show a latest timestamp of Mar 23 16:28:22 which I think is UTC time. My Pi and VM are on CST time Mar 23 11:28
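
The two readings in point 3 can only be compared once both are expressed in UTC. A minimal sketch, assuming the robot log really is in UTC and that the Pi's local clock is five hours behind UTC (which is what the gap between 16:28 and 11:28 implies); the helper name utc_seconds_of_day is hypothetical:

```shell
# Convert a wall-clock time plus its UTC offset (whole hours) to
# seconds since midnight UTC. Hypothetical helper, not part of any ROS tool.
utc_seconds_of_day() {
  # $1=hour  $2=minute  $3=second  $4=UTC offset in hours (e.g. -5)
  awk -v h="$1" -v m="$2" -v s="$3" -v off="$4" 'BEGIN {
    t = ((h - off) * 3600 + m * 60 + s) % 86400
    if (t < 0) t += 86400   # wrap around midnight
    print t
  }'
}

robot=$(utc_seconds_of_day 16 28 22 0)   # robot log, assumed to be UTC
pi=$(utc_seconds_of_day 11 28 0 -5)      # Pi clock, assumed to be UTC-5
echo "difference: $((robot - pi)) s"
```

With these values the difference is 22 seconds. Since the Pi reading was quoted only to the minute, that says the two clocks roughly agreed at that moment, not that they are tightly synchronized.
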
bluepra commented 1 year ago

I tried following https://iroboteducation.github.io/create3_docs/setup/compute-ntp/ exactly, but when I do sudo chronyc clients I only see

Hostname                      NTP   Drop Int IntL Last     Cmd   Drop Int  Last
===============================================================================

I added this to my /etc/chrony/chrony.conf:

server 192.168.186.2 presend 0 minpoll 0 maxpoll 0 iburst  prefer trust
# Enable serving time to ntp clients on 192.168.186.0 subnet.
allow 192.168.186.0/24

carlsondc-ceva commented 1 year ago

> 2. When I run ros2 run tf2_tools view_frames.py I get a message saying "No executable found"

Run sudo apt install ros-humble-tf2-tools on the machine that is running the slam_toolbox node, then try it again. The first goal is to prove to ourselves that all of the necessary topics and transforms are visible to the slam_toolbox. If they are all there, then the next step is to check out timestamp integrity.

Can you clarify which nodes are running on which devices? I'm confused about why the rqt_graph output above shows e.g. the /scan topic coming into the /rviz node, but not the RPLidar node that is producing it.

bluepra commented 1 year ago

I will do the sudo apt install ros-humble-tf2-tools and get back on that.

In the meantime, to answer your question regarding clarifying what nodes are running on what devices:

I am following this guide exactly: https://github.com/iRobotEducation/create3_examples/tree/humble/create3_lidar

My SBC is my Raspberry Pi, and my secondary computer for rviz is my virtual machine on my Mac.

carlsondc-ceva commented 1 year ago

ok, cool. and on the pi, you are able to see all of the publications from the create3, right? e.g. if you do ros2 topic list you see the wheels, IMU, etc?

bluepra commented 1 year ago

> ok, cool. and on the pi, you are able to see all of the publications from the create3, right? e.g. if you do ros2 topic list you see the wheels, IMU, etc?

yes!

bluepra commented 1 year ago

When I do a ros2 topic list I see this

/battery_state
/cmd_audio
/cmd_lightring
/cmd_vel
/dock_status
/hazard_detection
/imu
/interface_buttons
/ir_intensity
/ir_opcode
/kidnap_status
/mobility_monitor/transition_event
/mouse
/odom
/parameter_events
/robot_state/transition_event
/rosout
/slip_status
/static_transform/transition_event
/stop_status
/tf
/tf_static
/wheel_status
/wheel_ticks
/wheel_vels

I ran sudo apt install ros-humble-tf2-tools and the last line of the output was 0 upgraded, 0 newly installed, 0 to remove and 94 not upgraded, so maybe I already had that installed.

Running ros2 run tf2_tools view_frames.py still gives me No executable found

bluepra commented 1 year ago

So I tried running ros2 run tf2_tools view_frames instead of ros2 run tf2_tools view_frames.py on the Pi and I got this output:

[INFO] [1679698865.995797402] [view_frames]: Listening to tf data for 5.0 seconds...
[INFO] [1679698871.049957789] [view_frames]: Generating graph in frames.pdf file...
[INFO] [1679698871.060249007] [view_frames]: Result:tf2_msgs.srv.FrameGraph_Response(frame_yaml='[]')

And I have attached the pdf file too. frames_2023-03-24_18.01.11.pdf

Side note:
both the ros2 launch create3_lidar sensors_launch.py and ros2 launch create3_lidar slam_toolbox_launch.py were running simultaneously on the Pi when I ran the ros2 run tf2_tools view_frames

Before I run rviz2 on my VM, the rqt_graph on the pi looks like this: Screenshot from 2023-03-24 18-10-19

After I run rviz2 on my VM, the rqt_graph on the pi looks like this: Screenshot from 2023-03-24 18-10-53

carlsondc-ceva commented 1 year ago

Hello. You should definitely be able to see some transforms, so that is likely the source of your problem. One note, I am running ROS2 galactic, not humble. I haven't tried on humble, though I don't think it should be too different.

This zip file working_example.zip has a brief ros bag capture (example.bag) which you can compare against. The transform graph should look something like this:

image

The output from running

for node in $(ros2 node list)
do
  ros2 node info $node
done

is in that .zip file as node_info.txt and should tell you how all of the nodes/topics need to be connected. I don't have a great setup to do a screenshot of rqt_graph output at the moment.

If that doesn't provide you with enough information, then you might want to get directly in touch with @alsora or whoever the designated point-of-contact is within iRobot for this sort of thing.

bluepra commented 1 year ago

@carlsondc-ceva when should I run

for node in $(ros2 node list)
do
  ros2 node info $node
done

Should I run it after running ros2 launch create3_lidar sensors_launch.py and ros2 launch create3_lidar slam_toolbox_launch.py on the Pi? What about running rviz2 on my VM?

carlsondc-ceva commented 1 year ago

All of these diagnostics should be run while you've got the sensors_launch and slam_toolbox_launch scripts running. The goal here is to determine whether all of the various data sources are talking to each other. This means that the lidar node should be publishing laser scans to some topic, and the SLAM node should be subscribed to that topic. The Create3 should be publishing a bunch of different transforms to the /tf topic, and the relationship between those transforms should provide a way to translate laser scans to the 'odom' frame. The SLAM node is then responsible for figuring out the transform between the 'odom' frame and a 'map' frame that makes sense given the laser scans.

I'll break down the command above:

I don't know about rviz2. Based on the empty transform graph you shared above, I think it makes more sense to establish whether the nodes are all exchanging the information they need before worrying about the visualization.
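
For reference, a commented reading of the diagnostic loop, wrapped in a function so the output can be redirected to a file. This is a sketch: the function name dump_node_info and the "===" separator lines are additions for illustration, not part of the original command.

```shell
# Print the interfaces of every node this machine can currently discover.
dump_node_info() {
  # "ros2 node list" prints one fully qualified node name per line,
  # e.g. /robot_state or /slam_toolbox.
  for node in $(ros2 node list); do
    echo "=== $node ==="
    # "ros2 node info" lists the topics the node publishes and subscribes
    # to, along with its service and action servers and clients.
    ros2 node info "$node"
  done
}
```

Running dump_node_info > my_node_info.txt while both launch files are active gives a file that can be diffed against node_info.txt from the working example.
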

bluepra commented 1 year ago

Sounds good! I ran the sensors_launch and slam_toolbox_launch on the Pi, and then ran:

for node in $(ros2 node list)
do
  ros2 node info $node
done

and I got this output: my_node_info.txt

Which is a lot shorter than what you had. I compared your output with mine and here is the diff: https://www.diffchecker.com/oDp2pL4F/

bluepra commented 1 year ago

Even stranger, after I ran the sensors_launch and slam_toolbox_launch on the pi, I did a ros2 node list and got:

/_internal/composite_hazard
/_internal/kinematics_engine
/_internal/mobility
/_internal/stasis
/mobility_monitor
/motion_control
/robot_state
/static_transform
/system_health
/ui_mgr

Shouldn't I see /rplidar_node and /slam_toolbox and others??

carlsondc-ceva commented 1 year ago

Yes. The sensor launch should result in /rplidar_node and /static_transform_publisher_XXX. The slam launch should result in /slam_toolbox and /transform_listener_impl_XXXX.

It seems like something in your RMW is not configured correctly; those nodes should definitely be visible to another process on the pi. Are you using the same ROS_DOMAIN_ID in all of these terminal sessions? What you've described sounds kind of like the slam_toolbox node was able to get the scans from the /rplidar_node, but not the transforms that the Create 3's /robot_state node is publishing to the /tf topic.

I did run into a weird issue where I tried to statically configure the <initialPeersList> section of my fastRTPS configuration file and it stopped discovering local nodes reliably. Due to other factors, I no longer needed to do this so I just use the default configuration (do not specify anything for the FASTRTPS_DEFAULT_PROFILES_FILE env variable) and it has worked fine since then.
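
The expected node names listed above can be checked mechanically against ros2 node list output. A minimal sketch, assuming the four name prefixes given in this reply; the helper name missing_nodes is hypothetical:

```shell
# Read "ros2 node list" output on stdin and print every expected
# node-name prefix that has no matching node.
missing_nodes() {
  nodes=$(cat)
  for prefix in /rplidar_node /slam_toolbox \
                /static_transform_publisher_ /transform_listener_impl_; do
    printf '%s\n' "$nodes" | grep -q "^$prefix" || echo "missing: $prefix*"
  done
}
```

Usage on the Pi, with both launch files running, would be ros2 node list | missing_nodes. No output means all four kinds of node were discovered. The Create 3's own /static_transform node does not satisfy the /static_transform_publisher_ prefix, so it cannot mask a missing publisher.
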

bluepra commented 1 year ago

This is what my file for FASTRTPS_DEFAULT_PROFILES_FILE looks like:

<?xml version="1.0" encoding="UTF-8" ?>
<profiles xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles">
   <participant profile_name="unicast_connection" is_default_profile="true">
       <rtps>
           <builtin>
               <metatrafficUnicastLocatorList>
                   <locator/>
               </metatrafficUnicastLocatorList>
               <initialPeersList>
                   <locator>
                       <udpv4>
                           <address>CREATE3_ROBOT_IP_ADDRESS</address>
                       </udpv4>
                   </locator>
                   <locator>
                       <udpv4>
                           <address>MY_VM_IP_ADDRESS</address>
                       </udpv4>
                   </locator>
               </initialPeersList>
           </builtin>
       </rtps>
   </participant>
</profiles>

Are you saying to get rid of the FASTRTPS_DEFAULT_PROFILES_FILE from the .bashrc?

My ROS_DOMAIN_ID is set to 0 (default) when I check the Roomba's web server. I tried setting the ROS_DOMAIN_ID on my pi to be 0 by adding export ROS_DOMAIN_ID=0 to the .bashrc, but that still doesn't show me more nodes than before.
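
Because each terminal session reads these settings from its own environment, it can help to print them in every terminal before launching anything. A minimal sketch; the helper name show_rmw_env is hypothetical, and RMW_IMPLEMENTATION is included on the assumption that the .bashrc uses it to select FastRTPS:

```shell
# Print the middleware-related environment variables for this shell,
# marking any that are not set.
show_rmw_env() {
  for name in ROS_DOMAIN_ID RMW_IMPLEMENTATION FASTRTPS_DEFAULT_PROFILES_FILE; do
    # Indirect lookup of the variable called $name; empty if it is unset.
    eval "value=\${$name-}"
    printf '%s=%s\n' "$name" "${value:-<unset>}"
  done
}
```

If the output differs between the terminal running sensors_launch, the one running slam_toolbox_launch, and the one running the diagnostics, those processes may not be able to discover each other.
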

carlsondc-ceva commented 1 year ago

yeah, try to omit the FASTRTPS_DEFAULT_PROFILES_FILE specification. Comment that line out in your .bashrc file, then log out/back in and try again.

I've got a pretty decent handle on ROS2 itself, but the RMW config is black magic to me.

bluepra commented 1 year ago

Tried it, same result. ros2 node list gives this:

/_internal/composite_hazard
/_internal/kinematics_engine
/_internal/mobility
/_internal/stasis
/mobility_monitor
/motion_control
/robot_state
/static_transform
/system_health
/ui_mgr

My slam_toolbox_launch never outputs [async_slam_toolbox_node-1] Registering sensor: [Custom Described Lidar] like the demo says it should.

carlsondc-ceva commented 1 year ago

try a ros2 daemon stop and/or restart the pi, and then go for it again? I am running out of ideas here.

bluepra commented 1 year ago

Ok I restarted my Pi and did ros2 daemon stop. I also had removed the FASTRTPS_DEFAULT_PROFILES_FILE from my .bashrc.

On the pi I ran sensors_launch and slam_toolbox_launch and then finally I did ros2 node list and I got:

/_internal/composite_hazard
/_internal/kinematics_engine
/_internal/mobility
/_internal/stasis
/mobility_monitor
/motion_control
/robot_state
/rplidar_node
/rviz
/slam_toolbox
/static_transform
/static_transform_publisher_jnq4FVKTr16ve5EV
/system_health
/transform_listener_impl_aaaaddb22b20
/transform_listener_impl_aaab0e606b80
/ui_mgr

I also did

for node in $(ros2 node list)
do
  ros2 node info $node
done

and I got: my_node_info_2.txt

Here is the rqt_graph on my pi: Screenshot from 2023-03-28 14-47-09

Lastly, my slam_toolbox_launch kept saying:

[async_slam_toolbox_node-1] [INFO] [1680032751.658372302] [slam_toolbox]: Message Filter dropping message: frame 'laser_frame' at time 1680032751.510 for reason 'discarding message because the queue is full'
[async_slam_toolbox_node-1] [INFO] [1680032751.821140257] [slam_toolbox]: Message Filter dropping message: frame 'laser_frame' at time 1680032751.657 for reason 'discarding message because the queue is full'

over and over

Hope all this info helps narrow it down

carlsondc-ceva commented 1 year ago

  1. What is the ros2 run tf2_tools view_frames.py output?
  2. I am deeply suspicious of the fact that the rqt_graph output doesn't show any of the create3 nodes or topics. Specifically, there is no /robot_state node here that is publishing to the /tf topic.

The message that you're seeing is because there is a buffer that is holding onto laser scans and waiting to get enough information to map them from the laser_frame frame to the odom frame. This information comes from /robot_state via the /tf topic, and based on the rqt_graph output, I doubt that it's getting there.

So I don't know what conditions allow you to see the nodes when you do ros2 node list on the pi but not have them show up when you run rqt_graph on the pi: solve that problem and you will probably have a working system
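
Each "Message Filter dropping message" line carries two times: when the line was logged and the stamp of the dropped scan. A minimal sketch that prints the gap between them in seconds, assuming the log format shown in this thread; the helper name scan_lag is hypothetical:

```shell
# Read slam_toolbox log lines on stdin and print, for each dropped scan,
# (time the line was logged) - (stamp of the dropped scan) in seconds.
scan_lag() {
  sed -n 's/.*\[\([0-9][0-9.]*\)\] \[slam_toolbox\].* at time \([0-9.]*\) for reason.*/\1 \2/p' |
    awk '{ printf "%.3f\n", $1 - $2 }'
}
```

For the first log line quoted above this prints 0.148. A small gap only shows that the lidar stamps track the Pi's own clock. It says nothing about the stamps on the robot's transforms, which have to be checked separately.
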

bluepra commented 1 year ago

Here is the output to ros2 run tf2_tools view_frames frames_2023-03-28_15.13.51.pdf

In my rqt_graph, should I hide or select certain things?

bluepra commented 1 year ago

I hit the Refresh ROS graph button on rqt_graph and I got this: rosgraph

carlsondc-ceva commented 1 year ago

See the frames output from above under working conditions: there is no path in your transform graph from laser_frame to odom, so SLAM can't work with this.

image

The proximate problem is that the transforms being published by /robot_state are not getting where they have to go. I don't know why that is, and I am out of ideas. You could try running ros2 run tf2_tools view_frames when you are not running the slam_toolbox / sensor nodes (just the basic create3 stuff) and see if they show up.

The rqt_graph output you just shared at least has all of the nodes present and I see a connection from /robot_state to /tf and I see the transform listener, so that looks reasonably promising. If there are synchronization problems between the laser scan timestamps and the transform timestamps, then you'll also get the messages from slam_toolbox about dropping messages.

bluepra commented 1 year ago

Ok I will try ros2 run tf2_tools view_frames when I'm not running the slam_toolbox / sensor nodes.

In the meantime, if someone from iRobot can help me out, I would really appreciate that!

carlsondc-ceva commented 1 year ago

Any progress here? I'm fairly new with ROS2 and would love to pick up any tricks you've figured out.

bluepra commented 1 year ago

No real progress. My tf tree does not look the way it is supposed to. If you have any ideas, I am all ears!

bluepra commented 1 year ago

So I did a ros2 topic echo /map and got this:

---
header:
  stamp:
    sec: 1681158936
    nanosec: 852636971
  frame_id: map
info:
  map_load_time:
    sec: 0
    nanosec: 0
  resolution: 0.05000000074505806
  width: 93
  height: 107
  origin:
    position:
      x: -4.0040477452931045
      y: -3.352724186639207
      z: 0.0
    orientation:
      x: 0.0
      y: 0.0
      z: 0.0
      w: 1.0
data:
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- 100
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- -1
- '...'
---

Does this provide any clues as to what is wrong?

carlsondc-ceva commented 1 year ago

If you try to visualize that map in rviz, what do you see? If the SLAM node is outputting any sort of map, then that is a good sign! It won't do that unless it has laser scans and the transforms necessary to connect the odom frame to the laser_frame frame. The message contains an occupancy grid, so it looks like there's a bunch of "unknown" cells and at least one "occupied" cell.
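
The cell values in an echoed /map message can be tallied rather than read by eye. A minimal sketch, assuming the YAML layout shown in the echo output above and the usual occupancy grid convention (-1 unknown, 0 free, 1 to 100 increasingly likely occupied); the helper name count_cells is hypothetical:

```shell
# Read "ros2 topic echo /map" output on stdin and count cells by state.
count_cells() {
  awk '
    /^data:/ { in_data = 1; next }   # cell values follow this key
    /^---/   { in_data = 0 }         # message separator ends the list
    in_data && /^- -?[0-9]+$/ {
      v = $2 + 0
      if (v == -1)     unknown++
      else if (v == 0) free++
      else             occupied++    # any value from 1 to 100
    }
    END { printf "unknown=%d free=%d occupied=%d\n", unknown, free, occupied }
  '
}
```

Echo output is truncated (the '...' entry), so the totals cover only the cells that were printed. A map that stays almost entirely unknown while the robot moves would suggest that few scans are being integrated.
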

bluepra commented 1 year ago

Unfortunately, my rviz (running on my VM) gives me a message saying that the map does not exist:

ChrisBove commented 11 months ago

I'm hoping people figured this out already, but if not, I ran into the same problem and the issue was indeed a mismatch of timestamps on the raspi. From a vanilla Ubuntu Raspi Server install, there aren't any NTP servers defined for synchronization, so the timestamps on the raspi were something like Mar 2023 for me, but the robot timestamps were present day.

Fix was to edit /etc/systemd/timesyncd.conf to have something like:

[Time]
NTP=ntp.ubuntu.com
FallbackNTP=0.us.pool.ntp.org 1.us.pool.ntp.org

Then reboot or systemctl restart systemd-timesyncd.service and check systemctl status systemd-timesyncd.service until it syncs after a few minutes.
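
Whether the sync has completed can be checked from a script instead of rereading the status output. A minimal sketch, assuming timedatectl prints a "System clock synchronized: yes" line as it does on Ubuntu 22.04; the helper name clock_is_synced is hypothetical:

```shell
# Read timedatectl output on stdin; succeed only if it reports that the
# system clock is synchronized.
clock_is_synced() {
  grep -q '^ *System clock synchronized: yes'
}
```

Usage would be timedatectl | clock_is_synced && echo "clock is synchronized". Running this on the Pi before starting the launch files gives a quick yes or no on whether the scan timestamps can be trusted.
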

Demo worked great on Galactic after that. There might be more legit ways to address this - I just opened iRobotEducation/create3_docs/pull/428 to include these steps in the setup documentation.

carlsondc-ceva commented 11 months ago

Good info. In my particular test environment, the wifi network for "untrusted devices" not fully controlled by our IT group has no internet access, so my approach for time synchronization is to have the raspberry pi act as the NTP server to the Create3 over the USB connection (and I manually set the clock on the pi when I turn it on). The directions here worked.

shamlian commented 11 months ago

@ChrisBove This is interesting; I now see that you're right that this file is unconfigured, but my experience has been that my Raspberry Pi receives a time sync anyway. I wonder if the network infrastructure matters (that is, maybe my network sends time server info over DHCP and yours does not)? I am leaning towards making this step optional instead of required, or suggested in the case that the user is having trouble with time sync. Is it worth having the user try running timedatectl and only configure this file if necessary?

ChrisBove commented 11 months ago

Ah, I forgot those directions when I brought the raspi up. I just tried that now and didn't work out for me as written - I needed to modify the robot's NTPD config file to get it to sync with the raspi. I'll add that to my PR for discussion there, and I'll adjust my PR to remind people to do the NTP configuration steps (since that at least syncs the time between the Create3 and compute board).

@shamlian I think you're right - I did notice in timedatectl logs that it was trying to connect to my router for NTP (and failing), so it's possible some routers do that while others don't. It would be best to make this an optional step - I'll make that change.