turtlebot / turtlebot4

Turtlebot4 common packages.

mapping/RVIZ not work on PC talking to tb4 -> topics missing from PC -> time sync problem -> best solution? #363

Closed yeff-fresh closed 6 months ago

yeff-fresh commented 6 months ago

(ETA: Added a summary at the end of this original description of the Issue with what the various problems were, and how they were solved.)

Robot Model

Turtlebot4 Standard

ROS distro

Humble

Networking Configuration

Simple Discovery

OS

Ubuntu 22.04

Built from source or installed?

Installed

Package version

Create 3 is using firmware version H.2.4 (updated through web server) from Create 3 Releases (https://iroboteducation.github.io/create3_docs/releases/overview/)
Raspberry Pi 4 is using code image Humble 1.0.0 from TurtleBot4 Humble page (https://turtlebot.github.io/turtlebot4-user-manual/changelogs/humble.html)
I have done sudo apt update && sudo apt upgrade on the RaspPi4
PC is running Ubuntu 22.04 and ROS2 Humble

Type of issue

Navigation (SLAM, Nav2 etc.)

Expected behaviour

I can run RVIZ and localization and mapping on my PC and see a clean map in RVIZ with the localization and navigation indicators and can see all topics and messages on my PC (and even generate a map as per UserManual -> TUTORIALS -> 4. Generating a map)

Actual behaviour

RVIZ on PC displays an error that the Map frame is missing.
The /odom topic is not on the PC (and others are also missing), but it is on the RaspPi4.
/tf on the PC does not contain the odom->base_link transform (other transforms are missing too), but it is on the RaspPi4.

In debugging, went down the road of checking ntp time sync and did get Create3 and RaspPi4 sync'd with correct time (same as PC). But setup is non-persistent and will not work if TB4 not on internet-connected network (of course), and still don't see topics even if everyone is on same internet-connected network.

Error messages

Errors in RVIZ about missing Map = "Global Status Error Frame [map] does not exist", "RobotModel error, no transform from [map]"
Some in Create3 WebUI LOGS about ntp timesync fail (before fixing timesync)

To Reproduce

  1. TB4 on wireless network
  2. PC on same wireless network
  3. Start TB4
  4. Run RVIZ/localization/navigation on PC

Other notes

(Here comes all the gruesome details)

I wanted to have my Turtlebot4 and my Ubuntu PC on the same wireless network and be able to run RVIZ, localization, and navigation on my PC for both making a map and for navigation. (As per Turtlebot4 User Manual, TUTORIALS section, "4. Generating a map" and "5. Navigation".)

I first tried putting my PC on the "Turtlebot4" network - the default the T4 starts with as an access point. This didn't work - RVIZ showed errors about Map frame not available and I never saw the scan results in RVIZ.

I switched to putting my T4 on a wireless network. My site wireless is very picky about what it allows, so I switched to my home wireless network/internet. I was able to put the T4 on that wireless network, but things still didn't work - RVIZ showed errors about Map frame.

Poking around ROS2 on my PC and on the T4 RaspPi, I noticed there were topics on the RaspPi that were not on my PC. Important topics like /odom, and in fact all the missing topics appeared to be ones that originate with the Create3.

Searching through the Turtlebot4 Issues board, I found Issue #300, which seemed completely relevant to my situation. The problem in Issue #300 was that the RaspPi and the Create3 were not time synced. Looking at ROS topics, I realized all the messages on /odom had timestamps significantly behind the "current time" (by 12 days). The transforms on /tf from odom->base_link (for example) were also present on the RaspPi with old timestamps. The /odom topic was not visible on my Ubuntu PC, nor were the odom->base_link transforms visible in /tf on my PC. I also saw the line in the Create3 LOGS about ntpd trying to timesync with irobot.pool.ntp.org servers.
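
The timestamp check described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the TurtleBot 4 tooling: the function names and the 1-second tolerance are assumptions, and the `sec`/`nanosec` pair stands for the fields of a message's `header.stamp` (for example, copied from the output of `ros2 topic echo /odom`).

```python
def stamp_age_seconds(sec: int, nanosec: int, now: float) -> float:
    """Seconds by which a message stamp lags `now` (negative if it is ahead).

    `sec` and `nanosec` mirror the two fields of a ROS 2 header stamp;
    `now` is the reference time as Unix seconds.
    """
    stamp = sec + nanosec / 1_000_000_000
    return now - stamp


def is_stale(sec: int, nanosec: int, now: float, tolerance_s: float = 1.0) -> bool:
    """True if the stamp differs from `now` by more than `tolerance_s` seconds.

    The 1-second default is an arbitrary assumption for this sketch.
    """
    return abs(stamp_age_seconds(sec, nanosec, now)) > tolerance_s


if __name__ == "__main__":
    now = 1_700_000_000.0                      # arbitrary reference time for the demo
    old_sec = 1_700_000_000 - 12 * 24 * 3600   # a stamp 12 days in the past
    print("age in days:", stamp_age_seconds(old_sec, 0, now) / 86400)
    print("stale:", is_stale(old_sec, 0, now))
```

Running the same comparison against stamps read on the RaspPi and the wall-clock time on the PC gives a quick indication of whether clock skew is large enough to matter.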

On the Issues board, I found Issue #216 which showed how to configure the Create3 ntp.conf through the WebUI. Doing this did seem to get the two systems in time sync, based on looking at the Create3 time in the WebUI and the RaspPi time in a shell (after ssh to the RaspPi).

After a lot of fiddling around I also figured out that if I put the RaspPi onto a different wireless network, say its own wireless network or a network that didn't have internet access, the RaspPi time reverted to a time in the past (specifically, the day I installed the 1.0.0 image). Then the Create3 would time sync to the RaspPi and thus my laptop (once on the same wireless network) would not see any ROS2 information (topics or messages). From this I realized that the RaspPi was timesync-ing to an internet-based time server and thus would revert time when not on a wireless network with internet access.

ISSUE #1: I can't guarantee that the Turtlebot4 will always be on a wireless network that has internet access.

My solution for this would be to install a small real-time clock module on the RaspPi (like one from Adafruit or MakerFocus). I'm wondering if this is something that would be recommended for the Turtlebot4 by Clearpath, and if there is a recommended add-on. I haven't yet exposed the board to see if the needed pins are available.

ISSUE #2: Even when I have the T4 and my PC on a wireless network with internet access, I still don't see all the topics on my PC (or all the transforms in /tf). I did confirm that the time does appear to be in sync between RaspPi, Create3, and my PC. I even confirmed that some messages on /tf appear on both the RaspPi and my PC.

I'm not as sure how to continue to debug this one. I'm wondering if it could be a routing issue between my PC and the Create3 (through the RaspPi), as mentioned in the User Manual (SETUP -> 4. Discovery Server -> User PC). However, the appearance of some Create3 ROS2 topics (like /cmd_vel) would imply that perhaps routing is okay? Any ideas here would be welcome.

Thanks!

Post-Fixes Summary

There were several issues involved in this:

yeff-fresh commented 6 months ago

Update: I took the TB4 home, so I could use my home wireless network with internet access. The RaspPi time-sync'd with whatever ntp time server it uses and the Create3 time-sync'd with the RaspPi. So it appeared that the RaspPi, the Create3, and my PC were all in the "same time". I still did not see various Create3 topics on my PC.

I was able to run SLAM on the RaspPi and enough topics appeared on my PC that I could actually run RVIZ and see what the T4 saw as it drove around my house. However, the map was very low resolution. (I was able to save the map, however)

I was also able to run localization and nav2 on the RaspPi and again run RVIZ on my PC. However, there were still several topics missing and I wasn't able to see much from the scan.

So I still have ISSUE #2 from above where I can't see all the Create3 topics on my PC even though the three systems appear to be time-sync'd.

I also have another question which is: How can I configure the RaspPi to time-sync off a local ntp server, versus whatever it is currently using? There doesn't appear to be a /etc/ntp.conf file on the RaspPi - is it elsewhere?

I'll look around some more later today, but any help would be appreciated. Thanks!

yeff-fresh commented 6 months ago

An update: The three systems (my Ubuntu PC, TB4 RaspPi, TB4 Create3) are indeed time-sync'd. I confirmed this by looking at system time on each system and the times were the same. But I believe I'm still seeing issues with topics and messages not traveling between my PC and the Create3.

I confirmed this by looking at various Create3 topics, such as /battery_state, /odom, and /wheel_status. /odom does not appear at all on the PC but is on the RaspPi. /battery_state and /wheel_status appear in both places (PC and RaspPi), but "topic echo" on the RaspPi shows messages, while on the PC it does not. Also, /cmd_vel appears in both places but publishing a msg to /cmd_vel on the PC does not move the TB4 while publishing to /cmd_vel on the RaspPi does.

I've seen issues on the board around simple discovery vs discovery server so I'm wondering if my problem might be related to this. I'm currently using Simple Discovery but tomorrow I will try using a Discovery Server setup and see if that makes a difference.

As always, any thoughts or input from Clearpath support are appreciated. Thanks!

hilary-luo commented 6 months ago

Hi @yeff-fresh it looks like you have a few things going on so I'll do my best to touch on the different topics.

  1. Time syncing - I wanted to briefly mention this and then we can come back to it when we have the other issues sorted. It is true that the Raspberry Pi is not able to retain its time sync across reboots and relies on being able to synchronize the time remotely. One option is to add a hardware solution, but I would agree with your later comment that it is better to pursue syncing to a local NTP server. It seems that you already found the instructions on how to modify the NTP server settings on the Create3. The Raspberry Pi uses systemd-timesyncd (the service behind timedatectl) by default to manage time syncing. You can modify the NTP fallback server in /etc/systemd/timesyncd.conf, or you can disable this service and opt for the NTP daemon that you may be more familiar with.
  2. Create3 topics - the issue you are describing seems a bit more complex. Before we get deeper into troubleshooting simple discovery, it may be worth determining whether you ultimately want to use simple discovery or discovery server. Simple discovery is currently necessary if you want to run multiple robots and have them communicate with each other. However, it uses multicasting, which may be a problem on restricted networks. Discovery server takes a bit more setup but does not rely on multicasting. The caveat is that we have not yet had time to release multi-robot support with discovery server. Assuming we pursue troubleshooting the current setup, it would be helpful if you could run ros2 daemon stop, then ros2 daemon start, and then ros2 topic list on your PC, and then share the full list of topics that you are able to see on both the robot and your user PC.
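
For the time-syncing point, a timesyncd.conf that prefers a local NTP server might look like the fragment below. This is a sketch under assumptions: the address 192.168.1.10 is a hypothetical local server, and ntp.ubuntu.com is shown only as an example fallback. The `[Time]` section and the `NTP=` and `FallbackNTP=` keys are the standard systemd-timesyncd settings.

```ini
# /etc/systemd/timesyncd.conf on the Raspberry Pi
[Time]
# Primary server(s), space-separated. 192.168.1.10 is a hypothetical
# local NTP server; replace it with one reachable on your network.
NTP=192.168.1.10
# Used only when no primary server can be reached (example value).
FallbackNTP=ntp.ubuntu.com
```

After editing the file, restarting the service with sudo systemctl restart systemd-timesyncd and running timedatectl timesync-status should show which server is actually in use.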

yeff-fresh commented 6 months ago

Hi hilary-luo -

Time-Sync: Agreed with what you said, there are multiple solutions (onboard real-time-clock module, local ntp server) that we can implement. A question: what does timedatectl use as the primary ntp server? Is that setting in a configuration file somewhere if we wanted to set a different primary?

Create3 topics: Appended are the two lists (on my Ubuntu 22.04 PC, and on the RaspPi). The "missing" topics seem to be most of the Create3 topics given in documentation. There are topics visible on the PC that are on the Create3 list (example: battery_state) but I never see messages on the topic on the PC and I do see messages on the topic on the RaspPi. And, as noted, I did a publish to /cmd_vel on my PC and it didn't cause the TB4 to move, but publishing to /cmd_vel on the RaspPi did cause the TB4 to move.

I don't have a current preference on simple discovery v discovery server, but having a solution which would allow multiple bots on one network would be nice in the future. So I guess SimpleDiscovery could be a preference, not a requirement. Right now, I'd just like to see everything working and be able to run localization/nav/application_sw on my PC to talk to the TB4.

Thanks!

** List of Topics on My Ubuntu 22.04 PC

jeff-soesbe@AA000076:~$ ros2 topic list
/battery_state
/cmd_vel
/diagnostics
/diagnostics_agg
/diagnostics_toplevel_state
/dock_status
/function_calls
/hazard_detection
/hmi/buttons
/hmi/display
/hmi/display/message
/hmi/led
/imu
/interface_buttons
/ip
/joint_states
/joy
/joy/set_feedback
/mouse
/oakd/rgb/preview/image_raw
/parameter_events
/robot_description
/rosout
/scan
/tf
/tf_static
/wheel_status

** List of Topics on the TB4 RaspPi

ubuntu@ubuntu:~$ ros2 topic list
/battery_state
/cliff_intensity
/cmd_lightring
/cmd_vel
/diagnostics
/diagnostics_agg
/diagnostics_toplevel_state
/dock_status
/function_calls
/hazard_detection
/hmi/buttons
/hmi/display
/hmi/display/message
/hmi/led
/imu
/interface_buttons
/ip
/ir_intensity
/ir_opcode
/joint_states
/joy
/joy/set_feedback
/kidnap_status
/mobility_monitor/transition_event
/mouse
/oakd/rgb/preview/image_raw
/odom
/parameter_events
/robot_description
/robot_state/transition_event
/rosout
/scan
/slip_status
/static_transform/transition_event
/stop_status
/tf
/tf_static
/wheel_status
/wheel_ticks
/wheel_vels
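
Comparing the two lists by eye is error-prone, so a small set difference can help. The sketch below is only an illustration: the helper names are made up for this example, and the two strings are the `ros2 topic list` outputs posted above.

```python
def parse_topic_list(text: str) -> set[str]:
    """Collect topic names (tokens starting with '/') from `ros2 topic list` output."""
    return {token for token in text.split() if token.startswith("/")}


def missing_topics(reference: str, observed: str) -> list[str]:
    """Topics present in `reference` but absent from `observed`, sorted by name."""
    return sorted(parse_topic_list(reference) - parse_topic_list(observed))


# Output captured on the user PC (copied from the list above).
PC_TOPICS = """
/battery_state /cmd_vel /diagnostics /diagnostics_agg /diagnostics_toplevel_state
/dock_status /function_calls /hazard_detection /hmi/buttons /hmi/display
/hmi/display/message /hmi/led /imu /interface_buttons /ip /joint_states /joy
/joy/set_feedback /mouse /oakd/rgb/preview/image_raw /parameter_events
/robot_description /rosout /scan /tf /tf_static /wheel_status
"""

# Output captured on the TB4 RaspPi (copied from the list above).
PI_TOPICS = """
/battery_state /cliff_intensity /cmd_lightring /cmd_vel /diagnostics /diagnostics_agg
/diagnostics_toplevel_state /dock_status /function_calls /hazard_detection /hmi/buttons
/hmi/display /hmi/display/message /hmi/led /imu /interface_buttons /ip /ir_intensity
/ir_opcode /joint_states /joy /joy/set_feedback /kidnap_status
/mobility_monitor/transition_event /mouse /oakd/rgb/preview/image_raw /odom
/parameter_events /robot_description /robot_state/transition_event /rosout /scan
/slip_status /static_transform/transition_event /stop_status /tf /tf_static
/wheel_status /wheel_ticks /wheel_vels
"""

if __name__ == "__main__":
    for topic in missing_topics(PI_TOPICS, PC_TOPICS):
        print(topic)
```

For these two lists the difference is 13 topics, including /odom, all of which are on the RaspPi but not on the PC.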

yeff-fresh commented 6 months ago

Newest update: reading through docs in detail I realized I had not put the Create3 on the wireless network, just the RaspPi. So I used the WebUI at port 8080 to set the Create3 on the wireless network and (surprise!) the topics appeared on my PC. Which, of course, makes perfect sense once I think about it (duh).

At that point, I can launch localization, nav2, and RVIZ on my PC and they all eventually get in sync and things appear to be working (and RVIZ is happy). I have a bad map of my house, but I believe I might be able to generate a better one now so that's a future task.

The NavigateToPose action server doesn't seem to be starting as my code (which talks to NavigateToPose to move the TB4 around) says it's waiting for the server but I'll work on figuring that out.

Of course, time sync can still be an issue but I know how I can deal with that.

I also have an issue where the RaspPi (and maybe the Create3 too) have trouble joining most wireless networks, but I'll try to diagnose that and see if I can figure any details.

I'm not sure if I can consider this issue Closed yet, but I'll keep this issue updated as I move forward.

hilary-luo commented 6 months ago

Sounds good, thanks for the update. I have some thoughts that may help you with your endeavors:

> what does timedatectl use as the primary ntp server? Is that setting in a configuration file somewhere if we wanted to set a different primary?

Honestly I'm not sure on this one, I haven't modified its behavior before so your google search is as good as mine.

> I also have an issue where the RaspPi (and maybe the Create3 too) have trouble joining most wireless networks, but I'll try to diagnose that and see if I can figure any details.

In regards to joining different wireless networks, there is much more flexibility with the raspberry pi on joining different networks. The create3 is limited to 2.4GHz networks and you have to use the webpage to configure it. If this becomes a problem you can switch to the discovery server which doesn't require the create3 to be directly connected to the network. For the raspberry pi, if the setup tool does not cover your use case, you can manually modify the netplan files to add the fields that you need.

> I have a bad map of my house, but I believe I might be able to generate a better one now so that's a future task.

You mentioned earlier that the map resolution wasn't as high as you wanted. If you want to increase it, there are a few things you can do, although the result will depend heavily on the computational power of the device that runs the mapping. You can pass in your own config file when launching SLAM (ros2 launch turtlebot4_navigation slam.launch.py params:=/full/path/to/slam.yaml). In particular, I would draw your attention to resolution and minimum travel distance / heading. The first controls the map resolution (although, if I remember correctly, you have to decrease this number to get a finer map). The second eases the load on the machine doing the computation and ensures that the system can catch up when you stop moving the robot. Then watch your output for warnings that the queue is getting too full. Be aware, though, that a non-zero minimum travel distance means that mapping will not start until the robot moves.
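
As a rough illustration of the settings mentioned above, the relevant part of a custom slam.yaml might look like the fragment below. The parameter names are the ones slam_toolbox uses; the values are illustrative assumptions rather than recommended settings, and the fragment is meant to be edited into a copy of the default file shipped with turtlebot4_navigation, not used on its own.

```yaml
slam_toolbox:
  ros__parameters:
    # Map cell size in meters: a smaller number gives a finer map
    # (0.025 is an illustrative value, not a recommendation).
    resolution: 0.025
    # Minimum motion before a new scan is processed. Larger values reduce
    # the load; non-zero values delay mapping until the robot moves.
    minimum_travel_distance: 0.1   # meters (illustrative)
    minimum_travel_heading: 0.1    # radians (illustrative)
```

The edited file is then passed with the params:= argument shown in the launch command above.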

> The NavigateToPose action server doesn't seem to be starting as my code (which talks to NavigateToPose to move the TB4 around) says it's waiting for the server but I'll work on figuring that out.

In general the navigation can take a little bit to load up (just in terms of discovering all of the topics etc). If you are having a persistent issue with trying to get Nav2 to launch then feel free to expand on that.

yeff-fresh commented 6 months ago

First, thanks @hilary-luo for the details on increasing map resolution - I will be giving that a try at some point soon.

It turned out there was one more issue that was preventing everything from working correctly. We use Docker containers for our development environment. The Docker container I was using (which I didn't create) had switched the DDS middleware to Cyclone instead of FastRTPS and configured Cyclone to only run on the loopback interface. Once I removed these changes, I now had topics and messages appearing in all relevant places (on the RaspPi, on my Ubuntu PC, and in my Docker container).
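
A quick way to catch this kind of mismatch is to inspect the DDS-related environment variables inside each environment (PC, container, robot). The sketch below is an assumption-laden illustration, not an official diagnostic: the function name is made up, and `expected_rmw` should be set to whatever the robot actually uses (the Humble default, `rmw_fastrtps_cpp`, is assumed here). `RMW_IMPLEMENTATION`, `CYCLONEDDS_URI`, and `ROS_LOCALHOST_ONLY` are standard ROS 2 / Cyclone DDS environment variables.

```python
import os


def check_dds_env(env: dict[str, str], expected_rmw: str = "rmw_fastrtps_cpp") -> list[str]:
    """Return warnings about settings that could hide topics from other machines.

    `expected_rmw` is an assumption: it should match the middleware the robot uses.
    """
    warnings = []

    rmw = env.get("RMW_IMPLEMENTATION", "")
    if rmw and rmw != expected_rmw:
        warnings.append(f"RMW_IMPLEMENTATION is {rmw!r}, expected {expected_rmw!r}")

    if env.get("CYCLONEDDS_URI"):
        # A custom Cyclone DDS config may restrict traffic to one interface
        # (for example loopback), so it is worth reviewing by hand.
        warnings.append("CYCLONEDDS_URI is set; check which network interface it selects")

    if env.get("ROS_LOCALHOST_ONLY") == "1":
        warnings.append("ROS_LOCALHOST_ONLY=1 limits ROS 2 traffic to this host")

    return warnings


if __name__ == "__main__":
    for line in check_dds_env(dict(os.environ)) or ["no DDS-related overrides found"]:
        print(line)
```

Running it inside the Docker container and on the host makes it easier to see when one side has been switched to a different middleware or pinned to a single interface.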

With this final fix, I can run localization (with my house or office map) and nav2 on the RaspPi, run RVIZ on my Ubuntu PC, and run our software in the Docker container - everything communicates and I can interact with the Turtlebot4 from our software. Success!

I think with this final fix I can consider this issue Closed. I will update the original issue to detail all my problems and solutions, in case anyone runs across similar issues in the future.