iRobotEducation / create3_docs

Documentation for the iRobot® Create®3 Educational Robot
BSD 3-Clause "New" or "Revised" License

Massive iRobot Create3 ROS2 issues! #387

Open scottcandy34 opened 1 year ago

scottcandy34 commented 1 year ago

I have 16 iRobot Create 3s, all running Humble H.1.0, with RPi 4s (running Ubuntu 22.04 with ROS Humble) communicating over USB-C. I set all of these up for a lab and personally configured the USB connections, and multiple strange things are happening.

Originally they were running Galactic G.3.1 without a single issue. They all ran fine with goals and other commands, except for the dock goal, but that was OK and expected.

But I had them change to Humble H.1.0 because of the dock goal update: the Pis are running 22.04, which only works with the Humble version of ROS, and the dock goal changed between versions. After the update there have been nothing but issues. Some could be fixed by running "sudo apt upgrade" on the Pis, but more issues would appear and then disappear. I would have them downgrade the firmware, but issues would still arise.

My own bot has not had a single issue like theirs and would run all day without any problems. It's a lab, so we tell the students what to put into their node.py within ros2_ws/src, and they run the nodes.

  1. Let's say you boot the bot and it has no problems running a goal or other commands directly from the command line (SSH into the Pi), but then you run the same commands from within a node. It works for the first few, then there is no communication at all and every call hangs. If you reboot, it works again and then stops working.

  2. Let's say you boot the bot and it doesn't work with any commands at all: the action servers are offline, you can't even see "ros2 param list", and no matter how many times you reboot the bot, nothing changes.

  3. Let's say the next day, after having the above two issues, your bot works perfectly fine with no issues.

  4. Let's say your bot was working fine one day and the next day you have issue 1 or 2.

Note that all the bots boot with a solid red ring, even mine, which works fine. I have also personally looked over the code of each group that was having problems, and it's all the same as mine, which has no issues. I cleaned out their src folders because a few groups ran "colcon build" in there, which apparently can mess with ROS 2 commands. This fixed some of the other issues they were having.

I have done extensive testing and troubleshooting to diagnose what on earth is happening, and nothing is consistent either. This is maddening lol. I have been working on these issues for 2 weeks solid. Not all robots are being problematic.

  1. The only thing I have been able to determine, from how the issues behave and from a review of the logs, is that a possible NTPD connection problem could be the cause? I tried to fix that with the guide at https://iroboteducation.github.io/create3_docs/setup/compute-ntp/ and, although I have only done this once, it did not fix the robot's issues.

  2. It also feels like, after a while, sending goals from within a Python node makes the robot freak out, hang, and stop accepting commands. So it's almost as if just the goals sent from the Python node are not functioning correctly.

  3. Or it feels like something gets messed up when they build the package.

I can't even explain how much I have tried and done to get the bots to work correctly. Now some groups don't even have any issues, just like my bot. So if someone could please tell me I'm not going crazy lol. I could use a hand.

alsora commented 1 year ago

Hi @scottcandy34, your problem seems to be related to network issues. Are all the robots connected to the same Wi-Fi?

However, I want to highlight an important point. It looks like you updated the robots from Galactic to Humble only because of the Dock action. There's a better solution! You could simply build the galactic branch of the irobot_create_msgs repo even if you are using Humble on your Raspberry Pi!

scottcandy34 commented 1 year ago

They are all set up to be, and at times they are all on the same connection. But when these issues were prominent, only a few were on. All bots are on different domain IDs as well. The Pis are on the same network. I can connect to each robot's web page config just fine; that's how I'm able to view the logs.

Network issues would make some sense, but at the same time not, because the bots have no issue staying connected to the Wi-Fi. The only network-type thing I can think of is the NTP servers.
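
For anyone checking whether separate domain IDs really keep the bots' traffic apart: under the default DDSI-RTPS port mapping, each ROS_DOMAIN_ID gets its own block of UDP ports. A minimal Python sketch of that mapping, assuming the default constants have not been overridden in the RMW configuration (the function name `domain_ports` is only for illustration):

```python
# Default DDSI-RTPS port-mapping constants (assumed unchanged in the RMW config).
PORT_BASE = 7400
DOMAIN_GAIN = 250
PARTICIPANT_GAIN = 2
D0, D1, D2, D3 = 0, 10, 1, 11  # offsets defined by the RTPS specification


def domain_ports(domain_id, participant_id=0):
    """Return the UDP ports a participant uses in the given domain."""
    base = PORT_BASE + DOMAIN_GAIN * domain_id
    return {
        "discovery_multicast": base + D0,
        "discovery_unicast": base + D1 + PARTICIPANT_GAIN * participant_id,
        "user_multicast": base + D2,
        "user_unicast": base + D3 + PARTICIPANT_GAIN * participant_id,
    }


# Show the port block for the first few domain IDs.
for domain_id in range(3):
    print(domain_id, domain_ports(domain_id))
```

Separate port blocks keep discovery logically apart, but every robot still shares the same Wi-Fi, so this alone does not rule out the network as a factor.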

That is a good solution, and I wish I had realized that earlier, but it doesn't fix the overall problem that something is going on with these robots lol. The labs end today, so having them downgrade and do that is too much at this point. Plus, once they went back to Galactic some would still have issues, although at this point I don't think they would.

I would still like to figure out what on earth is going on so that either you guys can put out a patch or I can finally know what not to do lol. My goal at this point is knowledge and prevention. I'll have the bots back here soon, so I plan on working on them as is, at least until I can figure out what is happening.

Jayden-F commented 1 year ago

You are not alone @scottcandy34. I have 12 TurtleBot 4s and intend to control them from a single computer. Unfortunately, I am in the same position. For now I am experimenting with 2 robots running Humble v1.2, and the Create 3 has always been the single point of failure; I have experienced numerous issues with namespacing, NTP time sync, and the RMW, all preventing progress. To make matters worse, the issues are really inconsistent, leading to further confusion and frustration.

"I can't even explain how much I have tried and done to get the bots to work correctly. Now some groups don't even have any issues, just like my bot. So if someone could please tell me I'm not going crazy lol. I could use a hand."

You are not crazy at all.

scottcandy34 commented 1 year ago

@Jayden-F Thank you for letting me know lol. It really is unfortunate. I plan to work on them more next week to get them all working since the students are done with them. If you get anywhere would you post and let me know? I'll do the same.

brianabouchard commented 1 year ago

Hi all - I'm a professor who worked through the process of getting multiple creates with Pis online on the same network in the past year. I'd be happy to chat with either or both of you about how to do that. Feel free to email me at briana.bouchard@tufts.edu.

ipa-rar commented 1 year ago

@Jayden-F @scottcandy34. I own a fleet of 4 TurtleBot 4s, and for the past 4 months I have been struggling to get these robots controlled by a single computer. I have been trying to figure out all the possible things that can go wrong in this setup and finally figured out that the culprit is the Create 3 base. Looking at the logs, it seems the CPU as well as the RAM is maxed out and everything fails.

I have been doing some research on the Create 3 base and found some interesting things in a recent publication about it:

The iRobot Create® 3 is a robotic platform designed to give developers access to a robust mobile base through standard ROS 2 APIs. It is equipped with a variety of sensors and actuators: encoders, optical flow and IMU for pose estimation, bumpers, cliffs and IR for obstacle detection, and wheels, LED lights and audio speakers controllable by the user.

The robot produces 70 KBps of raw sensor data and processed information, while also running a 50 Hz motion control loop and an obstacle detection pipeline. Users can either publish individual actuation commands or send action goals to execute autonomous behaviors such as pose regulation, wall-following, and docking. The single-board computer that is present on the robot imposed severe resource constraints to the design of the application, due to a processor with limited CPU and less than 60 MB of RAM.

The robot runs a single-process, manually composed, ROS 2 application which takes advantage of IPC and an optimized executor [17]. This ROS 2 system is made of approximately 10 nodes, with more than 30 topics, 10 services and 10 action servers, in addition to the automatically created entities. ROS 2 is used both for the internal implementation of algorithms and drivers, as well as for interfacing with the user.

The iRobot Create® 3 application is normally controlled by an external navigation software, such as Nav2 described in Sec. VI-A, and this results in the use of approximately 60% CPU and 32 MB RAM. More performance intensive operations such as remotely subscribing to all the published topics (e.g. for visualization purposes or to record a log), can increase the CPU usage to 80%. The robot also supports multiple rmw, and with some of them the RAM usage grows up to 40 MB.

So according to this publication the robot's CPU and RAM are not supposed to max out. But: "Tests were run on Ubuntu 20.04. The Linux proc pseudo-filesystem was used to accurately record operating-system level metrics."
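
To compare against numbers like these, the same kind of proc-based measurement can be reproduced on any Linux machine you can log into, such as the Pi. A minimal sketch, assuming the standard /proc/stat field order; the function names are illustrative and this is not the paper's actual tooling:

```python
import time


def parse_cpu_line(line):
    """Return (idle_ticks, total_ticks) from the aggregate 'cpu' line of /proc/stat."""
    fields = [int(x) for x in line.split()[1:]]
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice
    idle = fields[3] + (fields[4] if len(fields) > 4 else 0)
    total = sum(fields[:8])  # guest time is already counted inside user/nice
    return idle, total


def cpu_percent(before, after):
    """Busy CPU percentage between two samples of the 'cpu' line."""
    idle0, total0 = parse_cpu_line(before)
    idle1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        return 0.0
    return 100.0 * (1.0 - (idle1 - idle0) / elapsed)


def sample_cpu_percent(interval_s=1.0, path="/proc/stat"):
    """Read /proc/stat twice, interval_s apart, and return the busy percentage."""
    with open(path) as f:
        before = f.readline()
    time.sleep(interval_s)
    with open(path) as f:
        after = f.readline()
    return cpu_percent(before, after)
```

Calling `sample_cpu_percent()` in a loop while the robot's topics are being subscribed to gives a rough picture of the load on the machine it runs on.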

The robot has become highly unpredictable and this needs to be fixed ASAP, or else the robot is of no use.

scottcandy34 commented 1 year ago

@ipa-rar @Jayden-F I agree, it's become highly unpredictable.

I just went through all 16 of my bots and tested each one. I also upgraded to H.1.2. I copied over my code to replace the students' code, cleaned up all the build/install/log folders, and ran it. Everything worked fine without issues; one Pi needed to be updated with 'sudo apt upgrade', which fixed its communication issue. The majority of the bots still boot with a red ring before finishing boot. Before this, I had tested my Pi on their bots and didn't have the issues they were having. As I said before, I had inspected their code and Pis but couldn't find anything. One thing I did change was replacing their setup.py file with mine; they could have done something in it that messed with the bots' comms. I also didn't test, while they were having the issues, whether or not it was their code. I could have removed the build/install/log folders and removed the workspace from the '.bashrc' source line, but I didn't think of that as a test until now. Based on all the other testing I had done, that would have shown whether it was their code or the Pi.

Now, this still doesn't explain all of the issues I have been having with their bots. I didn't test with more than 2 on the network, but each bot is on its own domain, so that shouldn't affect it. The bots will still occasionally have problems where they can't keep up with what you told them to do and act slow or respond slowly.

In all my testing I have checked the bot's log page in the web UI, and very rarely have I seen it maxed out completely. It does, however, sit at 80 to 90 percent CPU almost always. Unfortunately I didn't pay much attention to the RAM. The bots are highly unpredictable for ROS.

royito55 commented 1 year ago

What RMW_IMPLEMENTATION are you using? I know CycloneDDS will not work because in order for it to work, we would need to go into the create3 base and change the XML configuration file that the driver nodes are being launched under.

It seems like they are adding beta features to tackle this: https://github.com/turtlebot/turtlebot4/issues/145

ROS 2 is simply too unreliable to work out of the box for communication between PCs in the same network. ROS 1 worked much better in this regard, and I think it's DDS's fault. If it worked as advertised, we wouldn't have to restart the daemon every three seconds.

agrueneberg commented 1 year ago

We also ran into a lot of issues when using two robots (TurtleBot 4) on the same network. I tried to summarize the problem and potential workarounds in the following document: https://docs.google.com/document/d/18Uiy3k-HlsiR4ICyweVondgNwyK8dJw2Eda17PrzJdY/edit?usp=sharing Comments are appreciated.

brianabouchard commented 1 year ago

Hi all - there are some new docs on a discovery server setup that I've been testing and that seems to work quite well: https://iroboteducation.github.io/create3_docs/setup/discovery-server/

Note - I do not have the full TurtleBot 4 setup, but I am using the Create 3 robot, a Raspberry Pi, RPLIDAR, and a computer running a Linux VM. I'm hoping to test multiple robots in the next few days, but with one robot, this setup has been seamless.

Jayden-F commented 1 year ago

@agrueneberg Thank you for your document. I am also experimenting with 2 TurtleBot 4s (hoping to move to 12), yet I am roadblocked by the exact same issues you describe, quoted below.

"Our experience with multiple robots was poor (to say the least). At this time, it is not clear whether multiple robots can be used at the same time on the same network controlled by a single external computer. The main reason for this is that the Create 3 is hitting 100% CPU usage frequently and early, causing it to become unresponsive. This problem appears to occur even when using a single robot, but having more than one namespaced robot exacerbates it."

I have also tried:

To add to what you have: I also tried Fast DDS with the discovery server to eliminate the multicast traffic. However, lifecycle nodes such as those in the Nav2 stack hang and constantly wait for each other to be ready. Therefore, I am unable to use it at this time.

https://github.com/eProsima/Fast-DDS/issues/3505

Yet to try:

I am currently experimenting with an individual ROS 2 network for each robot using Cyclone DDS. Each TurtleBot 4 will have its Create 3 disconnected from the Wi-Fi network so that it only communicates with the RPi through a ROS 2 network. The RPi is then configured to use the USB IP network adapter as its preferred ROS 2 network adapter, isolating all ROS 2 traffic between the Create 3 and the RPi. We will use HTTP POST and GET between the RPi and the centralised controller to send commands.

This is an insane workaround, as we are re-implementing what ROS 2 and the RMW should be able to achieve. However, with the current state of the TurtleBot 4 and the RMW, it seems to be the only option. If anyone has a better suggestion, please let me know. I really need multiple TurtleBot 4s working with a centralised controller.

agrueneberg commented 1 year ago

@Jayden-F Thanks for your feedback! I have added it to the document. We also had your idea, but had to drop it because other lab groups might want to use the robots, too. How would you configure Cyclone DDS to only communicate via usb0?

royito55 commented 1 year ago

I'm currently finding some success by completely isolating the Create3 from anything except the RPi4. I do this in CycloneDDS to avoid the problem @Jayden-F described.

@agrueneberg you can configure this in the new feature to override the RMW XML config, something like <NetworkInterfaceAddress>usb0</NetworkInterfaceAddress>

This is my config for the create3:

<?xml version="1.0" encoding="UTF-8" ?>
<CycloneDDS xmlns="https://cdds.io/config" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://cdds.io/config https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/master/etc/cyclonedds.xsd">
    <Domain id="any">
        <General>
            <NetworkInterfaceAddress>auto</NetworkInterfaceAddress>
            <AllowMulticast>false</AllowMulticast>
            <MaxMessageSize>65500B</MaxMessageSize>
            <FragmentSize>4000B</FragmentSize>
        </General>
        <Discovery>
            <Peers>
                <Peer address="[RPI4 IP]"/>
            </Peers>
            <MaxAutoParticipantIndex>100</MaxAutoParticipantIndex>
            <ParticipantIndex>auto</ParticipantIndex>
        </Discovery>
        <Internal>
            <Watermarks>
                <WhcHigh>500kB</WhcHigh>
            </Watermarks>
        </Internal>
        <Tracing>
            <Verbosity>severe</Verbosity>
            <OutputFile>stdout</OutputFile>
        </Tracing>
    </Domain>
</CycloneDDS>

This is over wlan0 and not usb0; I still want to switch to see if things run better that way.

Multicast is disabled to reduce traffic, and the packet size is increased so that fewer packets need to be processed. I only add the RPi4 as a peer address to ensure the Create 3 doesn't see anything else. Then I can worry about obtaining the topics I need from an external computer by "routing" them through the RPi4, which will discover all the other machines in my setup.
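
To roll the same idea out to several robots, the file could be generated per robot. A rough Python sketch using only the standard library; `build_cyclonedds_config` and the example IP are made-up placeholders, and it emits only a subset of the elements from the config above (interface, multicast off, a single peer):

```python
import xml.etree.ElementTree as ET


def build_cyclonedds_config(peer_ip, interface="auto"):
    """Build a minimal Cyclone DDS config that only talks to one unicast peer."""
    root = ET.Element("CycloneDDS", {"xmlns": "https://cdds.io/config"})
    domain = ET.SubElement(root, "Domain", {"id": "any"})

    general = ET.SubElement(domain, "General")
    ET.SubElement(general, "NetworkInterfaceAddress").text = interface
    ET.SubElement(general, "AllowMulticast").text = "false"

    discovery = ET.SubElement(domain, "Discovery")
    peers = ET.SubElement(discovery, "Peers")
    # Attribute spelling copied from the config above.
    ET.SubElement(peers, "Peer", {"address": peer_ip})

    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


# Placeholder address: substitute the real IP of the robot's own RPi4.
print(build_cyclonedds_config("192.168.1.50"))
```

The printed XML would then be pasted into the robot's RMW override field, one variant per robot with that robot's own RPi4 address.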

Jayden-F commented 1 year ago

@agrueneberg I factory reset the Create 3; by default the Create 3 uses both network interfaces with Cyclone DDS. I then configure the RPi to use usb0. @royito55's method seems much better, and I am curious about routing messages through the RPi4. Do you have more information on how this works?

royito55 commented 1 year ago

Well, it's a workaround I thought of to avoid the CPU maxing out, and it wouldn't be necessary for every setup. It's not related to DDS but to the topics the Create 3 offers.

For example, if you make all your Pis see each other and other external PCs, then you can send Nav2 goals to the Pis from the external PC no problem.

But say you need to send dock commands to all your robots from an external PC. Then you'd need to discover robot_n/dock coming from the Create 3, but with the setup I described, the PC doesn't see the Create 3. So then you need to write some ROS 2 nodes and run them on the Pis to address that, which I haven't lol.

For topics it will be easier: just run the great topic_tools relay package on the Pis.

shamlian commented 1 year ago

We have just released G.5.2 and H.2.2, which reduce core robot loading by about 9%. Hopefully this, in combination with other configuration work that is in progress, will help. Please update your Create 3 robots to H.2.2, and let us know if you find a similar improvement.

lukeopteran commented 11 months ago

@shamlian @alsora

Following this discussion on performance, I tried H.2.3 with a custom Eclipse Cyclone DDS configuration on the robot, as suggested above. A short while after a reboot with the DDS config set for the robot, it crashed (permanent red LED) and I could not power it off or restart it. After taking the battery out for less than 15 minutes, it immediately went back to a white spinning LED, with no response when trying to hotspot it. After more than 15 minutes, it doesn't turn on when the battery is installed, and on the dock it goes back to a white spinning LED, with no change when going .

Is there a way to hardware reset it? I have tried holding the various buttons down for long periods (>20s) but still not getting a response (cannot access Wifi or hotspot the robot).

Thanks

shamlian commented 11 months ago

@lukeopteran The one thing I can think to try is to connect to the robot over its usb0 interface once it is fully booted (wait about 6 or 7 minutes after it is powered on) and attempt a factory reset with something like curl -X POST http://192.168.186.2/api/factory-reset (this just triggers the same endpoint as if you had clicked the "factory reset" link from the webserver). If that doesn't work, please email education@irobot.com so we can set up a call; I think that will be more efficient than trading messages here.
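
If curl is not handy on the Pi, roughly the same request can be sent from Python. A minimal sketch using only the standard library; the URL is the one from the curl command above, the function names are illustrative, and actually calling `factory_reset()` wipes the robot's configuration:

```python
import urllib.request

FACTORY_RESET_URL = "http://192.168.186.2/api/factory-reset"


def build_factory_reset_request(url=FACTORY_RESET_URL):
    """Build a body-less POST request, similar to `curl -X POST <url>`."""
    return urllib.request.Request(url, method="POST")


def factory_reset(timeout_s=10.0):
    """Send the request over usb0 and return the HTTP status code."""
    req = build_factory_reset_request()
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return resp.status
```

Building the request is kept separate from sending it so that the URL and method can be checked before anything destructive happens.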

lboorman commented 11 months ago

thanks will give it a try

lukeopteran commented 11 months ago

@shamlian That worked, thanks very much, have switched to the Galactic FW!

royito55 commented 6 days ago

Hi,

We have made a video explaining why this issue happens, and how it can be solved by simply pulling and running a docker container in your RPi4.

Although it's made for the TurtleBot 4, it applies to any create 3 attached to a computing unit like a RPi4.

The basic idea is separating the ROS_DOMAIN_ID of the RPi4 and the Create 3, and connecting them through two zenoh bridges in order to isolate and protect the Create 3.

I hope it helps!

https://www.youtube.com/watch?v=xmK2I0D5sas

Check out our docker image here

1. Requirements

Raspberry Pi configuration (use turtlebot4-setup tool):

Create 3 configuration (Access web server in a browser: 192.168.1.XX:8080):


2. docker-compose.yaml

✅️ The best way to run this image is with the following docker-compose.yaml:

services:
  zenoh-bridge-turtlebot4:
    image: theconstructai/zenoh-bridge-ros2dds:turtlebot4
    container_name: zenoh-bridge-turtlebot4
    network_mode: "host" 
    restart: always # Ensures the container restarts on reboot

3. Run container

  1. docker compose -f /path/to/docker-compose.yaml up -d
  2. Turn robot off completely and turn it back on (press and hold power button and wait for chime, then place on charger).
  3. Verify all TurtleBot 4 topics with ros2 topic list.