turtlebot / turtlebot4

Turtlebot4 common packages.
Apache License 2.0
101 stars 45 forks source link

Create3 100% CPU usage #266

Closed hilary-luo closed 2 weeks ago

hilary-luo commented 1 year ago

This is going to be a place to discuss and troubleshoot the CPU usage at 100%. I am guiding this process and the expectation is that anyone involved needs to commit to providing detailed information about their setup and implementing changes that may or may not fix the issue. The goal is to identify a common solution that can be deployed. Indiviudals are welcome to try additional tests and report back but must indicate what changes have been made in case it impacts the results of requested tests.

As a start, can each of you please provide:

Finally, can you describe what situations cause the 100% CPU issue and provide the create3 logs from such a situation. Feel free to share other symptoms that you see so we can track them. I want to ensure that I understand what each person is working with to make this process effective.

hilary-luo commented 1 year ago

@garyservin @NachtaktiverHalbaffe Can you please share the information described?

NachtaktiverHalbaffe commented 1 year ago

Your requested informations

Some more information about my setup

When does the CPU max out

hilary-luo commented 1 year ago

@NachtaktiverHalbaffe A few more questions before I suggest more changes:

  1. did you see any changes in performance when you updated to the latest Create3 firmware?
  2. did you see any changes in performance when you added the NTP changes?
  3. How repeatedly do you get the situation where the create3 does not recover? This is the primary situation that I want to be able to prevent / target. As we apply changes it would be nice to see if we can reduce the reliance on those delays that you mentioned.
  4. My understanding from what you have said is that you are primarily seeing the CPU max out when the nodes are launched and as a result it does not get through that initial discovery process. Sometimes it is able to make it through given the changes that you have made. You then mentioned 2 other symptoms that you see even after the cpu usage settles. During those events, do you also see the mean CPU usage held at 100%? or is it still sitting around 80-90%?

Finally, can you set up reserved IP addresses for the raspberry pi and create3 for this testing? Once you answer the additional questions I will outline the test I want you to do, but in preparation it would be useful to set up the reserved addresses.

NachtaktiverHalbaffe commented 1 year ago

EDIT: Updating with measurments form testing in the lab today

Im not in the lab on the robot today, so I can only answer the things I have observed. If needed, I can run additional tests tomorrow. I havent reliable tracked and recorded any number, so these are my impressions, but not anything specifically measured.

Based on the question I have some more information so you can understand the behaviour better:

To your questions:

  1. Cant say. We directly upgraded it during our setup process and before I found out about this problem and thus before debugging this behaviour
  2. Cant say either because a) before that all the default NTP servers werent reachable because blocked by our firewall. It appeared about the same time when I changed the NTP server on the Pi to our one when I also applied these ntp settings. Before that the NTPs werent reachable b) like mentioned before I applied this at same time with other changes But I can say that NTP related errors occur less often, but sometimes some warnings about delay problems.
  3. Before my changes I didn't find patterns other than these I mentioned. I had days where It occured every two nav2 startups and other ones where it didnt occur one time. The observation I have is that it is more likely to hang up and have the "red light of death" when it has occured before and it dissappears more likely if you take the time and powercycle all systems after that instead of using it right after the create 3 has rebooted after the "red light of death". At this time I was also integrating different modules regarding my plugin bridges, fixes to nav2 stack etc. so these could also had been impacted this.
    After my changes it doens't hang up with the "red light of death" that often. Had running it 3 days and it only occured two times at the end of my day. Had rebooted the nav2 stack multiple times at this day and also changed implementation of nodes (nav2 stack side) which also had then errors without crashing the turtlebot. So theres a correlation with my changes (but correlation doesnt mean causality). However, the unresponsivness problem during startup and the CPU maxing out still exists, it just doesnt crash that often
  4. You understood right. I also first launched and waited amcl a couple of times and when launching nav2 rviz hang up in visualization etc. so I think the create3 stopped sending messages for a few seconds there, so its enforcing this assumptions. I think I had it like one or two times also when everything was running, but I cant remember the circumstances and this was rare behaviour. Regarding the question: Havent tracked the numbers, but the turtlebot doenst peak at 100% that often when running (usually about 94-99%). These are my impressions:
    • The message filter dropping messages because of full cues occurs every time during startup and some time during operation and in combination with LIDAR messages. The Raspberry Pi seems to run fine and data is reliable received from the Lidar, so I think it isnt related to the create3 problem
    • The mismatched timestamp issues occurs after the short "hickups" and often the robot was unresponsive at these state. I assume this is more to faulty messages in message cues (faulty in terms of timestamps and other header infos) which are introduced by these hiccups. If these hickups are on my host or on a messaging hickup I cant say. However, after his the nav2 stack must be rebooted because it is stuck (because of blocked message cues I think), but then its a crash on the nav2 side I had some errors in my time synchronization/NTP configuration, now the mismatched timestamps dont occur anymore on regular basis
    • The extrapolation errors are most likely due to amcl etc. not running fast enough or not properly applying the transform tolerances. Its also appearing unregulary and not interupting navigation or leading to a crash

Do you meaning with reserving IP adresses that the Turtlebots gets the same IP adress every time? Then they are already reserved because our DHCP server is configured to assign the Turtlebots and the create3 bases the same IP adress every time.

hilary-luo commented 1 year ago

Thank you for the detailed response and yes I was referring to having the same IP address every time so that sounds perfect. To respond to some of your thoughts from above:

On the create3 logs there occurs mqtt and discovery related errors. I thought this is due to discovery server which is deactivated in my case. Can it be that this some kind of real internal error which should be tracked down?

My understanding is that the mqtt errors are internal to the create3 and can be ignored in this case.

Multiple Turtlebots and navigation2 stacks in same ROS Domain is not reliable working. Best approach seems to isolate each turtlebot and nav2 stack into own DOmain and then bridge the topics, services etc. which are needed between robots for the individual usecase

Once we get the single robot issue sorted out this is something that I would like to look at more closely. The systems should support two turtlebots running nav2 on the same network.

As the first requested change, can you make sure that the turtlebot4 diagnostics are disabled on the Turtlebot. If they were on previously then see if this makes a notable difference in the performance that you have described. (Steps to turn diagnostics off: SSH into the robot and run turtlebot4-setup -> go under ROS Setup -> Bash Setup -> Select TURTLEBOT4_DIAGNOSTICS -> Disabled -> Save -> Back out to the main menu -> Apply Settings -> Yes )

NachtaktiverHalbaffe commented 1 year ago
I have left it on one Turtlebot enabled and disabled on the other. Each Turtlebot was in its own ROS Domain and bridged to the central map server with domain_bridge from ROS2. I ran these measurements two times (complete reboot of systems in between) to measure the number on the same PC/Docker container for each Turtlebot. My rough results are: Situation With this change Without change Gain (Difference in avg CPU usage)
Idle (direct after booting) 70% at start, 53-63% after settled 75-77% at start, 70-73% after settled 7-20%
Idle with Nav2-Stack etc. running 95-100% at start, 73-77% after settled 100% at start, 88-91% after settled 10-20%
Robot driving (driving all the time => Full messaging load) 86-93% 97-98% 5-10%

This behaviour is consistent then I disabled diagnostics on the second Turtlebot too. However, depending on the PC I ran the nav2 stack and the turtlebot itself I have around 5% of deviation in CPU usage, so I would rather set the gains in 10-15% range for the Idles and 5% when running. However, still impressive gains. I track this more to ensure this is consistent.

Once we get the single robot issue sorted out this is something that I would like to look at more closely. The systems should support two turtlebots running nav2 on the same network.

Yeah in my case there was a log that it discovered ~70 Nodes when everything was in same ros domain, I have some more because of my Plugin bridges etc., so I assume I put additional stress to the create 3 on startup of the Nav2 stack compared to the stock one

Also saw that you can disable some safety overrides in the config of the create3. Maybe this could also reduce load if these are disabled and as a result not running?

garyservin commented 1 year ago

Some more information

CPU usage when the turlebot service isn't running and after letting the base to stabilize (about 30 minutes after powering on)

Sep 16 21:53:35 create3 user.notice create-platform: [INFO] [1694901215.651064233] [system_health]: CPU usage: max 57 [%] mean 48 [%] RAM usage: 32/59 [MB]
Sep 16 21:54:35 create3 user.notice create-platform: [INFO] [1694901275.653174351] [system_health]: CPU usage: max 68 [%] mean 54 [%] RAM usage: 32/59 [MB]

CPU usge when the turtlebot4 service is running on the Pi, after 30 minutes. CPU was at 100% for almost 10 minutes after starting the service

Sep 16 22:30:35 create3 user.notice create-platform: [INFO] [1694903435.668905870] [system_health]: CPU usage: max 79 [%] mean 72 [%] RAM usage: 31/59 [MB]
Sep 16 22:31:35 create3 user.notice create-platform: [INFO] [1694903495.669638204] [system_health]: CPU usage: max 82 [%] mean 73 [%] RAM usage: 31/59 [MB]
Sep 16 22:32:35 create3 user.notice create-platform: [INFO] [1694903555.670138188] [system_health]: CPU usage: max 61 [%] mean 55 [%] RAM usage: 32/59 [MB]

When does CPU max out

When I start mapping, I get disconnections, and the robot disappears for a second from rviz. It's worse when I set the mapping resolution to 0.01 via a param override. These are the logs I got during that time

Sep 16 23:04:48 create3 kern.warn kernel: [ 5855.130573] MtAsicUpdateProtect: unknown protection mode(9)
Sep 16 23:04:49 create3 kern.warn kernel: [ 5855.680544] MtAsicUpdateProtect: unknown protection mode(7)
Sep 16 23:04:50 create3 kern.warn kernel: [ 5856.780542] MtAsicUpdateProtect: unknown protection mode(9)
Sep 16 23:04:50 create3 kern.warn kernel: [ 5856.890538] MtAsicUpdateProtect: unknown protection mode(7)
Sep 16 23:04:50 create3 kern.warn kernel: [ 5857.330524] MtAsicUpdateProtect: unknown protection mode(9)
Sep 16 23:04:50 create3 kern.warn kernel: [ 5857.440514] MtAsicUpdateProtect: unknown protection mode(7)
Sep 16 23:04:51 create3 kern.warn kernel: [ 5857.885612] MtAsicUpdateProtect: unknown protection mode(9)
Sep 16 23:04:51 create3 kern.warn kernel: [ 5857.994330] MtAsicUpdateProtect: unknown protection mode(7)
Sep 16 23:04:51 create3 kern.warn kernel: [ 5858.430520] MtAsicUpdateProtect: unknown protection mode(9)
Sep 16 23:04:51 create3 kern.warn kernel: [ 5858.540508] MtAsicUpdateProtect: unknown protection mode(7)
Sep 16 23:04:52 create3 kern.warn kernel: [ 5858.980494] MtAsicUpdateProtect: unknown protection mode(9)
Sep 16 23:04:52 create3 kern.warn kernel: [ 5859.090496] MtAsicUpdateProtect: unknown protection mode(7)
Sep 16 23:04:52 create3 kern.warn kernel: [ 5859.533156] MtAsicUpdateProtect: unknown protection mode(9)
Sep 16 23:05:07 create3 user.notice create-platform: [DBG] [SE]: !velocity_stopped - cmd_vel:4,-4 meas_vel:[INFO] [1694905507.556423011] [motion_control]: Setting safety override to "none"
Sep 16 23:05:10 create3 user.notice ntpd: ntpd: reply from 162.159.200.1: delay 0.144823 is too high, ignoring
Sep 16 23:05:10 create3 user.notice create-platform: 0,0
Sep 16 23:05:10 create3 user.notice create-platform: [DBG] [SE]: !velocity_stopped - cmd_vel:35,-35 meas_vel:0,0
Sep 16 23:05:10 create3 user.notice create-platform: [DBG] [SE]: !wheels_stopped - odom:46807,45813
Sep 16 23:05:10 create3 user.notice create-platform: [DBG] [SE]: !velocity_stopped - cmd_vel:32,-32 meas_vel:0,-12
Sep 16 23:05:10 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 282  id: 67
Sep 16 23:05:10 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 893  id: 18
Sep 16 23:05:10 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 839  id: 81
Sep 16 23:05:10 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 5582  id: 25
Sep 16 23:05:10 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 193  id: 68
Sep 16 23:05:10 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 43  id: 36
Sep 16 23:05:10 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 292  id: 19
Sep 16 23:05:10 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 290  id: 18
Sep 16 23:05:10 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 232  id: 25
Sep 16 23:05:10 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 40  id: 25
Sep 16 23:05:10 create3 user.notice create-platform: [DBG] proto event : PP_NOTIFY_REMOTE_RESYNC
Sep 16 23:05:10 create3 user.notice create-platform: proto event : PP_NOTIFY_REMOTE_RESYNC
Sep 16 23:05:10 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 179  id: 14
Sep 16 23:05:10 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 68  id: 30
Sep 16 23:05:40 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: [INFO] [1694905539.176044723] [motion_control]: Setting safety override to "none"
Sep 16 23:05:44 create3 user.notice ntpd: ntpd: reply from 201.217.3.86: delay 0.096592 is too high, ignoring
Sep 16 23:05:45 create3 user.notice ntpd: ntpd: timed out waiting for 201.217.3.85, reach 0xf6, next query in 530s
Sep 16 23:05:47 create3 user.notice create-platform: [INFO] [1694905547.778742418] [system_health]: CPU usage: max 100 [%] mean 80 [%] RAM usage: 40/59 [MB]
Sep 16 23:06:23 create3 user.notice create-platform: 516  id: 65
Sep 16 23:06:23 create3 user.notice create-platform: [DBG] [SE]: !velocity_stopped - cmd_vel:29,29 meas_vel:0,0
Sep 16 23:06:23 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 416  id: 108
Sep 16 23:06:23 create3 user.notice create-platform: [DBG] proto event : PP_NOTIFY_REMOTE_RESYNC
Sep 16 23:06:23 create3 user.notice create-platform: proto event : PP_NOTIFY_REMOTE_RESYNC
Sep 16 23:06:23 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 2641  id: 18
Sep 16 23:06:23 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 293  id: 25
Sep 16 23:06:23 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 498  id: 65
Sep 16 23:06:23 create3 user.notice create-platform: [DBG] [SE]: !wheels_stopped - odom:48168,46941
Sep 16 23:06:23 create3 user.notice create-platform: [DBG] [SE]: !velocity_stopped - cmd_vel:35,-35 meas_vel:0,0
Sep 16 23:06:23 create3 user.notice create-platform: [DBG] [SE]: !wheels_stopped - odom:48378,46735
Sep 16 23:06:23 create3 user.notice create-platform: [DBG] [SE]: !velocity_stopped - cmd_vel:38,38 meas_vel:0,0
Sep 16 23:06:23 create3 user.notice create-platform: [DBG] [SE]: !velocity_stopped - cmd_vel:13,-13 meas_vel:0,0
Sep 16 23:06:23 create3 user.notice create-platform: [DBG] [SE]: !velocity_stopped - cmd_vel:4,-4 meas_vel:0,0
Sep 16 23:06:23 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 76  id: 30
Sep 16 23:06:23 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 135  id: 67
Sep 16 23:06:23 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 102  id: 67
Sep 16 23:06:23 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 165  id: 30
Sep 16 23:06:24 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 1179[INFO] [1694905584.295761460] [motion_control]: Setting safety override to "none"
Sep 16 23:06:35 create3 user.notice create-platform: [INFO] [1694905595.688223380] [system_health]: CPU usage: max 100 [%] mean 78 [%] RAM usage: 39/59 [MB]
Sep 16 23:06:56 create3 daemon.warn network[1468]: [WARN] (src/gateway_pinger.c:181) no ping recv, loss counter = 1
Sep 16 23:07:28 create3 user.notice ntpd: Script called in periodic mode
Sep 16 23:07:29 create3 user.notice create-platform:   id: 30
Sep 16 23:07:29 create3 user.notice create-platform: [DBG] proto event : PP_NOTIFY_REMOTE_RESYNC
Sep 16 23:07:29 create3 user.notice create-platform: proto event : PP_NOTIFY_REMOTE_RESYNC
Sep 16 23:07:29 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 583  id: 65
Sep 16 23:07:29 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 36  id: 13
Sep 16 23:07:29 create3 user.notice create-platform: [DBG] [SE]: !wheels_stopped - odom:52463,50433
Sep 16 23:07:29 create3 user.notice create-platform: [DBG] [SE]: !velocity_stopped - cmd_vel:35,35 meas_vel:0,0
Sep 16 23:07:29 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 9  id: 25
Sep 16 23:07:29 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 148  id: 67
Sep 16 23:07:29 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 211  id: 30
Sep 16 23:07:29 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 315  id: 67
Sep 16 23:07:29 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 134  id: 30
Sep 16 23:07:29 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 57  id: 18
Sep 16 23:07:29 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 303  id: 30
Sep 16 23:07:29 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 60  id: 108
Sep 16 23:07:29 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 253  id: 67
Sep 16 23:07:29 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 267  id: 81
Sep 16 23:07:29 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 188  id: 25
Sep 16 23:07:31 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: [INFO] [1694905651.496038093] [motion_control]: Setting safety override to "none"
Sep 16 23:07:34 create3 user.notice create-platform: [INFO] [1694905654.142926717] [motion_control]: Setting safety override to "none"
Sep 16 23:07:35 create3 user.notice create-platform: [INFO] [1694905655.688407077] [system_health]: CPU usage: max 100 [%] mean 92 [%] RAM usage: 38/59 [MB]
Sep 16 23:07:56 create3 user.notice create-platform: 242  id: 25
Sep 16 23:07:56 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 167  id: 25
Sep 16 23:07:56 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 180  id: 25
Sep 16 23:07:56 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 199  id: 25
Sep 16 23:07:56 create3 user.notice create-platform: [DBG] proto event : PP_NOTIFY_REMOTE_RESYNC
Sep 16 23:07:56 create3 user.notice create-platform: proto event : PP_NOTIFY_REMOTE_RESYNC
Sep 16 23:07:56 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 179  id: 19
Sep 16 23:07:56 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 145  id: 30
Sep 16 23:07:56 create3 user.notice create-platform: [DBG] proto event : PP_NOTIFY_LOST_SYNC
Sep 16 23:07:56 create3 user.notice create-platform: proto event : PP_NOTIFY_LOST_SYNC
Sep 16 23:07:56 create3 user.notice create-platform: [DBG] proto event : PP_NOTIFY_RELIABLE_PKT_DROPPED
Sep 16 23:07:56 create3 user.notice create-platform: proto event : PP_NOTIFY_RELIABLE_PKT_DROPPED
Sep 16 23:07:56 create3 user.notice create-platform: [DBG] pkt info: type 17, len 1
Sep 16 23:07:56 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 111  id: 25
Sep 16 23:07:56 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 43  id: 81
Sep 16 23:07:56 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 308  id: 18
Sep 16 23:07:56 create3 user.notice create-platform: [DBG] proto event : PP_NOTIFY_SYNCED
Sep 16 23:07:56 create3 user.notice create-platform: proto event : PP_NOTIFY_SYNCED
Sep 16 23:07:56 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 759  id: 65
Sep 16 23:07:56 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 9  id: 81
Sep 16 23:08:04 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 147  id: [INFO] [1694905682.328727401] [motion_control]: Received new wall follow goal
Sep 16 23:08:09 create3 user.notice create-platform: [INFO] [1694905689.854429119] [motion_control]: Starting wall follow goal with side LEFT
Sep 16 23:08:10 create3 user.notice create-platform: [INFO] [1694905690.312377496] [motion_control]: WF Params sent
Sep 16 23:08:11 create3 user.notice create-platform: [INFO] [1694905691.078620499] [motion_control]: Wall follow hit max_runtime, succeeded
Sep 16 23:08:28 create3 user.notice create-platform: 30
Sep 16 23:08:31 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 28  id: 25
Sep 16 23:08:31 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 155  id: 30
Sep 16 23:08:31 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 140  id: 36
Sep 16 23:08:31 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 557  id: 25
Sep 16 23:08:31 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 215  id: 25
Sep 16 23:08:31 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 128  id: 25
Sep 16 23:08:31 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 128  id: 90
Sep 16 23:08:31 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 271  id: 36
Sep 16 23:08:31 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 60  id: 25
Sep 16 23:08:31 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 141  id: 25
Sep 16 23:08:31 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 95  id: 25
Sep 16 23:08:31 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 178  id: 18
Sep 16 23:08:32 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 110  id: 25
Sep 16 23:08:32 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 108  id: 25
Sep 16 23:08:32 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 24  id: 18
Sep 16 23:08:32 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 962  id: 108
Sep 16 23:08:32 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: [INFO] [1694905708.600713027] [motion_control]: Setting safety override to "none"
Sep 16 23:08:33 create3 user.notice create-platform: [INFO] [1694905713.000915788] [motion_control]: Setting safety override to "none"
Sep 16 23:08:36 create3 user.notice create-platform: [INFO] [1694905716.394503548] [system_health]: CPU usage: max 100 [%] mean 98 [%] RAM usage: 40/59 [MB]
Sep 16 23:09:07 create3 user.notice create-platform: 1202  id: 18
Sep 16 23:09:09 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 829  id: 81
Sep 16 23:09:09 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 1110  id: 25
Sep 16 23:09:09 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 100  id: 68
Sep 16 23:09:09 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 27  id: 90
Sep 16 23:09:09 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 112  id: 19
Sep 16 23:09:09 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 122  id: 107
Sep 16 23:09:09 create3 user.notice create-platform: [DBG] proto event : PP_NOTIFY_REMOTE_RESYNC
Sep 16 23:09:09 create3 user.notice create-platform: proto event : PP_NOTIFY_REMOTE_RESYNC
Sep 16 23:09:09 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 38  id: 14
Sep 16 23:09:09 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 505  id: 30
Sep 16 23:09:09 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 939  id: 65
Sep 16 23:09:09 create3 user.notice create-platform: [DBG] proto event : PP_NOTIFY_REMOTE_RESYNC
Sep 16 23:09:09 create3 user.notice create-platform: proto event : PP_NOTIFY_REMOTE_RESYNC
Sep 16 23:09:09 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 342  id: 30
Sep 16 23:09:09 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 329  id: 107
Sep 16 23:09:09 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 6915  id: 18
Sep 16 23:09:09 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 791  id: 81
Sep 16 23:09:09 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 434  id: [INFO] [1694905748.018305238] [motion_control]: Setting safety override to "none"
Sep 16 23:09:35 create3 user.notice create-platform: [INFO] [1694905775.692063680] [system_health]: CPU usage: max 100 [%] mean 83 [%] RAM usage: 40/59 [MB]
Sep 16 23:09:51 create3 user.notice create-platform: 18
Sep 16 23:09:52 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 438  id: 25
Sep 16 23:09:52 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 248  id: 67
Sep 16 23:09:52 create3 user.notice create-platform: [DBG] proto event : PP_NOTIFY_REMOTE_RESYNC
Sep 16 23:09:52 create3 user.notice create-platform: proto event : PP_NOTIFY_REMOTE_RESYNC
Sep 16 23:09:52 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 420  id: 65
Sep 16 23:09:52 create3 user.notice create-platform: [DBG] [SE]: !velocity_stopped - cmd_vel:5,-5 meas_vel:0,0
Sep 16 23:09:52 create3 user.notice create-platform: [DBG] [SE]: !velocity_stopped - cmd_vel:4,-4 meas_vel:0,0
Sep 16 23:09:52 create3 user.notice create-platform: [DBG] [SE]: !wheels_stopped - odom:53372,50201
Sep 16 23:09:52 create3 user.notice create-platform: [DBG] [SE]: !velocity_stopped - cmd_vel:35,-35 meas_vel:0,0
Sep 16 23:09:52 create3 user.notice create-platform: [DBG] [SE]: !wheels_stopped - odom:53413,50161
Sep 16 23:09:52 create3 user.notice create-platform: [DBG] [SE]: !velocity_stopped - cmd_vel:35,-35 meas_vel:0,0
Sep 16 23:09:52 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 175  id: 67
Sep 16 23:10:04 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 95  id: 18
Sep 16 23:10:06 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 147  id: 107
Sep 16 23:10:06 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 118  id: 108
Sep 16 23:10:06 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 557  id: 30
Sep 16 23:10:06 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 21  id: 67
Sep 16 23:10:06 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 144  id: 67
Sep 16 23:10:06 create3 user.notice create-platform: [INFO] [1694905806.502085292] [motion_control]: Setting safety override to "none"
Sep 16 23:10:15 create3 daemon.warn network[1468]: [WARN] (src/gateway_pinger.c:181) no ping recv, loss counter = 1
Sep 16 23:10:15 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 291  id: 30
Sep 16 23:10:16 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 710  id: 30
Sep 16 23:10:16 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 276  id: 67
Sep 16 23:10:16 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 476  id: 30
Sep 16 23:10:16 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 154  id: 67
Sep 16 23:10:16 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 138  id: 107
Sep 16 23:10:16 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 73  id: 67
Sep 16 23:10:16 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 2113  id: 30
Sep 16 23:10:16 create3 user.notice create-platform: [DBG] proto event : PP_NOTIFY_REMOTE_RESYNC
Sep 16 23:10:16 create3 user.notice create-platform: proto event : PP_NOTIFY_REMOTE_RESYNC
Sep 16 23:10:16 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 2283  id: 18
Sep 16 23:10:16 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 57  id: 81
Sep 16 23:10:16 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 476  id: 19
Sep 16 23:10:16 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 18  id: 18
Sep 16 23:10:16 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 9  id: 81
Sep 16 23:10:16 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 48  id: 18
Sep 16 23:10:16 create3 user.notice create-platform: [WAR] Packet handler running too long exec_time: 925  id: 25
hilary-luo commented 1 year ago

@NachtaktiverHalbaffe Thank you for the detailed breakdown. I agree that the additional nodes are likely adding to the loading on each cpu. Before we look at any safety overrides, this is what I would like you to do.

On the create3 webserver, go to beta features and Override RMW Profile. In the field there, paste the following profile with one modification:

<?xml version="1.0" encoding="UTF-8"?>
<dds xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles">
    <profiles>
        <!-- UDPv4 Transport profile -->
        <transport_descriptors>
            <transport_descriptor>
                <transport_id>udp_transport</transport_id>
                <type>UDPv4</type>
                <!-- Reduce socket buffer size -->
                <sendBufferSize>4096</sendBufferSize>
                <receiveBufferSize>4096</receiveBufferSize>
                <!-- Reduce max message size, otherwise the participant creation fails -->
                <maxMessageSize>4096</maxMessageSize>
                <interfaceWhiteList>
                    <address>XXX.XXX.XXX.XXX</address>
                    <address>127.0.0.1</address>
                </interfaceWhiteList>
            </transport_descriptor>
        </transport_descriptors>

        <!-- Domain Participant Profile -->
        <participant profile_name="domainparticipant_profile_name" is_default_profile="true">
            <rtps>
                <!-- Use user defined UDPv4 transport -->
                <userTransports>
                    <transport_id>udp_transport</transport_id>
                </userTransports>
                <!-- Disable builtin transports -->
                <useBuiltinTransports>false</useBuiltinTransports>
            </rtps>
        </participant>

        <!-- Default publisher profile -->
        <data_writer profile_name="default_publisher_profile" is_default_profile="true">
            <topic>
                <!-- Tune initial allocations -->
                <resourceLimitsQos>
                    <max_samples_per_instance>20</max_samples_per_instance>
                    <allocated_samples>20</allocated_samples>
                </resourceLimitsQos>
            </topic>
        </data_writer>

        <!-- Default subscriber profile -->
        <data_reader profile_name="default_subscriber_profile" is_default_profile="true">
            <topic>
                <!-- Tune initial allocations -->
                <resourceLimitsQos>
                    <max_samples_per_instance>20</max_samples_per_instance>
                    <allocated_samples>20</allocated_samples>
                </resourceLimitsQos>
            </topic>
        </data_reader>
    </profiles>
</dds>

The modification required is to change the <address>XXX.XXX.XXX.XXX</address> line to contain the IP address of the create3 over wifi. This can either be found using your router or the about page on the webserver.

Please stop here and do an analysis again to see if there has been an improvement.

Next you will need to make a matching change on the turtlebot. We are going to directly modify the file for this test although know that if you update your software it may get overwritten. Modify the contents of /etc/turtlebot4/fastdds_rpi.xml to the following, again with a modification:

<?xml version="1.0" encoding="UTF-8" ?>
<dds>
    <profiles xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles">
        <!-- UDPv4 Transport profile -->
        <transport_descriptors>
            <transport_descriptor>
                <transport_id>udp_transport</transport_id>
                <type>UDPv4</type>
                <interfaceWhiteList>
                    <address>XXX.XXX.XXX.XXX</address>
                    <address>127.0.0.1</address>
                </interfaceWhiteList>
            </transport_descriptor>
        </transport_descriptors>

        <!-- Domain Participant Profile -->
        <participant profile_name="turtlebot4_default_profile" is_default_profile="true">
            <rtps>
                <!-- Use user defined UDPv4 transport -->
                <userTransports>
                    <transport_id>udp_transport</transport_id>
                </userTransports>
                <!-- Disable builtin transports -->
                <useBuiltinTransports>false</useBuiltinTransports>
            </rtps>
        </participant>
    </profiles>
</dds>

Once again, the modification required is to change the <address>XXX.XXX.XXX.XXX</address> line to contain the IP address but this time with the IP address of the raspberry pi over wifi. This can either be found on the turtlebot4 standard screen display or on the /ip topic.

Once again see if this offers an improvement to the cpu usage.

@garyservin as a first test, can you disable the diagnostics on the Turtlebot and then see if this makes a notable difference in the performance that you have described. (Steps to turn diagnostics off: SSH into the robot and run turtlebot4-setup -> go under ROS Setup -> Bash Setup -> Select TURTLEBOT4_DIAGNOSTICS -> Disabled -> Save -> Back out to the main menu -> Apply Settings -> Yes ). And then please provide another log from the create3.

Do note that when we are comparing the CPU percentage over time, I am referring to the mean CPU percentage. And then if you can also comment on the other symptoms you see.

NachtaktiverHalbaffe commented 1 year ago

RMW Profile Create3

Situation With this change Without change Gain (Difference in avg CPU usage)
Idle (direct after booting) 95-100% at start , 54-63% after settled 70% at start (short time), 55-62% after settled 1-3%, but a longer peaking time until it settles to those low values
Idle with Nav2 etc. running 97-100% at start, 72-77% after settled 100% at start, 76-78% after settled 1-3%, but red light of death with crash reappeared
Robot driving (until differences show up) 77-86% 80-87% 10% (when it didnt crash before)

RMW Profile Raspberry Pi

Situation With this change Gain (Difference in avg CPU usage)
Idle (direct after booting) 48-54% after settled 1-3%
Idle with Nav2 etc. running 95-98% at startup, 66-73 after settled 3-7%, no max out at 100%
Robot driving (until differences show up) 72-82% 3-4%, feels more responsive

My thoughts

Looks like the Create3 RMW profile helps reducing the CPU load due to messages, but introduces much instability e.g. the Crashes reappeared. The load due to messages doesnt seem to be a big problem, so this doesnt seem worth it.

The Turtlebot Pi RMW profile in the other hand helped at the startup, more or less reliable avoiding 100% average usage at startup (a little below, so could eventually peak at 100% depending on my deviations in the hardware and to which Container the turtlebot is connected). It seems to tackle the problem with the load at startup which seems to be the main problem to tackle at this point (80% when nav2 stack runs seems fine at this point)

Also the common pattern at startup is that the amcl takes the most time until it has localized, after that the rest of navigation2 is quickly up. Seems AMCL is the bottleneck in my case w.r.t. to the time it takes to start nav2

hilary-luo commented 1 year ago

@NachtaktiverHalbaffe

Can you try this RMW profile on the create3 now? and see if it deals with the stability issue:

<?xml version="1.0" encoding="UTF-8"?>
<dds xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles">
    <profiles>
        <!-- UDPv4 Transport profile -->
        <transport_descriptors>
            <transport_descriptor>
                <transport_id>udp_transport</transport_id>
                <type>UDPv4</type>
                <!-- Reduce socket buffer size -->
                <sendBufferSize>4096</sendBufferSize>
                <receiveBufferSize>4096</receiveBufferSize>
                <!-- Reduce max message size, otherwise the participant creation fails -->
                <maxMessageSize>4096</maxMessageSize>
                <interfaceWhiteList>
                    <address>XXX.XXX.XXX.XXX</address>
                    <address>127.0.0.1</address>
                </interfaceWhiteList>
            </transport_descriptor>
        </transport_descriptors>

        <!-- Domain Participant Profile -->
        <participant profile_name="domainparticipant_profile_name" is_default_profile="true">
            <rtps>
                <!-- Use user defined UDPv4 transport -->
                <userTransports>
                    <transport_id>udp_transport</transport_id>
                </userTransports>
                <!-- Disable builtin transports -->
                <useBuiltinTransports>false</useBuiltinTransports>
            </rtps>
        </participant>

        <!-- Default publisher profile -->
        <data_writer profile_name="default_publisher_profile" is_default_profile="true">
            <topic>
                <!-- Tune initial allocations -->
                <resourceLimitsQos>
                    <max_samples>0</max_samples>
                    <allocated_samples>0</allocated_samples>
                </resourceLimitsQos>
            </topic>
        </data_writer>

        <!-- Default subscriber profile -->
        <data_reader profile_name="default_subscriber_profile" is_default_profile="true">
            <topic>
                <!-- Tune initial allocations -->
                <resourceLimitsQos>
                    <max_samples>0</max_samples>
                    <allocated_samples>0</allocated_samples>
                </resourceLimitsQos>
            </topic>
        </data_reader>
    </profiles>
</dds>

Again swapping out the IP address for your create3's IP address over wifi.

NachtaktiverHalbaffe commented 1 year ago

Im probably monday or tuesday (european time) in the lab and try then. Should I/we worry about following logs:

Sep 21 15:29:04 iRobot-030B6EE05A104B3AAD28DB9A4069B835 user.notice create-platform: [WAR] Packet handler running too long exec_time: 106  id: 67

? Or is this tackled by the new RMW Profile?

NachtaktiverHalbaffe commented 1 year ago
Situation With this change Gain (Difference in avg CPU usage)
Idle (direct after booting) 73% at start, 46-52% after settled A little bit higher idle and sometimes more volatile (didnt find a pattern which could cause the spikes at idle)
Idle with Nav2 etc. running 95-100% at start, 67-72% after settled No crash/red light of death anymore
Robot driving (until differences show up) 75-83%

Hadnt much time today, so the numbers itself arent that precise. But I restartet the nav2 stack multiple times and it didnt crash today, so the stability is much better in my case. I keep you updated as the runtime grows

Also observed that due to the deviations in hardware etc. it still can max out at 100%, but sometimes you can be lucky to be a little bit under which helps a lot. So I would generally say that the startup still maxes out, but "in a stable way" (and my setup is more heavy than the reference nav2 stack)

sdobbelaere commented 1 year ago

I'd like to add some feedback here as well. Overall the robot is responding very delayed - often 30s+

OS: (e.g. Ubuntu 20.04). 22.04 ROS Distro: (e.g. Galactic): Humble Built from source or installed: Installed Package versions: (if from repository, give version from sudo dpkg -s ros-$ROS_DISTRO-turtlebot4-PACKAGE_WITH_ISSUE, if from source, give commit hash) -> really just make sure you are using the latest of all packages on the robot and on your PC Create3 Firmware version: H.2.3 (Humble) Robot Version: (standard or lite): Standard RMW Implementation: (FastDDS or CycloneDDS): FastDDS Networking Configuration: (Simple Discovery or Discovery Server): Simple Discovery Network Information: Is the robot connected to 2.4GHz or 5 GHz (each of create3 & raspberry pi): create3: 2.4 / raspberry pi: 5 is it a dedicated router over which you have full control or an institutional network? Dedicated access point - router limited configurability Is there any other ROS communication on this network? (additional turtlebots or other devices and if so are they on the same ROS Domain or namespaced?) No Have you applied any custom changes through the create3 beta features? (NTP or DDS profile etc) No Do you have turtlebot4 diagnostics enabled? No Finally, can you describe what situations cause the 100% CPU issue and provide the create3 logs from such a situation. Feel free to share other symptoms that you see so we can track them. I want to ensure that I understand what each person is working with to make this process effective.

When taking any action, the cpu-load jumps to 100 and takes around 10s to respond even to simple commands like:ros2 run teleop_twist_keyboard teleop_twist_keyboard

When however I want to try to map out the space and run navigation inside of that map things become more challenging. The response-time jumps to 30s, and more often than not I don't actually see any of the spacial information inside of the map.

Screenshot 2023-09-27 at 20 04 47

Scenarios, with and without navigation running

Without navigation running: ros2 action send_goal /dock irobot_create_msgs/action/Dock "{}" -> 7s

With navigation running:ros2 action send_goal /dock irobot_create_msgs/action/Dock "{}" -> 30s

## Start bot (diagnostics on) idle
Sep 27 17:39:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695836393.498958337] [system_health]: CPU usage: max 100 [%] mean 70 [%] RAM usage: 36/59 [MB]
Sep 27 17:40:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695836453.485060285] [system_health]: CPU usage: max 100 [%] mean 83 [%] RAM usage: 37/59 [MB]
Sep 27 17:41:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695836513.491705057] [system_health]: CPU usage: max 92 [%] mean 75 [%] RAM usage: 37/59 [MB]
Sep 27 17:42:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695836573.493129801] [system_health]: CPU usage: max 91 [%] mean 74 [%] RAM usage: 37/59 [MB]
Sep 27 17:43:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695836633.494686400] [system_health]: CPU usage: max 88 [%] mean 77 [%] RAM usage: 37/59 [MB]
Sep 27 17:44:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695836693.494768125] [system_health]: CPU usage: max 88 [%] mean 79 [%] RAM usage: 36/59 [MB]
Sep 27 17:45:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695836753.487647863] [system_health]: CPU usage: max 89 [%] mean 75 [%] RAM usage: 37/59 [MB]
Sep 27 17:46:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695836813.487447025] [system_health]: CPU usage: max 93 [%] mean 76 [%] RAM usage: 37/59 [MB]
Sep 27 17:47:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695836873.489677711] [system_health]: CPU usage: max 95 [%] mean 78 [%] RAM usage: 36/59 [MB]
Sep 27 17:48:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695836933.488616985] [system_health]: CPU usage: max 93 [%] mean 76 [%] RAM usage: 37/59 [MB]
Sep 27 17:49:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695836993.493854332] [system_health]: CPU usage: max 92 [%] mean 75 [%] RAM usage: 36/59 [MB]
Sep 27 17:50:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695837053.494742417] [system_health]: CPU usage: max 91 [%] mean 75 [%] RAM usage: 36/59 [MB]
Sep 27 17:51:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695837113.495857533] [system_health]: CPU usage: max 88 [%] mean 74 [%] RAM usage: 36/59 [MB]
Sep 27 17:52:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695837173.488342678] [system_health]: CPU usage: max 93 [%] mean 75 [%] RAM usage: 37/59 [MB]
Sep 27 17:53:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695837233.488911308] [system_health]: CPU usage: max 93 [%] mean 79 [%] RAM usage: 37/59 [MB]
Sep 27 17:54:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695837293.491616844] [system_health]: CPU usage: max 91 [%] mean 74 [%] RAM usage: 37/59 [MB]
Sep 27 17:55:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695837353.495669743] [system_health]: CPU usage: max 91 [%] mean 79 [%] RAM usage: 36/59 [MB]
Sep 27 17:56:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695837413.494057076] [system_health]: CPU usage: max 88 [%] mean 75 [%] RAM usage: 36/59 [MB]
Sep 27 17:57:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695837473.497409492] [system_health]: CPU usage: max 87 [%] mean 76 [%] RAM usage: 37/59 [MB]
Sep 27 17:58:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695837533.490047026] [system_health]: CPU usage: max 92 [%] mean 75 [%] RAM usage: 37/59 [MB]
Sep 27 17:59:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695837593.488683472] [system_health]: CPU usage: max 88 [%] mean 75 [%] RAM usage: 37/59 [MB]
Sep 27 18:00:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695837653.491836551] [system_health]: CPU usage: max 88 [%] mean 72 [%] RAM usage: 37/59 [MB]
Sep 27 18:01:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695837713.493443877] [system_health]: CPU usage: max 89 [%] mean 72 [%] RAM usage: 36/59 [MB]
Sep 27 18:02:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695837773.495935338] [system_health]: CPU usage: max 85 [%] mean 71 [%] RAM usage: 36/59 [MB]
Sep 27 18:03:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695837833.496599454] [system_health]: CPU usage: max 88 [%] mean 69 [%] RAM usage: 37/59 [MB]
Sep 27 18:04:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695837893.491623252] [system_health]: CPU usage: max 80 [%] mean 71 [%] RAM usage: 37/59 [MB]
Sep 27 18:05:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695837953.490764186] [system_health]: CPU usage: max 87 [%] mean 71 [%] RAM usage: 37/59 [MB]
Sep 27 18:06:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695838013.490745910] [system_health]: CPU usage: max 83 [%] mean 73 [%] RAM usage: 37/59 [MB]
Sep 27 18:07:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695838073.493089804] [system_health]: CPU usage: max 85 [%] mean 72 [%] RAM usage: 36/59 [MB]
Sep 27 18:08:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695838133.493408386] [system_health]: CPU usage: max 86 [%] mean 70 [%] RAM usage: 37/59 [MB]
## Start navigation  (diagnostics on)
Sep 27 18:09:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695838193.498143766] [system_health]: CPU usage: max 89 [%] mean 71 [%] RAM usage: 37/59 [MB]
Sep 27 18:10:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695838253.497036930] [system_health]: CPU usage: max 100 [%] mean 81 [%] RAM usage: 37/59 [MB]
Sep 27 18:11:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695838313.498803976] [system_health]: CPU usage: max 100 [%] mean 91 [%] RAM usage: 37/59 [MB]
Sep 27 18:12:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695838373.492264000] [system_health]: CPU usage: max 100 [%] mean 83 [%] RAM usage: 37/59 [MB]
Sep 27 18:13:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695838433.493904696] [system_health]: CPU usage: max 100 [%] mean 81 [%] RAM usage: 37/59 [MB]
Sep 27 18:14:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [DBG] [SE]: !wheels_stopped - odom:[INFO] [1695838493.493910103] [system_health]: CPU usage: max 100 [%] mean 83 [%] RAM usage: 37/59 [MB]
## Stopped navigation  (diagnostics on) - idle
Sep 27 18:15:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695838553.497498017] [system_health]: CPU usage: max 100 [%] mean 80 [%] RAM usage: 37/59 [MB]
Sep 27 18:16:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695838613.498438793] [system_health]: CPU usage: max 88 [%] mean 76 [%] RAM usage: 37/59 [MB]
Sep 27 18:17:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695838673.491356415] [system_health]: CPU usage: max 91 [%] mean 80 [%] RAM usage: 37/59 [MB]
Sep 27 18:18:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695838733.492967489] [system_health]: CPU usage: max 85 [%] mean 74 [%] RAM usage: 37/59 [MB]
Sep 27 18:19:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695838793.494717065] [system_health]: CPU usage: max 91 [%] mean 74 [%] RAM usage: 37/59 [MB]
Sep 27 18:20:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695838853.497016255] [system_health]: CPU usage: max 88 [%] mean 75 [%] RAM usage: 38/59 [MB]
Sep 27 18:21:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695838913.499502351] [system_health]: CPU usage: max 90 [%] mean 77 [%] RAM usage: 37/59 [MB]
Sep 27 18:22:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695838973.500176027] [system_health]: CPU usage: max 91 [%] mean 78 [%] RAM usage: 37/59 [MB]
Sep 27 18:23:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695839033.493630869] [system_health]: CPU usage: max 88 [%] mean 77 [%] RAM usage: 37/59 [MB]
Sep 27 18:24:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695839093.494118739] [system_health]: CPU usage: max 94 [%] mean 77 [%] RAM usage: 37/59 [MB]
Sep 27 18:25:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695839153.495454393] [system_health]: CPU usage: max 96 [%] mean 81 [%] RAM usage: 37/59 [MB]
Sep 27 18:26:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695839213.497562907] [system_health]: CPU usage: max 94 [%] mean 81 [%] RAM usage: 37/59 [MB]
Sep 27 18:27:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695839273.499783715] [system_health]: CPU usage: max 94 [%] mean 82 [%] RAM usage: 37/59 [MB]
Sep 27 18:28:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695839333.501509163] [system_health]: CPU usage: max 99 [%] mean 83 [%] RAM usage: 37/59 [MB]
Sep 27 18:29:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695839393.502375481] [system_health]: CPU usage: max 100 [%] mean 99 [%] RAM usage: 38/59 [MB]
Sep 27 18:30:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695839453.502388746] [system_health]: CPU usage: max 100 [%] mean 100 [%] RAM usage: 39/59 [MB]
Sep 27 18:31:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695839513.498378380] [system_health]: CPU usage: max 99 [%] mean 93 [%] RAM usage: 40/59 [MB]
Sep 27 18:32:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695839573.498483749] [system_health]: CPU usage: max 97 [%] mean 93 [%] RAM usage: 39/59 [MB]
Sep 27 18:33:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695839633.495961485] [system_health]: CPU usage: max 97 [%] mean 87 [%] RAM usage: 40/59 [MB]
Sep 27 18:34:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695839693.503181407] [system_health]: CPU usage: max 93 [%] mean 82 [%] RAM usage: 40/59 [MB]
Sep 27 18:35:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695839753.502094834] [system_health]: CPU usage: max 86 [%] mean 70 [%] RAM usage: 39/59 [MB]
Sep 27 18:36:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695839813.496083804] [system_health]: CPU usage: max 90 [%] mean 73 [%] RAM usage: 40/59 [MB]
Sep 27 18:37:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695839873.497631423] [system_health]: CPU usage: max 92 [%] mean 77 [%] RAM usage: 40/59 [MB]
Sep 27 18:38:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695839933.498095069] [system_health]: CPU usage: max 90 [%] mean 74 [%] RAM usage: 40/59 [MB]
Sep 27 18:39:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695839993.499229896] [system_health]: CPU usage: max 89 [%] mean 76 [%] RAM usage: 40/59 [MB]
Sep 27 18:40:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695840053.503582617] [system_health]: CPU usage: max 87 [%] mean 72 [%] RAM usage: 40/59 [MB]
Sep 27 18:41:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695840113.505184653] [system_health]: CPU usage: max 87 [%] mean 75 [%] RAM usage: 40/59 [MB]
Sep 27 18:42:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695840173.504987000] [system_health]: CPU usage: max 87 [%] mean 77 [%] RAM usage: 40/59 [MB]
Sep 27 18:43:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695840233.498121087] [system_health]: CPU usage: max 87 [%] mean 78 [%] RAM usage: 40/59 [MB]
Sep 27 18:44:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695840293.499808750] [system_health]: CPU usage: max 87 [%] mean 74 [%] RAM usage: 40/59 [MB]
Sep 27 18:45:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695840353.503679736] [system_health]: CPU usage: max 87 [%] mean 77 [%] RAM usage: 40/59 [MB]
Sep 27 18:46:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695840413.504473864] [system_health]: CPU usage: max 88 [%] mean 76 [%] RAM usage: 40/59 [MB]
Sep 27 18:47:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695840473.506389877] [system_health]: CPU usage: max 83 [%] mean 70 [%] RAM usage: 40/59 [MB]
Sep 27 18:48:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695840533.508229168] [system_health]: CPU usage: max 81 [%] mean 72 [%] RAM usage: 40/59 [MB]
Sep 27 18:49:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695840593.501365117] [system_health]: CPU usage: max 87 [%] mean 71 [%] RAM usage: 40/59 [MB]
Sep 27 18:50:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695840653.499304978] [system_health]: CPU usage: max 90 [%] mean 76 [%] RAM usage: 40/59 [MB]
Sep 27 18:51:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695840713.503039257] [system_health]: CPU usage: max 90 [%] mean 78 [%] RAM usage: 40/59 [MB]
Sep 27 18:52:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695840773.506583796] [system_health]: CPU usage: max 86 [%] mean 72 [%] RAM usage: 40/59 [MB]
Sep 27 18:53:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695840833.506336676] [system_health]: CPU usage: max 85 [%] mean 70 [%] RAM usage: 40/59 [MB]
Sep 27 18:54:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695840893.507693638] [system_health]: CPU usage: max 85 [%] mean 72 [%] RAM usage: 40/59 [MB]
Sep 27 18:55:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695840953.506344132] [system_health]: CPU usage: max 82 [%] mean 70 [%] RAM usage: 40/59 [MB]
Sep 27 18:56:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695841013.500843001] [system_health]: CPU usage: max 83 [%] mean 71 [%] RAM usage: 40/59 [MB]
Sep 27 18:57:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695841073.503812474] [system_health]: CPU usage: max 85 [%] mean 71 [%] RAM usage: 40/59 [MB]
Sep 27 18:58:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695841133.507279133] [system_health]: CPU usage: max 85 [%] mean 72 [%] RAM usage: 40/59 [MB]
Sep 27 18:59:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695841193.507510449] [system_health]: CPU usage: max 86 [%] mean 73 [%] RAM usage: 40/59 [MB]
Sep 27 19:00:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695841253.509464648] [system_health]: CPU usage: max 90 [%] mean 74 [%] RAM usage: 40/59 [MB]
## Start navigation  (diagnostics on)
Sep 27 19:01:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] 
[1695841313.508490009] [system_health]: CPU usage: max 89 [%] mean 77 [%] RAM usage: 40/59 [MB]
Sep 27 19:02:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695841373.503131154] [system_health]: CPU usage: max 100 [%] mean 82 [%] RAM usage: 40/59 [MB]
Sep 27 19:03:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695841433.505303953] [system_health]: CPU usage: max 88 [%] mean 76 [%] RAM usage: 40/59 [MB]
Sep 27 19:04:54 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695841494.253637454] [system_health]: CPU usage: max 100 [%] mean 85 [%] RAM usage: 40/59 [MB]
Sep 27 19:05:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695841553.669177447] [system_health]: CPU usage: max 100 [%] mean 100 [%] RAM usage: 40/59 [MB]
Sep 27 19:06:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695841613.510032967] [system_health]: CPU usage: max 100 [%] mean 97 [%] RAM usage: 40/59 [MB]
Sep 27 19:07:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695841673.511075950] [system_health]: CPU usage: max 96 [%] mean 87 [%] RAM usage: 40/59 [MB]
## updating packages + disabling diagnostics
Sep 27 19:08:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695841733.505783234] [system_health]: CPU usage: max 98 [%] mean 83 [%] RAM usage: 40/59 [MB]
Sep 27 19:09:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695841793.505114787] [system_health]: CPU usage: max 93 [%] mean 82 [%] RAM usage: 40/59 [MB]
Sep 27 19:10:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695841853.507337831] [system_health]: CPU usage: max 93 [%] mean 85 [%] RAM usage: 40/59 [MB]
Sep 27 19:11:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695841913.506491156] [system_health]: CPU usage: max 91 [%] mean 80 [%] RAM usage: 40/59 [MB]
Sep 27 19:12:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695841973.511224193] [system_health]: CPU usage: max 92 [%] mean 86 [%] RAM usage: 40/59 [MB]
Sep 27 19:13:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695842033.514256558] [system_health]: CPU usage: max 95 [%] mean 89 [%] RAM usage: 40/59 [MB]
Sep 27 19:14:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695842093.505785293] [system_health]: CPU usage: max 94 [%] mean 87 [%] RAM usage: 40/59 [MB]
Sep 27 19:15:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695842153.506673689] [system_health]: CPU usage: max 100 [%] mean 93 [%] RAM usage: 40/59 [MB]
Sep 27 19:16:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695842213.507153392] [system_health]: CPU usage: max 100 [%] mean 98 [%] RAM usage: 40/59 [MB]
Sep 27 19:17:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695842273.511638680] [system_health]: CPU usage: max 87 [%] mean 78 [%] RAM usage: 40/59 [MB]
Sep 27 19:18:53 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695842333.511709341] [system_health]: CPU usage: max 84 [%] mean 63 [%] RAM usage: 40/59 [MB]
## Cold restart. (Diagnostics disabled)
Sep 27 19:24:38 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695842678.115421492] [system_health]: CPU usage: max 100 [%] mean 67 [%] RAM usage: 37/59 [MB]
Sep 27 19:25:38 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695842738.513841071] [system_health]: CPU usage: max 100 [%] mean 76 [%] RAM usage: 37/59 [MB]
Sep 27 19:26:38 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695842798.415362758] [system_health]: CPU usage: max 79 [%] mean 66 [%] RAM usage: 37/59 [MB]
Sep 27 19:27:38 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695842858.535428898] [system_health]: CPU usage: max 82 [%] mean 64 [%] RAM usage: 37/59 [MB]
## Undock  (diagnostics off) - 10s
Sep 27 19:28:38 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695842918.374077878] [system_health]: CPU usage: max 100 [%] mean 73 [%] RAM usage: 37/59 [MB]
## Dock  (diagnostics off) - 12s
Sep 27 19:29:38 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695842978.117237071] [system_health]: CPU usage: max 100 [%] mean 76 [%] RAM usage: 37/59 [MB]
## Staring navigation (diagnostics off)
Sep 27 19:30:40 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [WAR] Packet handler running too long exec_time: 315  id: [INFO] [1695843039.403828166] [system_health]: CPU usage: max 100 [%] mean 76 [%] RAM usage: 38/59 [MB]
Sep 27 19:31:40 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695843098.117473144] [system_health]: CPU usage: max 100 [%] mean 99 [%] RAM usage: 39/59 [MB]
Sep 27 19:32:38 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695843158.470702510] [system_health]: CPU usage: max 100 [%] mean 83 [%] RAM usage: 40/59 [MB]
Sep 27 19:33:38 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695843218.471232249] [system_health]: CPU usage: max 87 [%] mean 76 [%] RAM usage: 40/59 [MB]
## Start teleop keyboard
Sep 27 19:34:39 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695843278.679167178] [system_health]: CPU usage: max 100 [%] mean 83 [%] RAM usage: 39/59 [MB]
Sep 27 19:35:38 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [WAR] Packet handler running too long exec_time: [INFO] [1695843338.462472228] [system_health]: CPU usage: max 100 [%] mean 81 [%] RAM usage: 40/59 [MB]
## Dock
Sep 27 19:36:38 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695843398.877527917] [system_health]: CPU usage: max 100 [%] mean 87 [%] RAM usage: 40/59 [MB]
## Stop navigation (diagnostics off)
Sep 27 19:37:38 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695843458.546366578] [system_health]: CPU usage: max 100 [%] mean 91 [%] RAM usage: 40/59 [MB]
Sep 27 19:38:38 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695843518.118077836] [system_health]: CPU usage: max 97 [%] mean 81 [%] RAM usage: 39/59 [MB]
Sep 27 19:39:38 iRobot-6F2890FA31C0486E80A20C80E321CE4D user.notice create-platform: [INFO] [1695843578.118348200] [system_health]: CPU usage: max 83 [%] mean 73 [%] RAM usage: 39/59 [MB]

Overall the load seems very high, and response-times slow - to the point where I can't really tell if I'm doing something wrong or if this is expected behaviour.

Full post cold restart logs attached as well: messages-5.txt

NachtaktiverHalbaffe commented 12 months ago

@hilary-luo Any updates?

hilary-luo commented 11 months ago

@sdobbelaere @NachtaktiverHalbaffe I have not forgotten about this, and I will get back to you although I cannot give you a specific timeline.

marredir commented 11 months ago

Hi, I have been interacting on this issue through this ticket. The plan was to contact @hilary-luo via email to start similar tests to the ones mentioned above. But I now have a slightly different setup that I think it is very reliable and hopefully gives us some hints on the problem we are facing. I am running with the discovery server (so create3 and RPI communicates only over USB) on RPI. I have nav2 running on my laptop (so basically RPI has "just" the discovery server) : from what I understand, we probably agree that the discovery process is the faulty one (either increasing by far the CPU usage or by "loosing" some info along the way, TBD). What really makes the trick for me is to be sure that the ntp server is in sync before starting to operate (and by this term I mean SLAM or localization plus quite a lot more of nodes, e.g. vision or others with quite high-speed traffic exploiting tf and IMU and yes, all running on a high-spec laptop). Look here and follow verbatim the instructions: after issuing (from RPI) the command "sudo chronyc clients", if I get the message "blabla is too high, blabla" then I have problems, either I can't see some topics (e.g. scan or tf) or even if I can see them with "ros2 topic echo" from a shell then it is nav2 that fails to activate some components (be it map server or localization or others..). It might be useful if others could replicate this setup ( @NachtaktiverHalbaffe @sdobbelaere can you give it a try?). @hilary-luo: if these discovery problems impact so heavily the create3 CPU can we mitigate them (at least temporarily) via fw having a finer control over the fw? By that the I mean less topics and tune the frequency of them ( I guess most of people use SLAM and localization without caring too much about other topics that are published at high rates). In the old days (I am talking ROS1), the amount of topics to run SLAM and localization was very minimal and even lower rates worked fine.. it is not the best but I think it would alleviate a lot of headaches till we find a more permanent solution. As usual I am available to test and experiment things if needed. Thanks

agorkor commented 11 months ago

Will create3 have a fix for 100% CPU utilization in near future ?

marredir commented 11 months ago

@agorkor looking at the thread there is still quite a lot of uncertainty, and the tickets have been opened for quite a while... but back to the technical problem, also today I had a very reliable session (I added to nav2 ultra sound sensors and it worked reliably for more than 1h): can you guys please check if you see messages like this one you have failures:

"user.notice ntpd: Script called in periodic mode"

Try to issue the following command from RPI and see afterwards the create3 log from the webserver: sudo chronyc clients

Thanks

hilary-luo commented 11 months ago

@agorkor I can't promise a timeline on a universal fix although depending on specifics of your use case there are things that you can do (discovery server for single robot for example, or disabling diagnostics to decrease system load).

Things keep coming up that have taken me away from being able to focus on this issue although it is high priority for me.

@marredir thanks for joining the conversation on this ticket and giving an idea to others on what to test. We are limited on what we can do with the create3 firmware given that it is not an open source product. You could mess around with changing the DDS profiles to use that to throttle the topics but I don't think that will be a universal solution that will work for everyone given different people's experience levels.

Simba185 commented 11 months ago

@hilary-luo Thanks for the update.

@marredir Yes my TB4 acts the same, if there is any time sync issue it is erratic, and then a sync issue can occur part way through a test and data will stop working.

For the Turtlebot4 system to be reliable this needs to be resolved.

There needs to be a method or system to control the frequency and importance of messages. What happens when the WIFI signal is reduced and the bandwidth is reduced or more sensors are added.

marredir commented 11 months ago

@Simba185 I understand, but do you have the nav2 on a laptop or on the raspberry? I don't have any glitch once I make sure that the system are in sync (before starting all the ROS2 nodes)

@hilary-luo yes I understand, should we involve the icreate team? I think everyone would benefit from that. Probably you have some direct contact to add to this thread.

Thanks

Simba185 commented 11 months ago

@marredir nav2 is running on the Laptop. (2022 dell with i5 1245u) but system monitor shows only 1 core stuck at 100% (seems like rviz cant use the cores available.)

Then over 3 or 4 hours, with the system running the time sync will get lost and the Create will say "Delay Too high, Ignoring" and then rviz drops all the frames.

@roni-kreinin has stated to switch to from Simple Discovery to server discovery, I will test this and see if it behaves better.

marredir commented 11 months ago

@Simba185 sorry may be I was not clear: the problem seems to be the Create3 spinning at 100%. My configuration uses the discovery server and it seems to work reliably, nevertheless we need have a way to "reset" the ntp server. It seems that a "restart application" doesn't work. If you look here, at point 10 it says that if the ntp has a too large gap "If this happens, simply restart the robot (not just the application) via the webserver over the network connection." but to shut everything down is not possible from the webserver when in discovery mode (that button seems to be available only when the base is in AP mode) and it would be anyway very unconvenient. @hilary-luo can we contact iRobot and make something about it? May be if we "reliably" are able to restart the ntp the system might recover (still not a full solution but we would get the system more reliable). Anyone at iRobot that might get involved?

NachtaktiverHalbaffe commented 10 months ago

Sorry for replying lately. Our usecase doesnt allow using discovery server. We had NTP issues, sometimes we get the "delay is too high" logs, but these are just increasing the usage for a short time and not causing a complete hang up or 100% utilization, this only occurs due to the node discovery problem. We have also have a special NTP config because the firewall of the lab is blocking all ntp servers except the one from university. Maybe this NTP problems werent that erratic in my case because I never launched the nav2 stack directly after the robot has booted so it had plenty time to sync time etc.

The both fixes from @hilary-luo which I tested (disabling diagnostics and custom dds configs) and the DDS fix from https://github.com/turtlebot/turtlebot4/issues/216#issue-1797043215 improved the situation so that the Turtlebot4 wont hang up completely anymore (only one Turtlebot per ROS Domain, each Turtlebot is isolated and the important topics are bridged). Since these fixes the Nav2 stack has started reliably again in our usecase

I finished my master thesis and thus I dont have unrestricted access to the Turtlebot4 anymore. I might do voluntary work on this topic, but I added my supervisor @MatWise to this issue as well who will be using my model process with the Turtlebot4 in the future

marredir commented 8 months ago

Sorry, it took a long time. Just to summarize: my setup has the raspberry just running the discovery server, all the nav2 nodes are running on a laptop and I do not use any image shipped by clearpath, I build everything on top of a standard ubuntu22 image (both for the raspberry and my laptop). I see a lot of the problems mentioned previously when the ntp server on the create3 is not properly "synced" at the beginning. In fact the problems either start at start-up or do not occur. It is hard to say when the ntp is synced from the log: when one sees messages that mention that the delay is too high that means that the ntp server should be restarted. May be iRobot in this can look whether it makes sense to have log messages that confirm that everything is in sync? I also suggest to look at Zenoh: I have tried both localization and slam and it seems very reliable (I did not tried extensively but the start up phase seems very effective). Hope this helps. Thanks

hilary-luo commented 7 months ago

@NachtaktiverHalbaffe congrats on finishing your thesis!

It is definitely an interesting thing to note that some people have had success using separate ROS Domain IDs and bridging certain topics. I want to bring some emphasis to that point for people who may be coming across this. This should also be able to be done using the same ROS domain ID and modifications to the DDS profiles but bridging the domain would definitely be easier to set up.

@marredir You are able to restart the NTP daemon under beta features (if you don't have this option, update your firmware). We are actively engaged with iRobot even if they are not directly present in this thread and there are things coming down the line although I can't make any promises about dates or features. You mention that you build everything on top of a standard ubuntu image. Are you building everything from source? or have you tried running from installs? In regards to Zenoh, it is definitely on our radar for the future. If you end up testing this out feel free to provide feedback back and/or PRs.

Overall I am hoping to have some new tests for people to try out shortly but please do keep sharing what you find works / doesn't work for you.

Simba185 commented 7 months ago

@hilary-luo

Thanks for the update,

Really looking forward to seeing what can be done to mitigate these issues and how to move forward. ( we really need a solution as TB4 is not reliable in this current state).

Thanks

hilary-luo commented 7 months ago

@Simba185 Did you get discovery server set up on your system? I don't think I heard feedback on how that went for you. (In the meantime, I do want to ensure that you have tried the work around tactics that have helped other people.)

garyservin commented 7 months ago

Just to add an extra data point, I was able to successfully run SLAM and navigation on my TB4 after configuring the discovery server on the TB. 🎉

Simba185 commented 7 months ago

@hilary-luo

Unfortunately no further time has been allotted to TB4 once we saw that there was lots of unresolved issues with time sync and these CPU load issues.

I'm glad to hear Discovery server works for many as a solution for smooth SLAM (with simple discovery the robot will stop part way and continue).

Until I am confident that there are 100% solutions for all these issues I'm not sure when I will have more time to work on any tests. (as the consumer , I should not have to test solutions for the Manufacture.)

I look forward to any updates, Thanks.

hilary-luo commented 7 months ago

Hey all, iRobot has released a new create3 firmware (H.2.5) which should feature some improvements for this issue. Please try it out and let me know how it impacts your system, the more quantitative the feedback the better. I expect you should see improvements for both those using discovery server as well as simple discovery.

ooijenvangilbert commented 7 months ago

I see that I am a bit late to the party, I've found out that my setup also runs into the 100% and red ring situation. But my setup differs, I am on ubuntu 20.04 and running galactic with the create 3 running g5.4 version.

I have 2 rPi setups, one with the image from the Turtlebot 4 and one own build, my own build starts lidar en oak-d and I am able to create maps and navigate without problems. This setup does not use the usb communication between rPi and create and basically works fine.

It's the Turtlebot setup that uses the usb-c communication that runs the create 3 into trouble. I am curious as to what is the added value of running communication over the usb-c line (both create 3 and rPi have wifi and work fine on their own)?

Simba185 commented 7 months ago

@ooijenvangilbert Can you provide some further details about how you built your own image for TB4? How is the commands sent to the base without usb communication?

Has anyone created a ROS1 setup?

ooijenvangilbert commented 7 months ago

@ooijenvangilbert Can you provide some further details about how you built your own image for TB4? How is the commands sent to the base without usb communication?

Has anyone created a ROS1 setup?

basically the create 3 also has a wifi connection and therefor can handle all commands for movement.

I also used this repository https://github.com/iRobotEducation/create3_examples

and added the ros implementation for the Luxonis cam to get it all working.

JJJ2777 commented 6 months ago

Hello, I have the same problem with the CPU at 100% and a red light. I am trying to figure out what is causing the problem. Do you use one of these two commands:

ros2 run teleop_twist_keyboard teleop_twist_keyboard

ros2 launch turtlebot4_bringup joy_teleop.launch.py

Do your CPUs increase after these commands or using Navigation2 in RViz? And if that's the case for you, which one courses the highest increase?

hilary-luo commented 5 months ago

@ooijenvangilbert

I am curious as to what is the added value of running communication over the usb-c line (both create 3 and rPi have wifi and work fine on their own)?

If you have a setup where the wifi connection on the create3 works for you then there is no need to have the usb-c connection. Earlier in this ticket, one of the configurations that I was having people try was indeed blocking the usb-c communication. This will be easier to control in Jazzy, unfortunately it is harder to configure in Humble. However, there are many situations where it is desirable to disconnect the create3 from the wifi and only use the USB-C connection (what is done in the discovery server setup). Some networks do not handle 2.4GHz to 5GHz connections gracefully which can cause issues. Additionally, scaling up with many robots works best with the create3 only communicating over usb, not over wifi.

hilary-luo commented 5 months ago

Is anyone still having the CPU at 100% issue using Humble firmware 2.6? I am going to be looking for some people to give feedback on the next round of updates and I want to understand the use cases that are still having issues and how the new update will work (or not) for your use case.

mauriz commented 5 months ago

Hi, I can not run these experiments as I have switched to Zenoh (which seems to be extremely stable, I can shutdown and resume the system many times without a glitch). In this regard, do you have a roadmap of what you are going to support? Do you plan to stick with Humble for a while or do you plan to move to successive versions (e.g. Iron or similar)? Thanks

hilary-luo commented 5 months ago

@mauriz I cannot speak to any long term road mapping. I will say though that as long as Zenoh is not officially supported by ROS, it is very unlikely to be supported here.

That being said, I would still encourage you to share your experience with Zenoh, how you are using it and how it has helped (maybe as a discussion topic).

ooijenvangilbert commented 3 months ago

@ooijenvangilbert

I am curious as to what is the added value of running communication over the usb-c line (both create 3 and rPi have wifi and work fine on their own)?

If you have a setup where the wifi connection on the create3 works for you then there is no need to have the usb-c connection. Earlier in this ticket, one of the configurations that I was having people try was indeed blocking the usb-c communication. This will be easier to control in Jazzy, unfortunately it is harder to configure in Humble. However, there are many situations where it is desirable to disconnect the create3 from the wifi and only use the USB-C connection (what is done in the discovery server setup). Some networks do not handle 2.4GHz to 5GHz connections gracefully which can cause issues. Additionally, scaling up with many robots works best with the create3 only communicating over usb, not over wifi.

That would be my next option to test to have the create3 only use USB-C so the RPI can send cmd_vel commands without the use of wifi. Did shutting down the wifi on de create3 releave any problems as described?

ooijenvangilbert commented 3 months ago

@ooijenvangilbert

I am curious as to what is the added value of running communication over the usb-c line (both create 3 and rPi have wifi and work fine on their own)?

If you have a setup where the wifi connection on the create3 works for you then there is no need to have the usb-c connection. Earlier in this ticket, one of the configurations that I was having people try was indeed blocking the usb-c communication. This will be easier to control in Jazzy, unfortunately it is harder to configure in Humble. However, there are many situations where it is desirable to disconnect the create3 from the wifi and only use the USB-C connection (what is done in the discovery server setup). Some networks do not handle 2.4GHz to 5GHz connections gracefully which can cause issues. Additionally, scaling up with many robots works best with the create3 only communicating over usb, not over wifi.

That would be my next option to test to have the create3 only use USB-C so the RPI can send cmd_vel commands without the use of wifi. Did shutting down the wifi on de create3 releave any problems as described?

so you mention that to have communication over USB-C we have to do the discovery server setup. But I have just flashed the standard setup and when I run a Ros2 topic list it gives errors when the usb-c is not connected to the create3. This would imply that even without discovery server there is communication ...

hilary-luo commented 3 months ago

@ooijenvangilbert you don't need discovery server to have communication over the usb-c connection, in discovery server you ONLY have connection over the usb-c connection so it is required. That answer was addressing the question of why communication is allowed over usb-c.

In simple discovery it should be able to communicate over wifi or usb-c and if you are not able to connect without the usb-c connection then you need to take a look at your wifi connections / bridging between networks.

royito55 commented 3 weeks ago

Hi,

We have made a video explaining why this issue happens, and how it can be solved by simply pulling and running a docker container in your TurtleBot 4.

The basic idea is separating the ROS_DOMAIN_ID of the RPi4 and the Create 3, and connecting them through two zenoh bridges in order to isolate and protect the Create 3.

I hope it helps!

https://www.youtube.com/watch?v=xmK2I0D5sas

Check out our docker image here

1. Requirements

Raspberry Pi configuration (use turtlebot4-setup tool):

Create 3 configuration (Access web server in a browser: 192.168.1.XX:8080):


2. docker-compose.yaml

✅️ The best way to run this image is with the following docker-compose.yaml:

services:
  zenoh-bridge-turtlebot4:
    image: theconstructai/zenoh-bridge-ros2dds:turtlebot4
    container_name: zenoh-bridge-turtlebot4
    network_mode: "host" 
    restart: always # Ensures the container restarts on reboot

3. Run container

  1. docker compose -f /path/to/docker-compose.yaml up -d.
  2. Turn robot off completely and turn it back on (press and hold power button and wait for chime, then place on charger).
  3. Verify all TurtleBot 4 topics with ros2 topic list.
hilary-luo commented 2 weeks ago

Hello everyone, it has been a while.

@royito55 Thanks for making that video and sharing it here! Community contributions are always welcome!

For anyone not wanting to use Zenoh bridge, please update to the latest packages and use Discovery Server with individual robot namespaces. The discovery server implementation now supports multi-robot and uses the create3 republisher which should address any lingering issues with the Create3 cpu getting overloaded. Please do read the documentation accompanying the update though as it will make changes to how the Create3 is namespaced. A fix is coming out shortly to address being able to run Discovery Server without a robot namespace in the new update.

I am closing this ticket for now. If you happen to still run into any CPU usage issues using the new software update please do let me know and I will re-open this ticket and investigate.