osrf / autodock

ROS packages for automatic docking
Apache License 2.0

High CPU usage even when IDLE #4

Open swaroophs opened 2 years ago

swaroophs commented 2 years ago

Hi!

Thanks for this project! We have successfully deployed this algorithm long-term and it has been working well :). Over time, we have observed that the simple_autodock node seems to take a lot of CPU resources even when IDLE. A sample htop screenshot, taken while the TB3 sim is at the dock, is shown below.

[htop screenshot showing simple_autodock at ~47% CPU while idle]

Since this is just a state machine, could we figure out whether something in the implementation is causing the high CPU usage? I have done some preliminary investigation to check whether there are unthrottled loops or high-frequency timer callbacks, but could not find any egregious issues. We are running this on a moderately resource-limited computer (4 cores/8 GB RAM), and sometimes we see other issues because the dock node overloads the system. If you have ideas on how to debug this, I will be more than happy to try them out and report back/make a PR for review. FWIW, this issue has been observed on both 18.04/Melodic and 20.04/Noetic.
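For reference, the kind of pattern I was looking for is sketched below. This is purely illustrative rospy code, not anything from autodock itself: an unthrottled loop would pin a core, whereas a rate-limited loop or a low-frequency timer should be almost free when idle.

    #!/usr/bin/env python
    # Illustrative only (not autodock code): loop patterns and their idle CPU cost in rospy.
    import rospy

    def on_timer(event):
        pass  # periodic work would go here

    rospy.init_node("cpu_pattern_demo")

    # Cheap when idle: a low-frequency timer callback (1 Hz here).
    rospy.Timer(rospy.Duration(1.0), on_timer)

    # Cheap when idle: a rate-limited loop that wakes up only 10 times per second.
    rate = rospy.Rate(10)
    while not rospy.is_shutdown():
        # An unthrottled loop (i.e. no rate.sleep() here) would spin a full CPU core.
        rate.sleep()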

Best, Swarooph

youliangtan commented 2 years ago

Resolved in the mentioned PR, by toggling aruco_detect via debug_mode:=False

swaroophs commented 2 years ago

Hi @youliangtan, I don't think we should close this issue just yet. Like I mentioned in the other PR,

This PR is adjacently related to https://github.com/osrf/autodock/issues/4 that I reported. However it does not directly fix that.

This is because the simple_autodock node still seems to be using a lot of resources (47% CPU in the screenshot above) for what is just a state machine. This will remain true even after the fix in the other PR.

youliangtan commented 2 years ago

Yes, I think we should keep this open. I briefly spent some time investigating this, but had no luck identifying the main cause of the high CPU load during idle. I kinda suspect it is related to the action server creation and/or the tf_listener message subscription, but I have no concrete answer. Certainly, writing this in C++ would significantly boost performance.

LKSeng commented 2 years ago

I'm on a similar journey to reduce CPU usage for the autodock functionality where possible. Note that I'm using a version of autodock that predates the debug_mode option.

Some things I've tried modifying in autodock_server.py, and their effects on CPU usage when running the simulation:

  1. Removing self.__tfBuffer and self.__tf_listener does not seem to impact CPU usage.
  2. Removing self.__as reduces CPU usage by ~10%.
  3. Removing self.__timer reduces CPU usage by ~10%.
  4. Removing rospy.Subscriber("/pause_dock", ...) and self.__marker_pub does not seem to impact CPU usage.

Combining the above, the idle CPU usage of simple_autodock drops from ~45% to ~25% on my CPU, though this modified simple_autodock would of course no longer be useful.
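For context, these members correspond roughly to the usual tf2/actionlib/timer/subscriber constructs in a rospy node. The sketch below is only illustrative; the class, topic, and action names are my own placeholders, not the actual autodock_server.py code.

    #!/usr/bin/env python
    # Illustrative sketch of the components toggled above; not the real autodock_server.py.
    import rospy
    import tf2_ros
    import actionlib
    from std_msgs.msg import Bool
    from visualization_msgs.msg import Marker
    from move_base_msgs.msg import MoveBaseAction  # stand-in action type

    class AutodockServerSketch(object):
        def __init__(self):
            # (1) tf buffer + listener: subscribes to /tf and /tf_static
            self.__tfBuffer = tf2_ros.Buffer()
            self.__tf_listener = tf2_ros.TransformListener(self.__tfBuffer)
            # (2) action server: creates internal goal/cancel/status/feedback/result topics
            self.__as = actionlib.SimpleActionServer(
                "autodock_action", MoveBaseAction,
                execute_cb=self.__execute_cb, auto_start=False)
            self.__as.start()
            # (3) periodic timer, e.g. publishing the state machine status
            self.__timer = rospy.Timer(rospy.Duration(0.5), self.__timer_cb)
            # (4) pause subscriber and marker publisher
            rospy.Subscriber("/pause_dock", Bool, self.__pause_cb)
            self.__marker_pub = rospy.Publisher("/autodock_marker", Marker, queue_size=1)

        def __execute_cb(self, goal):
            self.__as.set_succeeded()

        def __timer_cb(self, event):
            pass

        def __pause_cb(self, msg):
            pass

    if __name__ == "__main__":
        rospy.init_node("autodock_server_sketch")
        AutodockServerSketch()
        rospy.spin()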

My current working theory is that it is due to the node using the sim_time reference: ROS Answers. The accepted answer explains that the node isn't actually doing nothing; it has to follow the /clock topic, which can be updated at up to 1000 Hz. From the post, it is unclear whether using roscpp would improve performance.
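One quick way to sanity-check this theory is to measure how often /clock actually arrives (`rostopic hz /clock` gives the same information). A throwaway script along these lines, written purely for illustration and not part of autodock:

    #!/usr/bin/env python
    # Count how many /clock messages arrive per wall-clock second. A rate of ~1000 Hz
    # means every node running with use_sim_time=true processes ~1000 messages per
    # second just to keep track of time, even when it is otherwise idle.
    import time
    import rospy
    from rosgraph_msgs.msg import Clock

    count = [0]

    def clock_cb(msg):
        count[0] += 1

    rospy.init_node("clock_rate_check")
    rospy.Subscriber("/clock", Clock, clock_cb)
    time.sleep(5.0)  # wall-clock wait, independent of sim time
    print("/clock rate: %.1f Hz" % (count[0] / 5.0))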

Undoing the above changes and setting enable_sim=false through the simulation launch file, I observe that the CPU usage of simple_autodock drops to 2.0%, but it renders the tf portion non-functional. Setting the simulation world physics to the following (see for more information):

    <physics type="ode">
      <max_step_size>0.01</max_step_size>
      <real_time_factor>1</real_time_factor>
      <real_time_update_rate>100</real_time_update_rate>
    </physics>

results in simple_autodock running at about 7% CPU usage when idling.

Unfortunately, I do not have a live system to validate this theory at the moment.

youliangtan commented 2 years ago

@LKSeng Thanks for the in-depth explanation. I also did this exact lengthy experiment a few weeks ago, commenting out each component one by one, and reached the same conclusion. Yes, running in sim creates some overhead due to the subscription to /clock. So hopefully we can test it out with actual (wall-clock) time soon, when time permits.

Just one thing to note: when running it in sim mode, it is crucial to have the same real_time_factor for all test scenarios to effectively compare their performance. This can be achieved by changing max_step_size and real_time_update_rate in the Gazebo GUI.
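To make the relation explicit (this is my understanding, so treat it as an assumption rather than a statement about Gazebo internals): the target real-time factor is roughly max_step_size * real_time_update_rate, so both parameter sets discussed above aim for a factor of 1 and differ mainly in how often /clock is published.

    # Assumed relation: target real_time_factor ~= max_step_size * real_time_update_rate
    default_rtf  = 0.001 * 1000  # = 1.0, with /clock updated ~1000 times per second
    modified_rtf = 0.01 * 100    # = 1.0, with /clock updated ~100 times per second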

LKSeng commented 2 years ago

@youliangtan I was today years old when I realised that max_step_size and real_time_update_rate can be updated on the fly in Gazebo (under World > Physics in the sidebar, the "real time update rate" and "max step size" values can be modified).

Indeed, my comparisons were done with a final real_time_factor of 1 in all cases. I had changed the settings to reflect real_time_factor = 1 some time back on my local copy and forgot about it.

Repeating the comparison with tb3_nav_dock_sim.launch, which has a real_time_factor of 1: at the standard max_step_size = 0.001 and real_time_update_rate = 1000, ~17% CPU utilisation was reported in rqt_top. The modified max_step_size = 0.01 and real_time_update_rate = 100 drops this to about ~5%, though admittedly at this larger time step the TurtleBot3 Burger jitters and tends to rotate/translate by itself over time.

LKSeng commented 1 year ago

I saw a discussion about high CPU usage in Python nodes related to gazebo/sim_time (albeit in ROS 2) and was immediately reminded of this issue.

On my end, on a live robot, the node takes around 0.31% CPU at idle, while during the parallel correction phase it was observed to peak at up to 7% CPU usage, monitored via rqt_top.

youliangtan commented 1 year ago

@LKSeng There could be some overhead in the ROS 2 DDS communication layer. This application uses ROS 1, so it might behave differently in this regard.

Regarding the CPU usage, this might be due to the activation of the third-party package aruco_detect during docking, which is totally understandable if that is the case.