Improve polling for available file descriptors

ros / ros_comm

ROS communications-related packages, including core client libraries (roscpp, rospy, roslisp) and graph introspection tools (rostopic, rosnode, rosservice, rosparam).

http://wiki.ros.org/ros_comm

745 stars 911 forks source link

Improve polling for available file descriptors #2365

Open Hemofektik opened 6 months ago

Hemofektik commented 6 months ago

The polling of all file descriptors can take a lot of time especially on modern OSs where each process can open 1<<20 FDs. This PR addresses this issue and stops polling for FDs once the desired number of available FDs has been found. The heap memory needed for holding the FDs is also much smaller and no longer persistent throughout the lifetime of the XmlRpcServer.

TL;DR

Remove 8MB of memory for each socket created
Reduce available file descriptor polling to minimum necessary
- roughly a 1000 times faster
- old duration: 34'571'539 ns
- new duration: 34'464 ns

Hemofektik commented 4 months ago

For the PR to proceed, you should revert the whitespace-only changes and force-push a new commit without them.

I will do that thanks.

How did you measure the time? What is the influence on a real running system?

I saw this issue actually using nsight profiler when one of our nodes tried to create around 50 subscribers/publishers which takes forever. I then checked this code piece individually and measured its duration. 35 ms is not much for a single pub/sub but it can add up. So this change could save us 1-2 seconds in some nodes of startup time. This is probably also visible when using nodelets heavily.

peci1 commented 4 months ago

Interesting. Would it be possible to write a simple script (pair of scripts?) that would demonstrate this?

Hemofektik commented 4 months ago

Interesting. Would it be possible to write a simple script (pair of scripts?) that would demonstrate this?

You mean as part of the tests? I could try to add a test that is calling this code and measures it. In my particular case I copied the enoughFreeFDs() method to my test program to measure it individually before and after.

peci1 commented 4 months ago

Probably not as part of unit tests. The buildfarm performance varies too wildly to perform any kind of performance testing that would be able to measure such differences.

I meant something we could use manually when reviewing the PR.

Hemofektik commented 4 months ago

I meant something we could use manually when reviewing the PR.

Here is a hybrid of the original code and the final code in the PR I used for validating this. fd_test.zip You can replace the content of enoughFreeFDs() with the original parts from the ros_comm to see its original performance. You need to change the initFDs() function then also to resize the vector to the original 8 MB size to capture the 1<<20 FDs.

peci1 commented 4 months ago

I've tested the file you sent and indeed the proposed change makes it run much faster with higher ulimit -n values.

Relevant issues and PRs mentioned that most systems have this limit set to 1024, so the speedup would not be noticeable. But in docker, the limit is 1048576 by default and there it starts to be visible.

I'm still thinking about how to create a real-world (but self-contained and minimal) ROS testcase that would showcase the problem using roscpp. Would you be able to adjust the old example from https://github.com/peci1/ros_infinite_loop/tree/docker to demonstrate the problem?

Hemofektik commented 4 months ago

Would you be able to adjust the old example from https://github.com/peci1/ros_infinite_loop/tree/docker to demonstrate the problem?

Sure, but it would take some days until I have some spare time available.

peci1 commented 4 months ago

I'm nevertheless not a maintainer of this repo, so don't take my comments as something required. I just try to help getting this PR in a form that would be easy to grasp for maintainers so that it could eventually get merged. There's not much maintainership effort left for ROS 1, so the official maintainers need this kind of community help.

peci1 commented 4 months ago

I've tried running our teaching autonomous exploration pipeline with Trutlebot in Gazebo with this PR. Everything works as I'm used to. So at least practically, it seems be okay!

rosgraph