xqms / rosmon

ROS node launcher & monitoring daemon

FD leak when processes respawn #158

Closed: stevegolton closed this issue 2 years ago

stevegolton commented 3 years ago

I believe I have observed an FD leak when processes are respawned by rosmon.

Steps to reproduce:

  1. Create a new ROS node that exits immediately after it starts.
  2. Create a launch file that starts that node with respawning enabled:
    <launch>
      <node pkg="broken" type="node" name="broken" respawn="true" respawn_delay="0" />
    </launch>
  3. Run the launch file using rosmon. You should see the node being restarted in an infinite loop as intended.
    mon launch <pkg> test.launch
  4. Check the number of file descriptors held open by the rosmon process; it increases by one every time the node is restarted:
    ls -la /proc/$(pidof rosmon)/fd | wc -l
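To watch the count over time instead of checking it once, a small Python sketch like the one below could be used. It assumes Linux with /proc mounted; count_fds and watch are hypothetical helpers written for this illustration, not part of rosmon.

```python
import os
import sys
import time


def count_fds(pid):
    """Return the number of open file descriptors of process `pid` (Linux /proc only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def watch(pid, interval=1.0, samples=10):
    """Print the FD count of `pid` every `interval` seconds, `samples` times."""
    previous = None
    for _ in range(samples):
        current = count_fds(pid)
        # Show the change since the previous sample, e.g. "+1" after each respawn.
        delta = "" if previous is None else f" ({current - previous:+d})"
        print(f"{current} open fds{delta}")
        previous = current
        time.sleep(interval)


if __name__ == "__main__" and len(sys.argv) > 1:
    watch(int(sys.argv[1]))
```

For example, saved under an illustrative name such as watch_fds.py, it could be run as python3 watch_fds.py $(pidof rosmon). os.listdir returns only the descriptor entries, whereas ls -la ... | wc -l also counts the "total", "." and ".." lines, so the absolute numbers differ by three; the per-restart increase is the same.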
xqms commented 3 years ago

Interesting, thanks for the report! I suspect the problem is that we close the PTY file descriptors only when a read() on them returns 0, which indicates that the other side (the child process) has closed its end. With a short respawn_delay, however, we may start the new process before this has happened.
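To make the suspected race concrete, here is a minimal Python sketch under stated assumptions. It is not rosmon's actual code, which is C++; Supervisor, respawn and close_on_exit are hypothetical names, and Linux /proc is assumed. A supervisor runs a child on a PTY and respawns it immediately. In the leaky variant, the old PTY master is never closed because no read() has returned EOF yet, so overwriting it on respawn loses the descriptor. In the fixed variant, the master is closed once the child has been reaped, so the count should stay flat.

```python
import os
import subprocess
import sys


def count_open_fds():
    """Number of FDs currently open in this process (Linux /proc only)."""
    return len(os.listdir("/proc/self/fd"))


class Supervisor:
    """Hypothetical respawning supervisor that runs one child on a PTY."""

    def __init__(self, argv, close_on_exit):
        self.argv = argv
        self.close_on_exit = close_on_exit
        self.proc = None
        self.master_fd = None  # PTY master of the current child

    def start(self):
        master_fd, slave_fd = os.openpty()
        self.proc = subprocess.Popen(
            self.argv, stdin=slave_fd, stdout=slave_fd, stderr=slave_fd
        )
        os.close(slave_fd)  # the child keeps its own copy of the slave end
        # Leaky pattern: when close_on_exit is False, the previous master was
        # never closed (no EOF read happened), so overwriting it loses the FD.
        self.master_fd = master_fd

    def respawn(self):
        """Wait for the current child to exit, then start it again (respawn_delay=0)."""
        self.proc.wait()
        if self.close_on_exit:
            # Fix: release the old PTY master before starting the new child,
            # instead of relying on a read() that returns EOF.
            os.close(self.master_fd)
            self.master_fd = None
        self.start()


if __name__ == "__main__":
    # The child exits immediately, like the "broken" node in the report.
    for close_on_exit in (False, True):
        sup = Supervisor([sys.executable, "-c", "pass"], close_on_exit)
        sup.start()
        before = count_open_fds()
        for _ in range(5):
            sup.respawn()
        print(f"close_on_exit={close_on_exit}: {count_open_fds() - before:+d} fds")
```

The leaky variant should gain one FD per restart, matching the report. A real fix would likely drain any remaining output from the old master before closing it, so that the child's last log lines are not lost; the sketch skips that step.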

stevegolton commented 2 years ago

Hey @xqms, apologies for the delay. I found some time today to experiment with this issue further, and I could only reproduce it on older versions of rosmon. It looks like it has been fixed in the latest version, 2.4.0.

2.3.2-1focal.20210423.233006 amd64: FD leak
2.3.2-1buster.20210424.200434 arm64: FD leak
2.4.0-1focal.20210820.224250 amd64: no issues

I created an example package to help with testing, but it looks like it won't be needed after all.

Apologies, I think we can call this one fixed.