Add maximum times for a process to respawn

Santti4go commented 1 year ago

Hi, on this PR I'm adding the following feature:

Ability to set a maximum number of times for a process to respawn. respawn_max_retries

Why? Sometimes I want a process (a Node) to respawn a few times in case there were any trouble within the system. But if the Node keeps failing over and over again then I want it to stop spawning or it would overflow my logs. To accomplish that I added an optional parameter to set up the max number of retries. By default, i.e. without setting this parameter, the behavior remains the same as it was, infinite retries.

What changed? Added the parameter respawn_max_retries to both ExecuteProcess and ExecuteLocal. Type: int [OPTIONAL]. Default = None, which ends up on infinite respawns.

How I tested

I modified the talker_listener.launch.py launch file inside demo_nodes_cpp/launch/topics folder with the following:

def generate_launch_description():
    return LaunchDescription([
        launch_ros.actions.Node(
            package='demo_nodes_cpp', executable='talker', output='screen'),
        launch_ros.actions.Node(
            package='demo_nodes_cpp', executable='listener', output='screen', respawn=True, respawn_max_retries=1),
    ])

Run ros2 launch demo_nodes_cpp talker_listener.launch.py Two process spawned, talker and listener (to verify ps -a).
Now I killed the listener with kill -SEGV $(pidof listener) On the publisher/listener terminal you should see the listener process died.
Verify that the listener process is respawned. Note that I settled respawn_max_retries=1.
Kill the process again kill -SEGV $(pidof listener) Now since the process was already respawned 1 time, it won't respawn anymore.

EDIT: To run the tests added you can colcon test --packages-select launch --return-code-on-test-failure --event-handlers=console_direct+ Or if you just want to run the tests I added, called test_execute_process_with_respawn_max_retries you can run colcon test --packages-select launch --return-code-on-test-failure --event-handlers=console_direct+ --pytest-args -k max_retries and this would inly run the functions I've created.

Signed-off-by: Santiago Aupi

ivolis commented 1 year ago

👍

Santti4go commented 1 year ago

Friendly ping :) @wjwwood @methylDragon @adityapande-1995

Santti4go commented 1 year ago

Hey @wjwwood @methylDragon @adityapande-1995 is there anything I can do from my end to push this PR? Please let me know If I should assignee it to someone else.

clalancette commented 1 year ago

@adityapande-1995 Can you please take a look?

Santti4go commented 1 year ago

@adityapande-1995 Can you please take a look?

Hey @adityapande-1995 feel free to let me know if there is anything I can do from my end to push this PR!

Santti4go commented 1 year ago

Hi @clalancette @adityapande-1995 wondering if you would be interesting on having this feature in current release. Do you need me to do a rebase?

Santti4go commented 1 year ago

I've left some thoughts on how to simplify this implementation and make it easier to understand.

Thanks for the review @clalancette comments already addressed!

Santti4go commented 1 year ago

Looks good ! A minor point though, could you add to the docstring a clause that says that the respawn_max_retires parameter will be ignored if respawn is set to False ? Should be obvious, but just to be clear.

Nice catch @adityapande-1995 ! I just uploaded that and repeated the same test described in the PR description but using the XML launch file version. Thanks for your review!

Santti4go commented 1 year ago

Sorry I missed this on the last review, but we should also have a test for this functionality in https://github.com/ros2/launch/blob/rolling/launch/test/launch/test_execute_local.py and https://github.com/ros2/launch/blob/rolling/launch/test/launch/test_execute_process.py . Once that is in place, I'll be happy with this.

Sure thing, I'll work on that as soon as possible and let you know once it's done. @clalancette

Santti4go commented 1 year ago

Apologies for the delay @clalancette . I already created 2 test in test_execute_local.py and test_execute_process.py. Here are the test results of the last build

test_execute_process test_execute_local

Let me know your thoughts! @clalancette @adityapande-1995

clalancette commented 1 year ago

CI:

Linux
Linux-aarch64
Windows

Santti4go commented 1 year ago

This looks good to me. Thank you for iterating! I'll run full CI on it next.

It was a pleasure! Thank you @clalancette and @adityapande-1995 for you time. CI is looking good!

ros2 / launch

Add maximum times for a process to respawn #696