ros2 / rclpy

rclpy (ROS Client Library for Python)
Apache License 2.0
289 stars 224 forks source link

context.on_shutdown is never called #1287

Open firesurfer opened 4 months ago

firesurfer commented 4 months ago

Bug report

It looks like the callback that can be registered via context.on_shutdown is never called.

Required Info:

Steps to reproduce issue

Register a callback to on_shutdown and then try various methods of exiting the node

def shutdown_handler():
    print("exit")

def main(args=None):
     rclpy.init(args=args) 

     node = Node("TestNode")

     node.context.on_shutdown(shutdown_handler)
     rclpy.get_default_context().on_shutdown(shutdown_handler)

     rclpy.spin(node)

I tried sending various signals: SIGINT, SIGTERM, SIGQUIT

Expected behavior

exit should be printed twice

Actual behavior

Prints nothing

Additional information

I also tried calling try_shutdown:

try:  
     rclpy.spin(node)
except (KeyboardInterrupt, rclpy.executors.ExternalShutdownException):
    pass
finally:
    rclpy.try_shutdown()

But this didn't result into on_shutdown being called either.

If one tries to explicitly shutdown the context:

finally:
    node.context.shutdown()

The following error is thrown: failed to shutdown: rcl_shutdown already called on the given context

Related Issues

This issue is probably related to. But according to the issues I found on_shutdown should be at least called if the node is properly exited (or if try_shutdown is called?)

https://github.com/ros2/rclpy/issues/532 https://github.com/ros2/rclpy/issues/1077 <- This is actually what I am looking for as I do want to do some cleanup

fujitatomoya commented 4 months ago

This issue is probably related to. But according to the issues I found on_shutdown should be at least called if the node is properly exited (or if try_shutdown is called?)

if node is properly shutdown, on_shutdown callbacks are called. we have those test cases.

https://github.com/ros2/rclpy/blob/99f1ce933ddd1677fda8fa602af3684b93f814eb/rclpy/test/test_context.py#L18-L37

the problem is with signal. once signal comes in e.g SIGINT, it deferred signal handler shuts down with calling rcl_shutdown in rcl layer, that means all contexts in this process space will be reset (invalid).

i think there are a couple of problems here, (i did not get deeper, i may be missing something here.)

after all, this is duplication of https://github.com/ros2/rclpy/issues/532, i think we can close this an track https://github.com/ros2/rclpy/issues/532 instead.

firesurfer commented 4 months ago

Ok I read #532 that its only related to SIGINT (Ctrl+C) .

But this means there is currently no possibility to perform any cleanup action during shutdown when the program is closed via a signal (what is the default case during development) ?

fujitatomoya commented 4 months ago

As far as i played around, i could not find the solution to for the signal cases. But i might be missing here, i would like to hear from other maintainers and developers.

firesurfer commented 4 months ago

I found a very hacky workaround for performing cleanup calls at shutdown:

But it basically means one needs to recreate all publishers/clients etc. if one needs to perform any cleanup ros calls

    try:  
        while rclpy.ok():
            time.sleep(1.0)
    except KeyboardInterrupt:
        pass
    # This part is executed after the keyboard interrupt was caught. The default context is now destroyed and can not be recreated. 

    new_context = rclpy.Context()
    rclpy.init(context=new_context)
    new_node = rclpy.create_node("test", context=new_context)

    # Use the new new_node to perform cleanup ros calls

EDIT: Just to clarify: This does not solve on_shutdown not being called in signals.

audrow commented 3 months ago

I'm going to mark this as help wanted for now.

I think we need a much bigger discussion on signal handling across the ROS 2 core repos.

firesurfer commented 3 months ago

@audrow Sounds good. Is there already an open issue for that I could subscribe to ?

fujitatomoya commented 3 months ago

@firesurfer not that i am aware of...

there are several issues and PRs are opened,

besides those,

rclcpp, rclpy (this issue) and rosbag2 have the signal handler support by their own implementation.

I think we need a much bigger discussion on signal handling across the ROS 2 core repos.

So i this this means that we need to consider current implementation and problems, and discussion for redesign for ROS 2.

lks-dev commented 2 weeks ago

Hi there,

I am doing something else quite hacky, to be able to log messages to rousout after the SIGINT rclpy humble. The idea cam from recognizing, that the installed signalhandlers in rclpy's implementation are shutting down the current context before I was able to handle the sigint/keyboardInterrupt myself:

"""Implementation of a Minimal Example Node."""

import traceback

import rclpy
import rclpy.executors
import rclpy.logging
import rclpy.node
from example_interfaces.srv import AddTwoInts
from rclpy.signals import SignalHandlerOptions
from std_msgs.msg import String

LOGGER = rclpy.logging.get_logger(__file__)

class MinimalExampleNode(rclpy.node.Node):
    """
    Implement a Minimal Example Node, that showcases how to log after SIGINT.

    Based on example tutorial code from ros2 doc. MinimalPublisher and
    MinimalService.
    """

    def __init__(self) -> None:
        super().__init__('minimal_publisher')
        self.publisher_ = self.create_publisher(String, 'topic', 10)
        self.i = 0
        self.get_logger().info('init completed')
        # commenting the timer here showcases the fact, that the SIGINT will
        # only be processed, when a service request send to the node,
        # triggering the python callback.
        self.timer = self.create_timer(
            0.5,  # seconds
            self.timer_callback,
        )
        self.create_service(
            AddTwoInts,
            'add',
            self.add
        )

    def add(self, request, response):  # noqa: ANN001, ANN201, D102
        self.i = request.a + request.b
        response.sum = self.i
        return response

    def timer_callback(self) -> None:  # noqa: D102
        msg = String()
        msg.data = 'Hello World: %d' % self.i
        self.publisher_.publish(msg)
        self.get_logger().info('Publishing: "%s"' % msg.data)
        self.i += 1
        self.get_logger().info('self.i: %i' % self.i)
        if self.i > 2:
            # This will trigger the catch all exceptions block.
            msg = 'smthing bad could happen if i is above 2!'
            raise RuntimeError(msg)

def main() -> None:
    """Run Node in the ROS2 Network and log errors."""
    try:
        ros_context = rclpy.get_default_context()
        rclpy.init(
            context=ros_context,
            signal_handler_options=SignalHandlerOptions.NO
        )

        executor = rclpy.executors.SingleThreadedExecutor(
            context=ros_context
        )

        node = MinimalExampleNode()

        executor.add_node(node)
        executor.spin()

    except KeyboardInterrupt:
        node.get_logger().fatal(
            'got interrupted by a KeyboardInterrupt and will'
            ' shutdown now...'
        )
    except Exception as error:  # pragma: no cover
        # catch all exceptions to log error and traceback before exiting...
        err_msg = f'experienced an exception:\n{traceback.format_exc()}\n'
        'will shutdown now.'
        try:
            node.get_logger().fatal(err_msg)
        except Exception:  # incase the node is no longer there.
            rclpy.logging.get_logger(__file__).fatal(err_msg)
            # this is not published, useful fallback to get it in the
            # roslogging file.
        raise error
    finally:
        rclpy.try_shutdown()

if __name__ == '__main__':  # pragma: no cover
    main()

A caveat with this implementation is, that the signal is only processed in the client code, when a python callback is processed. Hence it only works with nodes, that regularly process python callbacks, e.g. from a timer. In one node, that does not rely on a timer i added an 'empty' callback in the following fashion: self.create_timer(.1, (lambda: None)) # used to get a raised KeyboardInterrupt recognized and handled

This pattern of not letting the rclpy's handle the signals is probably not the best idea, but I am not entirely sure how bad it is or what bad things could happen, as I make sure to manually call the rclpy.try_shutdown() in the final statement.

So any improvement suggests or information on why its a bad idea or how stupid the idea is, are highly appreciated.

From my perspective a on_shutdown callback, which is guaranteed to run, before the current context is invalided, for what ever reason, would be the 'best' solution for the task of gracefully execute cleanup code that involves communicating over the ROS2 network.