Open firesurfer opened 4 months ago
This issue is probably related to. But according to the issues I found on_shutdown should be at least called if the node is properly exited (or if try_shutdown is called?)
if node is properly shutdown, on_shutdown
callbacks are called. we have those test cases.
the problem is with signal. once signal comes in e.g SIGINT, it deferred signal handler shuts down with calling rcl_shutdown
in rcl layer, that means all contexts in this process space will be reset (invalid).
i think there are a couple of problems here, (i did not get deeper, i may be missing something here.)
after all, this is duplication of https://github.com/ros2/rclpy/issues/532, i think we can close this an track https://github.com/ros2/rclpy/issues/532 instead.
Ok I read #532 that its only related to SIGINT (Ctrl+C) .
But this means there is currently no possibility to perform any cleanup action during shutdown when the program is closed via a signal (what is the default case during development) ?
As far as i played around, i could not find the solution to for the signal cases. But i might be missing here, i would like to hear from other maintainers and developers.
I found a very hacky workaround for performing cleanup calls at shutdown:
But it basically means one needs to recreate all publishers/clients etc. if one needs to perform any cleanup ros calls
try:
while rclpy.ok():
time.sleep(1.0)
except KeyboardInterrupt:
pass
# This part is executed after the keyboard interrupt was caught. The default context is now destroyed and can not be recreated.
new_context = rclpy.Context()
rclpy.init(context=new_context)
new_node = rclpy.create_node("test", context=new_context)
# Use the new new_node to perform cleanup ros calls
EDIT: Just to clarify: This does not solve on_shutdown
not being called in signals.
I'm going to mark this as help wanted for now.
I think we need a much bigger discussion on signal handling across the ROS 2 core repos.
@audrow Sounds good. Is there already an open issue for that I could subscribe to ?
@firesurfer not that i am aware of...
there are several issues and PRs are opened,
launch
)ros2run
CLI)besides those,
rclcpp
, rclpy
(this issue) and rosbag2
have the signal handler support by their own implementation.
I think we need a much bigger discussion on signal handling across the ROS 2 core repos.
So i this this means that we need to consider current implementation and problems, and discussion for redesign for ROS 2.
Hi there,
I am doing something else quite hacky, to be able to log messages to rousout after the SIGINT rclpy humble. The idea cam from recognizing, that the installed signalhandlers in rclpy's implementation are shutting down the current context before I was able to handle the sigint/keyboardInterrupt myself:
"""Implementation of a Minimal Example Node."""
import traceback
import rclpy
import rclpy.executors
import rclpy.logging
import rclpy.node
from example_interfaces.srv import AddTwoInts
from rclpy.signals import SignalHandlerOptions
from std_msgs.msg import String
LOGGER = rclpy.logging.get_logger(__file__)
class MinimalExampleNode(rclpy.node.Node):
"""
Implement a Minimal Example Node, that showcases how to log after SIGINT.
Based on example tutorial code from ros2 doc. MinimalPublisher and
MinimalService.
"""
def __init__(self) -> None:
super().__init__('minimal_publisher')
self.publisher_ = self.create_publisher(String, 'topic', 10)
self.i = 0
self.get_logger().info('init completed')
# commenting the timer here showcases the fact, that the SIGINT will
# only be processed, when a service request send to the node,
# triggering the python callback.
self.timer = self.create_timer(
0.5, # seconds
self.timer_callback,
)
self.create_service(
AddTwoInts,
'add',
self.add
)
def add(self, request, response): # noqa: ANN001, ANN201, D102
self.i = request.a + request.b
response.sum = self.i
return response
def timer_callback(self) -> None: # noqa: D102
msg = String()
msg.data = 'Hello World: %d' % self.i
self.publisher_.publish(msg)
self.get_logger().info('Publishing: "%s"' % msg.data)
self.i += 1
self.get_logger().info('self.i: %i' % self.i)
if self.i > 2:
# This will trigger the catch all exceptions block.
msg = 'smthing bad could happen if i is above 2!'
raise RuntimeError(msg)
def main() -> None:
"""Run Node in the ROS2 Network and log errors."""
try:
ros_context = rclpy.get_default_context()
rclpy.init(
context=ros_context,
signal_handler_options=SignalHandlerOptions.NO
)
executor = rclpy.executors.SingleThreadedExecutor(
context=ros_context
)
node = MinimalExampleNode()
executor.add_node(node)
executor.spin()
except KeyboardInterrupt:
node.get_logger().fatal(
'got interrupted by a KeyboardInterrupt and will'
' shutdown now...'
)
except Exception as error: # pragma: no cover
# catch all exceptions to log error and traceback before exiting...
err_msg = f'experienced an exception:\n{traceback.format_exc()}\n'
'will shutdown now.'
try:
node.get_logger().fatal(err_msg)
except Exception: # incase the node is no longer there.
rclpy.logging.get_logger(__file__).fatal(err_msg)
# this is not published, useful fallback to get it in the
# roslogging file.
raise error
finally:
rclpy.try_shutdown()
if __name__ == '__main__': # pragma: no cover
main()
A caveat with this implementation is, that the signal is only processed in the client code, when a python callback is processed. Hence it only works with nodes, that regularly process python callbacks, e.g. from a timer. In one node, that does not rely on a timer i added an 'empty' callback in the following fashion:
self.create_timer(.1, (lambda: None)) # used to get a raised KeyboardInterrupt recognized and handled
This pattern of not letting the rclpy's handle the signals is probably not the best idea, but I am not entirely sure how bad it is or what bad things could happen, as I make sure to manually call the rclpy.try_shutdown()
in the final statement.
So any improvement suggests or information on why its a bad idea or how stupid the idea is, are highly appreciated.
From my perspective a on_shutdown callback, which is guaranteed to run, before the current context is invalided, for what ever reason, would be the 'best' solution for the task of gracefully execute cleanup code that involves communicating over the ROS2 network.
Bug report
It looks like the callback that can be registered via
context.on_shutdown
is never called.Required Info:
Steps to reproduce issue
Register a callback to on_shutdown and then try various methods of exiting the node
I tried sending various signals: SIGINT, SIGTERM, SIGQUIT
Expected behavior
exit
should be printed twiceActual behavior
Prints nothing
Additional information
I also tried calling
try_shutdown
:But this didn't result into
on_shutdown
being called either.If one tries to explicitly shutdown the context:
The following error is thrown:
failed to shutdown: rcl_shutdown already called on the given context
Related Issues
This issue is probably related to. But according to the issues I found
on_shutdown
should be at least called if the node is properly exited (or if try_shutdown is called?)https://github.com/ros2/rclpy/issues/532 https://github.com/ros2/rclpy/issues/1077 <- This is actually what I am looking for as I do want to do some cleanup