ros-safety / software_watchdogs

A library of (software) watchdogs based on DDS Quality of Service (QoS) policies and ROS 2 lifecycle nodes.
Apache License 2.0

SW Watchdog

This package includes a heartbeat node that can easily be added to an existing process via ROS 2 node composition.

Overview

This package includes two watchdog implementations ("readers") and a heartbeat node (a "writer"). A watchdog expects a heartbeat signal at the specified frequency and otherwise declares the writer to have failed. A failure causes the watchdog's lifecycle state machine to transition to the Inactive state and emit the corresponding state transition event. A system-level response can be implemented in the event handler to realize patterns such as cold standby or process restarts.

Usage

The launch files included in this package demonstrate both node composition with a heartbeat signal and the configuration of a corresponding watchdog.

If you wish to use an rmw implementation other than the default, set the RMW_IMPLEMENTATION environment variable accordingly in every shell in which you use ROS.

Then start the heartbeat and watchdog examples in separate terminals:

ros2 launch sw_watchdog heartbeat_composition.launch.py
ros2 launch sw_watchdog watchdog_lifecycle.launch.py

The first command composes a single process consisting of a ROS 2 demo_nodes_cpp::Talker and a SimpleHeartbeat with a period of 200 ms. The second command starts a SimpleWatchdog, which grants the Heartbeat publisher a lease of 220 ms. The watchdog transitions to the Inactive state as soon as the Heartbeat publisher violates the lease (e.g., when you stop it with Ctrl+C in the first terminal). Since the watchdog is a lifecycle node, it can be re-activated to listen for a Heartbeat signal via:

ros2 lifecycle set simple_watchdog activate

To test the WindowedWatchdog, replace the launch command in the second terminal with:

ros2 launch sw_watchdog windowed_watchdog_lifecycle.launch.py

This configuration allows the Heartbeat publisher at most three deadline misses. To test deadline misses, you can, for example, insert artificial delays into the publishing thread.

Compatible lease times must be configured for the Heartbeat signal and the watchdog. DDS does not establish a connection when incompatible QoS settings are chosen (Cyclone DDS and Connext DDS additionally display a warning message in this case).

Requirements

This package includes custom messages. If you are compiling it from source and wish to use a non-default rmw implementation, you must have the appropriate rmw packages installed when you compile this package. See Install DDS implementations for more information on installing alternative rmw implementations.

To use the heartbeat_composition.launch.py example, the ros-*-demo-nodes-cpp package must be installed.

Compatibility

This code is built and tested under:

The following DDS rmw implementations were tested in both environments (via the default Ubuntu packages that ship with the Rolling releases):

TODO

The Heartbeat message defined in this package supports the notion of checkpoints. Future watchdog versions could add support for control flow monitoring based on this information.

Contact

For any questions or comments, please post a question at ROS Answers following the ROS support guidelines. Add the safety_wg tag to your question so that members of the Safety working group can spot it more easily.