ros-realtime / community

WG governance model & list of projects

Threaded Callback with priority, affinity and overrun handler #4

Closed dejanpan closed 3 years ago

dejanpan commented 4 years ago

ISP has worked on a prototype for setting rclcpp and DDS thread priorities and affinities: https://discourse.ros.org/t/threaded-callback-with-priority-affinity-and-overrun-handler/14977.

They also have a POC: https://github.com/y-okumura-isp/ROS2_ThreadedCallback

Acceptance Criteria

  1. [ ] Test and decide if their POC should be implemented in rclcpp.
carlossvg commented 4 years ago

Overview

Callback-groups comparison

Using callback-groups seems to be a better approach. The described problem would be solved in the following way: instead of adding a node to an executor, you would map tasks A, B, and C to different callback-groups. Then you would create a separate executor for each callback-group and a separate thread for each executor. Each thread would be configured (priority, scheduling policy, CPU affinity) according to its callback-group's requirements.
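For illustration, a minimal sketch of that mapping. It assumes a Linux target, a recent rclcpp that provides `Executor::add_callback_group`, and sufficient privileges for `SCHED_FIFO`; the helper `make_rt_thread` and the priority/CPU numbers are made up for this example, and error handling is omitted:

```cpp
#include <pthread.h>
#include <sched.h>

#include <thread>

#include "rclcpp/rclcpp.hpp"

// Hypothetical helper: spin an executor in its own thread and give that
// thread a SCHED_FIFO priority and a CPU affinity (Linux only, error checks omitted).
std::thread make_rt_thread(
  rclcpp::executors::SingleThreadedExecutor & exec, int priority, int cpu)
{
  std::thread t([&exec]() { exec.spin(); });

  sched_param param{};
  param.sched_priority = priority;
  pthread_setschedparam(t.native_handle(), SCHED_FIFO, &param);

  cpu_set_t cpuset;
  CPU_ZERO(&cpuset);
  CPU_SET(cpu, &cpuset);
  pthread_setaffinity_np(t.native_handle(), sizeof(cpuset), &cpuset);

  return t;
}

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("tasks");

  // One callback group per task; keep them out of the node's default executor.
  auto group_a = node->create_callback_group(
    rclcpp::CallbackGroupType::MutuallyExclusive,
    /*automatically_add_to_executor_with_node=*/false);
  auto group_b = node->create_callback_group(
    rclcpp::CallbackGroupType::MutuallyExclusive, false);

  // ... create the timers/subscriptions for task A in group_a, task B in group_b ...

  // One executor per callback group, one configured thread per executor.
  rclcpp::executors::SingleThreadedExecutor exec_a, exec_b;
  exec_a.add_callback_group(group_a, node->get_node_base_interface());
  exec_b.add_callback_group(group_b, node->get_node_base_interface());

  auto thread_a = make_rt_thread(exec_a, /*priority=*/80, /*cpu=*/1);
  auto thread_b = make_rt_thread(exec_b, /*priority=*/50, /*cpu=*/2);

  thread_a.join();
  thread_b.join();
  rclcpp::shutdown();
  return 0;
}
```

The point of this arrangement is that all OS-level configuration stays outside rclcpp: the callback groups only partition the work, and the integrator decides how the resulting executor threads are scheduled.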

Comments

I would suggest they reach out to Ralph Lange and discuss/compare both implementations, then decide whether callback-groups solve the described problem or whether something is missing. See if it makes sense for them to join efforts, for example, to develop a demonstrator that shows the priority inversion problem and how it is solved (if such a demonstrator doesn't already exist).

y-okumura-isp commented 4 years ago

Thank you for your comment. I have read the docs and the code. This is great work!

First, this is my understanding:

The goal is to avoid priority inversion issues

Yes, but what we want to emphasize is the executor's priority. Our motivations are the following:

So we have separated the callback threads from the executor thread. We also use a single executor so that it can focus on event detection and event triggering. We think callback-groups tackle these items as follows.

I feel callback-groups are the right approach. The remaining difference seems to be whether to protect the executor.

By the way, we have a few questions about RealTimeClass.

(update 2020/09/29)

An overrun-handling mechanism already exists in ROS 2 when using DDS: the deadline QoS. However, this applies only to DDS topics.

We are interested in non-DDS events, as you said. But the deadline QoS is crucial in DDS (so we may have to add deadline-QoS metrics to the performance measurement scenario, #6).
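For reference, a minimal sketch of that DDS-based overrun detection in rclcpp, using the deadline QoS and the subscription's deadline-missed event callback (topic name and period are made up; as noted, this only covers DDS topics, not timers or other non-DDS events):

```cpp
#include <chrono>

#include "rclcpp/rclcpp.hpp"
#include "std_msgs/msg/string.hpp"

// Subscription that reports a miss whenever no sample arrives within 100 ms.
// Keep the returned subscription alive for as long as monitoring is needed.
rclcpp::Subscription<std_msgs::msg::String>::SharedPtr
make_deadline_subscription(const rclcpp::Node::SharedPtr & node)
{
  rclcpp::QoS qos(rclcpp::KeepLast(10));
  qos.deadline(rclcpp::Duration(std::chrono::milliseconds(100)));

  rclcpp::SubscriptionOptions options;
  auto logger = node->get_logger();
  options.event_callbacks.deadline_callback =
    [logger](rclcpp::QOSDeadlineRequestedInfo & event) {
      // Fires on the DDS deadline-missed event, i.e. the "overrun" of a topic.
      RCLCPP_WARN(
        logger, "Deadline missed (total: %d, change: %d)",
        event.total_count, event.total_count_change);
    };

  return node->create_subscription<std_msgs::msg::String>(
    "chatter", qos,
    [](std_msgs::msg::String::SharedPtr) { /* normal message handling */ },
    options);
}
```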

ralph-lange commented 4 years ago

You’re right, the Callback-group-level Executor provides a more fine-grained API that allows assigning different callback groups of the same node to different Executors, but it does not provide any built-in mechanism for mapping Executor threads to the scheduling mechanisms of the underlying operating system. This was an explicit design decision, to stay independent of the underlying OS and scheduler.

The RealTimeClass was introduced in the prototype at cbg-executor-0.6.1 to allow the developer of a node to give a "hint" on the priorities inside his/her node to the person who later integrates the node into a whole system and who has to map the callback groups of multiple nodes to Executors, threads and processes. The idea of the Meta Executor concept was to perform this mapping automatically by a configuration file. However, the Meta Executor concept was never implemented. And the RealTimeClass did not make it into mainline rclcpp (for good reasons).

Regarding bad behavior of callbacks: ROS 2 (just as classical ROS) assumes run-to-completion of callbacks, which avoids the need for synchronization between callbacks of the same callback group. (Use rclcpp’s CallbackGroupType Reentrant to allow for parallel execution of callbacks within a callback group.) Aborting long-running callbacks could leave the node in an inconsistent state. I propose to consider such events as a node-level error and restart the whole node instead.
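A minimal sketch of the Reentrant option mentioned above (topic name is made up): with a MultiThreadedExecutor, callbacks within such a group may run in parallel and must synchronize any shared state themselves.

```cpp
#include "rclcpp/rclcpp.hpp"
#include "std_msgs/msg/string.hpp"

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("reentrant_demo");

  // Callbacks assigned to a Reentrant group may be dispatched in parallel by
  // a multi-threaded executor, unlike the default MutuallyExclusive group.
  auto group = node->create_callback_group(rclcpp::CallbackGroupType::Reentrant);
  rclcpp::SubscriptionOptions options;
  options.callback_group = group;

  auto sub = node->create_subscription<std_msgs::msg::String>(
    "chatter", 10,
    [](std_msgs::msg::String::SharedPtr) { /* may run concurrently */ },
    options);

  rclcpp::executors::MultiThreadedExecutor executor;
  executor.add_node(node);
  executor.spin();
  rclcpp::shutdown();
  return 0;
}
```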

Small remark on FIFO order: While classical ROS basically processed message and timer events in FIFO order (with the exception of buffer overruns), the current rclcpp Executor of ROS 2 can be described as a combination of fixed-priority scheduling and round robin. Please see https://doi.org/10.4230/LIPIcs.ECRTS.2019.6 for details.

y-okumura-isp commented 4 years ago

Thank you for the comments. We think ThreadedCallback is very flexible, but we haven't found a use case that makes it superior to CallbackGroup. We would also like to explore the lock use case in ThreadedCallback (described below). So please set it aside for now. We'd be happy to continue the discussion (and if it improves CallbackGroup, that would be great).

We've been thinking about these things, but any ideas are welcome.

(1) How does the CallbackGroup approach analyze the order or performance of callback execution? With ThreadedCallback, we thought we could use existing tools, since each callback is an OS thread.

(2) With ThreadedCallback, we thought the Executor would be able to detect events as soon as topics and timers fire, even if callbacks are dropped. We hope to provide enough traceability information for development, debugging, or verification from the executor (or the whole ROS 2 stack) to the ROS 2 user.

(3) ThreadedCallback requires a lock in timer-driven situations (e.g., a reader-writer lock), and we were wondering whether a lock-free mechanism such as RCU (Read-Copy-Update) could be used. Is there a way for multiple CallbackGroups to share data? (Or is there such a use case?) See the sketch after the diagrams below.

(3.1) RW lock in ThreadedCallback
              copy               read
  Subscriber ------> Node::data ------> TimerCallbackThread

(3.2) RW lock in CallbackGroup?
        Different CBGs read the same data?
       (Of course, CBG2 should also have a subscription, but we want to avoid the extra data copy, for example.)

                    copy          read
  CBG1  Subscriber ------> data -------> TimerCallback1 : not need lock because of run-to-completion
                           | 
                           |     read
  CBG2                     +-----------> TimerCallback2 : if data is read by other CBG, we may need lock
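A minimal sketch of situation (3.1), assuming C++17's std::shared_mutex as the reader-writer lock (class and member names are illustrative only): the subscription callback copies the latest sample into the node's data under a writer lock, and the timer callback reads it under a shared reader lock.

```cpp
#include <chrono>
#include <shared_mutex>
#include <string>

#include "rclcpp/rclcpp.hpp"
#include "std_msgs/msg/string.hpp"

class DataNode : public rclcpp::Node
{
public:
  DataNode() : Node("data_node")
  {
    // Subscriber ------> Node::data (writer side of the RW lock)
    sub_ = create_subscription<std_msgs::msg::String>(
      "chatter", 10,
      [this](std_msgs::msg::String::SharedPtr msg) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        data_ = msg->data;
      });

    // Node::data ------> TimerCallbackThread (reader side of the RW lock)
    timer_ = create_wall_timer(
      std::chrono::milliseconds(100),
      [this]() {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        RCLCPP_INFO(get_logger(), "latest: %s", data_.c_str());
      });
  }

private:
  std::shared_mutex mutex_;
  std::string data_;
  rclcpp::Subscription<std_msgs::msg::String>::SharedPtr sub_;
  rclcpp::TimerBase::SharedPtr timer_;
};
```

If both callbacks end up in the same mutually exclusive callback group on a single-threaded executor, run-to-completion makes the lock unnecessary; it only becomes necessary once the callbacks can actually run concurrently (Reentrant group, separate executors, or ThreadedCallback's per-callback threads).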
carlossvg commented 3 years ago

@y-okumura-isp Based on your conclusion in the previous comment, we are going to close this issue.