ros-realtime / community

WG governance model & list of projects

Threaded Callback with priority, affinity and overrun handler #4

Closed dejanpan closed 3 years ago

dejanpan commented 4 years ago

ISP has worked on a prototype for setting rclcpp and DDS thread priorities and affinities: https://discourse.ros.org/t/threaded-callback-with-priority-affinity-and-overrun-handler/14977.

They also have a POC: https://github.com/y-okumura-isp/ROS2_ThreadedCallback

Acceptance Criteria

  1. [ ] Test and decide if their POC should be implemented in rclcpp.
carlossvg commented 4 years ago

Overview

Callback-groups comparison

Using callback-groups seems to be a better approach. The described problem would be solved in the following way: instead of adding a node to an executor, you would map tasks A, B, and C to different callback-groups. Then you would create a separate executor for each callback-group and a separate thread for each executor. Each thread would be configured (priority, scheduling policy, CPU affinity) according to its callback-group's requirements.
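For illustration, a minimal sketch of that mapping. It assumes a Linux target, a recent rclcpp that provides `Executor::add_callback_group`, and sufficient privileges for `SCHED_FIFO`; the helper `make_rt_thread` and the priority/CPU numbers are made up for this example, and error handling is omitted:

```cpp
#include <pthread.h>
#include <sched.h>

#include <thread>

#include "rclcpp/rclcpp.hpp"

// Hypothetical helper: spin an executor in its own thread and give that
// thread a SCHED_FIFO priority and a CPU affinity (Linux only, error checks omitted).
std::thread make_rt_thread(
  rclcpp::executors::SingleThreadedExecutor & exec, int priority, int cpu)
{
  std::thread t([&exec]() { exec.spin(); });

  sched_param param{};
  param.sched_priority = priority;
  pthread_setschedparam(t.native_handle(), SCHED_FIFO, &param);

  cpu_set_t cpuset;
  CPU_ZERO(&cpuset);
  CPU_SET(cpu, &cpuset);
  pthread_setaffinity_np(t.native_handle(), sizeof(cpuset), &cpuset);

  return t;
}

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("tasks");

  // One callback group per task; keep them out of the node's default executor.
  auto group_a = node->create_callback_group(
    rclcpp::CallbackGroupType::MutuallyExclusive,
    /*automatically_add_to_executor_with_node=*/false);
  auto group_b = node->create_callback_group(
    rclcpp::CallbackGroupType::MutuallyExclusive, false);

  // ... create the timers/subscriptions for task A in group_a, task B in group_b ...

  // One executor per callback group, one configured thread per executor.
  rclcpp::executors::SingleThreadedExecutor exec_a, exec_b;
  exec_a.add_callback_group(group_a, node->get_node_base_interface());
  exec_b.add_callback_group(group_b, node->get_node_base_interface());

  auto thread_a = make_rt_thread(exec_a, /*priority=*/80, /*cpu=*/1);
  auto thread_b = make_rt_thread(exec_b, /*priority=*/50, /*cpu=*/2);

  thread_a.join();
  thread_b.join();
  rclcpp::shutdown();
  return 0;
}
```

The point of this arrangement is that all OS-level configuration stays outside rclcpp: the callback groups only partition the work, and the integrator decides how the resulting executor threads are scheduled.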

Comments

I would suggest they reach out to Ralph Lange and discuss/compare both implementations, then decide whether callback-groups solve the described problem or whether something is missing. See if it makes sense for them to join efforts, for example, to develop a demonstrator that shows the priority inversion problem and how it is solved (if such a demonstrator doesn't already exist).

y-okumura-isp commented 4 years ago

Thank you for your comment. I have read the docs and the code. This is great work!

First, this is my understanding:

The goal is to avoid priority inversion issues

Yes, but what we want to emphasize is the executor's priority. Our motivations are the following:

So we have separated the callback threads from the executor thread. We also use a single executor so that it can focus on event detection and event triggering. We think callback-groups tackle these items as follows.

I feel callback-groups are the right approach. The remaining difference seems to be whether to protect the executor.

By the way, we have a few questions about RealTimeClass.

(update 2020/09/29)

An overrun-handling mechanism already exists in ROS 2 when using DDS: the deadline QoS. However, this applies only to DDS topics.

We are interested in non-DDS events, as you said. But the deadline QoS is crucial in DDS (so we may have to add deadline-QoS metrics to the performance measurement scenario, #6).
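For reference, a minimal sketch of that DDS-based overrun detection in rclcpp, using the deadline QoS and the subscription's deadline-missed event callback (topic name and period are made up; as noted, this only covers DDS topics, not timers or other non-DDS events):

```cpp
#include <chrono>

#include "rclcpp/rclcpp.hpp"
#include "std_msgs/msg/string.hpp"

// Subscription that reports a miss whenever no sample arrives within 100 ms.
// Keep the returned subscription alive for as long as monitoring is needed.
rclcpp::Subscription<std_msgs::msg::String>::SharedPtr
make_deadline_subscription(const rclcpp::Node::SharedPtr & node)
{
  rclcpp::QoS qos(rclcpp::KeepLast(10));
  qos.deadline(rclcpp::Duration(std::chrono::milliseconds(100)));

  rclcpp::SubscriptionOptions options;
  auto logger = node->get_logger();
  options.event_callbacks.deadline_callback =
    [logger](rclcpp::QOSDeadlineRequestedInfo & event) {
      // Fires on the DDS deadline-missed event, i.e. the "overrun" of a topic.
      RCLCPP_WARN(
        logger, "Deadline missed (total: %d, change: %d)",
        event.total_count, event.total_count_change);
    };

  return node->create_subscription<std_msgs::msg::String>(
    "chatter", qos,
    [](std_msgs::msg::String::SharedPtr) { /* normal message handling */ },
    options);
}
```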

ralph-lange commented 4 years ago

You’re right, the Callback-group-level Executor provides a more fine-grained API that allows assigning different callback groups of the same node to different Executors, but it does not provide any built-in mechanism for mapping Executor threads to the scheduling mechanisms of the underlying operating system. This was an explicit design decision, to stay independent of the underlying OS and scheduler.

The RealTimeClass was introduced in the prototype at cbg-executor-0.6.1 to allow the developer of a node to give a "hint" on the priorities inside his/her node to the person who later integrates the node into a whole system and who has to map the callback groups of multiple nodes to Executors, threads and processes. The idea of the Meta Executor concept was to perform this mapping automatically by a configuration file. However, the Meta Executor concept was never implemented. And the RealTimeClass did not make it into mainline rclcpp (for good reasons).

Regarding bad behavior of callbacks: ROS 2 (just as classical ROS) assumes run-to-completion of callbacks, which avoids the need for synchronization between callbacks of the same callback group. (Use rclcpp’s CallbackGroupType Reentrant to allow for parallel execution of callbacks within a callback group.) Aborting long-running callbacks could leave the node in an inconsistent state. I propose to consider such events as a node-level error and restart the whole node instead.
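A minimal sketch of the Reentrant option mentioned above (topic name is made up): with a MultiThreadedExecutor, callbacks within such a group may run in parallel and must synchronize any shared state themselves.

```cpp
#include "rclcpp/rclcpp.hpp"
#include "std_msgs/msg/string.hpp"

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("reentrant_demo");

  // Callbacks assigned to a Reentrant group may be dispatched in parallel by
  // a multi-threaded executor, unlike the default MutuallyExclusive group.
  auto group = node->create_callback_group(rclcpp::CallbackGroupType::Reentrant);
  rclcpp::SubscriptionOptions options;
  options.callback_group = group;

  auto sub = node->create_subscription<std_msgs::msg::String>(
    "chatter", 10,
    [](std_msgs::msg::String::SharedPtr) { /* may run concurrently */ },
    options);

  rclcpp::executors::MultiThreadedExecutor executor;
  executor.add_node(node);
  executor.spin();
  rclcpp::shutdown();
  return 0;
}
```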

Small remark on FIFO order: While classical ROS basically processed message and timer events in FIFO order (with the exception of buffer overruns), the current rclcpp Executor of ROS 2 can be described as a combination of fixed-priority scheduling and round robin. Please see https://doi.org/10.4230/LIPIcs.ECRTS.2019.6 for details.

y-okumura-isp commented 4 years ago

Thank you for the comments. We think ThreadedCallback is very flexible, but we haven't found a use case that makes it superior to CallbackGroup. We would also like to explore the lock use case in ThreadedCallback (described below). So please set it aside for now. We'd be happy to continue the discussion (and if it improves CallbackGroup, that would be great).

We've been thinking about these things, but any ideas are welcome.

(1) How does the CallbackGroup approach analyze the order or performance of callback execution? With ThreadedCallback, we thought we could use existing tools, since each callback is an OS thread.

(2) With ThreadedCallback, we thought the Executor would be able to detect events as soon as topics and timers fire, even if callbacks are dropped. We hope to provide enough traceability information for development, debugging, or verification from the executor (or the whole ROS 2 stack) to the ROS 2 user.

(3) ThreadedCallback requires a lock in timer-driven situations (e.g., a reader-writer lock), and we were wondering whether a lock-free mechanism such as RCU (Read-Copy-Update) could be used. Is there a way for multiple CallbackGroups to share data? (Or is there such a use case?) See the sketch after the diagrams below.

(3.1) RW lock in ThreadedCallback
              copy               read
  Subscriber ------> Node::data ------> TimerCallbackThread

(3.2) RW lock in CallbackGroup?
        Different CBGs read the same data?
       (Of course, CBG2 should also have a subscription, but we want to avoid the extra data copy, for example.)

                    copy          read
  CBG1  Subscriber ------> data -------> TimerCallback1 : not need lock because of run-to-completion
                           | 
                           |     read
  CBG2                     +-----------> TimerCallback2 : if data is read by other CBG, we may need lock
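A minimal sketch of situation (3.1), assuming C++17's std::shared_mutex as the reader-writer lock (class and member names are illustrative only): the subscription callback copies the latest sample into the node's data under a writer lock, and the timer callback reads it under a shared reader lock.

```cpp
#include <chrono>
#include <shared_mutex>
#include <string>

#include "rclcpp/rclcpp.hpp"
#include "std_msgs/msg/string.hpp"

class DataNode : public rclcpp::Node
{
public:
  DataNode() : Node("data_node")
  {
    // Subscriber ------> Node::data (writer side of the RW lock)
    sub_ = create_subscription<std_msgs::msg::String>(
      "chatter", 10,
      [this](std_msgs::msg::String::SharedPtr msg) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        data_ = msg->data;
      });

    // Node::data ------> TimerCallbackThread (reader side of the RW lock)
    timer_ = create_wall_timer(
      std::chrono::milliseconds(100),
      [this]() {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        RCLCPP_INFO(get_logger(), "latest: %s", data_.c_str());
      });
  }

private:
  std::shared_mutex mutex_;
  std::string data_;
  rclcpp::Subscription<std_msgs::msg::String>::SharedPtr sub_;
  rclcpp::TimerBase::SharedPtr timer_;
};
```

If both callbacks end up in the same mutually exclusive callback group on a single-threaded executor, run-to-completion makes the lock unnecessary; it only becomes necessary once the callbacks can actually run concurrently (Reentrant group, separate executors, or ThreadedCallback's per-callback threads).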
carlossvg commented 3 years ago

@y-okumura-isp Based on your conclusion in the previous comment, we are going to close this issue.