Open diegoferigo opened 7 years ago
@lornat75 @marcoaccame @randaz81 @pattacini @traversaro @francesco-romano
Hi all, I have some more info about previous history of real time threads in robotInterface (around 2014). Let me search the related redmine issues. I will soon post a brief on that.
Here is some more history.
See Note-1 for my comments on these priorities.
I recall that @apaikan and I worked with RT another time, in 2015, to reduce the CPU load of old PC104.
See Appendix for details of both activities.

Now, I recall that in the time of the old PC104 the RT mode was required to mitigate packet loss. So why is ICUB_USE_REALTIME_LINUX off now, and why don't we see the problems of years ago?
I think the reason is in point (5): the system now has much more computational capability and the problem does not (statistically?) show up.
However, I think that the improved computation capabilities are not enough to guarantee a safe system.
And here is a proposal for a way forward:
In Note-2 are some thoughts on priorities.

Note-1
I don't recall the reason for the numbers 33, 48 and 49. I remember only that originally they were 33, 49 and 49, and that later we changed one 49 to 48. Maybe @apaikan remembers why.
Note-2
If I had to define priorities now, I would choose values in the range [50, 99], so that they sit between the system IRQs, which keep the highest priorities, and the remaining common user threads. Moreover, I would assign the highest priority to the EthSender (it normally executes for a shorter time, plus we really want to deliver commands to the motors ASAP). Then, as second, I would put the EthReceiver (because it parses UDP packets, unlocks blocking requests and fills all data coming from the ETH boards inside the devices, such as embObjStrain and embObjMotionControl, data which is normally read at 10 ms rate). Finally, as third I would put all other threads in yarprobotinterface (because I assume they just deal with port communication ... but see my Question-1).
Question-1
What are the other threads in yarprobotinterface? Those associated with the wrappers?
Here are details of past activities reported in redmine.
In https://redmine.robotology.eu/issues/87 it is first mentioned that in Oct 2014 @apaikan introduced a variable to toggle on/off the realtime kernel with FIFO scheduling and suitable priorities in the threads of `robotInterface`.
I tested that feature for the sake of solving the problem of packet loss in PC104.
For those of you who don't have access to redmine, here are some excerpts from the issue:
Marco Accame 13 Oct 2014 07:31 PM
The result of these tests is that the packet loss happens in the PC104 and is due to
incorrect scheduling of the receiving thread and to the small size of its buffer.
By applying suitable small changes to the existing kernel setup and to the code of
robotInterface, the problem seems to be solved.
The brief of all the changes which solve the problem is in the following.
In system:
1. Use of a realtime kernel with FIFO scheduling
2. Increased max size of UDP receiving buffer to 8MB: added line "net.core.rmem_max=8388608"
in file /etc/sysctl.conf
3. Given to icub user the right to run high priority threads: added the two lines
"icub soft rtprio 99" and "icub hard rtprio 99" in file /etc/security/limits.conf
In the code of robotInterface:
4. priority of robotInterface process set to 33 (see main.cpp),
5. priority of receiving thread equal to 49 (see ethManager.cpp, EthReceiver::threadInit()),
6. priority of transmitting thread equal to 48 (see ethManager.cpp, EthSender::threadInit()),
7. buffer size of receiving thread equal to 1MB (see ethManager.cpp, EthReceiver::config())
Marco Accame 03 Nov 2014 11:55 AM
With the above changes and after continuous use on iCubGenova04 the problem has not
shown anymore.
The above changes related to robotInterface (points 4 to 7) are already in icub-main.
The changes related to system (points 1 to 3) are issue of another task dealing with
preparation of a new live linux key.
Can anybody working on this task please add a link?
NOTE1: the iCubGenova04 of end 2014 has now become iCubNancy01.
NOTE2: The system settings of points 1 to 3 are now in use in our robots.
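The system settings of points 2 and 3 can be checked from userspace on a Linux machine. The following is a minimal Python sketch under stated assumptions: the helper names (`read_rmem_max`, `read_rtprio_limit`, `settings_ok`) are hypothetical, the thresholds are simply the values quoted in the excerpt, and the limits reported are those of the current process, which only reflect `/etc/security/limits.conf` when the session was opened by the configured user.

```python
import resource
from pathlib import Path

REQUIRED_RMEM_MAX = 8388608  # bytes, from "net.core.rmem_max=8388608" (point 2)
REQUIRED_RTPRIO = 99         # from "icub soft rtprio 99" (point 3)


def read_rmem_max(path="/proc/sys/net/core/rmem_max"):
    """Return the maximum socket receive buffer size, or None if unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def read_rtprio_limit():
    """Return the (soft, hard) real-time priority limits of this process."""
    return resource.getrlimit(resource.RLIMIT_RTPRIO)


def settings_ok(rmem_max, rtprio_soft):
    """Compare both values with the thresholds quoted in points 2 and 3."""
    if rmem_max is None or rmem_max < REQUIRED_RMEM_MAX:
        return False
    # An unlimited soft limit also allows priority 99.
    return rtprio_soft == resource.RLIM_INFINITY or rtprio_soft >= REQUIRED_RTPRIO


if __name__ == "__main__":
    soft, hard = read_rtprio_limit()
    rmem = read_rmem_max()
    print(f"rmem_max={rmem} rtprio soft={soft} hard={hard}")
    print("settings ok:", settings_ok(rmem, soft))
```

Running it as the user that launches yarprobotinterface shows whether that user would be allowed to request the priorities discussed in this issue.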
Moreover, in April 2015 I did some more work with @apaikan focused on mitigating the CPU load of the old PC104.
See https://redmine.robotology.eu/issues/561. We did many tests with the RT mode turned ON. It was decided:
We tested on old iCubGenova04, upper part of a new iCubGenova02, iCubHeidelberg, and later on we deployed on all other ETH robots.
I agree we should enable realtime support for the software running on the head pc and raise the priority for the ethManager, also making sure it is always on by default and in future installations of the robot.
Note: This post was edited after @apaikan comment below.
Thank you @marcoaccame for this detailed report, well done. Following especially your Note-1 and Note-2, this is the priorities logic I propose:
| 0 - 19 | 20 - 39 | 40 - 49 | 50 |
|---|---|---|---|
| Threads that should run on top of user threads but w/o strict RT constraints | Threads containing functions with strict RT constraints | Critical threads | Kernel general IRQs |
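To make the proposed ranges easier to check against concrete values, here is a minimal Python sketch that maps a priority to its band. The names `BANDS` and `band_for` and the short labels are hypothetical; the boundaries are the ones proposed above, and values above 50 are reported as not covered because the proposal does not assign them.

```python
# Bands as proposed above: (lowest priority, highest priority, short label).
BANDS = [
    (0, 19, "threads above user threads, no strict RT constraints"),
    (20, 39, "threads with strict RT constraints"),
    (40, 49, "critical threads"),
    (50, 50, "kernel general IRQs"),
]


def band_for(priority):
    """Return the label of the band containing `priority`, or None if not covered."""
    for low, high, label in BANDS:
        if low <= priority <= high:
            return label
    return None


if __name__ == "__main__":
    # Priorities reported in this thread for the robotInterface threads.
    for name, prio in [("other threads", 33), ("EthSender", 48), ("EthReceiver", 49)]:
        print(f"{name}: {prio} -> {band_for(prio)}")
```

With the values reported in this thread, 33 falls in the 20 to 39 band, while 48 and 49 fall in the 40 to 49 band.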
Please let me know your opinion about it.
cc @marcoaccame @drdanz @lornat75 @randaz81 @traversaro @francesco-romano @pattacini @barbalberto
Hi @diegoferigo, it seems fine by me.
@diegoferigo Please notice that the Ethernet controller thread in RT Linux has a priority level of 50.
When talking about FIFO scheduling, a higher number actually means a higher priority. So raising an application thread's priority above 50 lets it preempt the network thread and may result in higher latency in general! This is the main reason why the `yarprobotinterface` threads have a priority (33) lower than 50!
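As an illustration of this constraint, here is a minimal Python sketch (not the yarp or icub-main implementation) that switches the calling thread to `SCHED_FIFO` while refusing priorities at or above 50. The function name `set_fifo_priority` and the constant `IRQ_THREAD_PRIORITY` are hypothetical; the `os.sched_*` calls are standard but Linux-specific, and the call is expected to fail without the rtprio rights mentioned earlier in this thread.

```python
import os

# Priority of the Ethernet controller thread, as reported in the comment above.
IRQ_THREAD_PRIORITY = 50


def set_fifo_priority(priority):
    """Switch the calling thread to SCHED_FIFO at `priority`.

    Returns True on success and False if the OS refuses the request
    (for example when the user has no real-time priority rights).
    """
    if not 1 <= priority < IRQ_THREAD_PRIORITY:
        raise ValueError(
            f"priority must be in [1, {IRQ_THREAD_PRIORITY - 1}] "
            "to stay below the network IRQ thread"
        )
    try:
        # pid 0 targets the caller.
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # 33 is the value reported for the generic robotInterface threads.
    print("SCHED_FIFO applied:", set_fifo_priority(33))
```

Keeping the upper bound in one constant makes it explicit that the limit comes from the IRQ thread priority rather than from the scheduler itself, which accepts values up to 99.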
Reminder from @marcoaccame :
In particular: all robotInterface threads had scheduling policy SCHED_FIFO. All had priority 33, apart from
thread EthSender (48) and EthReceiver (49). See Note-1 for my comments on these priorities.
I recall that @apaikan and I worked with RT another time, in 2015, to reduce the CPU load of old PC104.
Hi @apaikan and @diegoferigo,
two things in parallel:
About 1: I simply don't know.
About 2: see also issue #1233. My opinion is that, from highest to lowest priority:
1. EthSender (it normally executes for a shorter time, plus we really want to deliver commands to the motors ASAP).
2. EthReceiver (because it parses UDP packets, unlocks blocking requests and fills all data coming from the ETH boards inside the devices, such as embObjStrain and embObjMotionControl, data which is normally read at 10 ms rate).
3. Finally, as third I would put all other threads in yarprobotinterface (because I assume they just deal with port communication ... but see my Question-1).
@apaikan I see a big misconception here: apparently there is a lot of confusion about how these ranges work, and it looks like you're right. The two scales use opposite logic, and this invalidates the scale in the first post of this issue. I'll edit the posts accordingly.
Thanks for your intervention, it was really helpful!
Since you're here, did you choose `FIFO` over `RR` for any reason?
I suggest we start by raising the priorities of `ethReceiver` and `ethSender` only. Changing priorities for other threads may have hidden side effects, so it needs to be done with care.
I updated the first post, I hope now the situation is more clear to everyone.
@lornat75 After @apaikan's comments, it seems that the current priorities already look good and comply with the ranges I envisioned.
During the last few days, the flag `ICUB_USE_REALTIME_LINUX` has been set to `ON` on `iCubGenova04` and all worked flawlessly.
Real-time thread support in `yarp` is still experimental and hence undocumented. This could be a useful feature for critical threads that run on `icub-head`, and could potentially mitigate situations like https://github.com/robotology/icub-support/issues/449.

Before starting, a bit of history, at least what I found so far.
Some years ago `yarp` gained experimental support for tweaking the priorities of important threads (C++11 support, later tweak). Beyond `PortCore`, this feature is used in yarprobotinterface and ethManager through the `ICUB_USE_REALTIME_LINUX` experimental CMake option (default `OFF`). This latter support is unix-specific, which could be the reason for the default value of the option. More generally, all the classes that inherit from `ThreadImpl` now have a multiplatform `setPriority()` method.

I have some comments about:
1. a `static int setPriority()` method in `ThreadImpl` that can also be called in `ethManager`, in order to provide multiplatform support and to avoid breaking builds on non-unix machines when `ICUB_USE_REALTIME_LINUX=1`
2. whether this option (`ethManager`) should be enabled by default if unix is detected

For future reference (and to clear my mind), I recap below what I understood about the policy of unix priorities.
There are two kinds of tasks, each linked to a different priority range:

1. `SCHED_FIFO`, `SCHED_RR`: `RTPRIO=[1, 99]`, where a high `RTPRIO` means high priority
2. `SCHED_OTHER`, `SCHED_NORMAL`: `NICENESS=[-20;19]`, where a high `NICENESS` means low priority

One important detail worth noting is that internally the kernel handles the queue with a certain logic, while userspace tools such as `top` show a different range. Yes, it is confusing, and you can find funny discussions in the linux-rt ml. This document and `<include/linux/sched/prio.h>` have been really helpful.

Here below I'll focus on the `PRI` field you can obtain from `ps -eo pid,rtprio,ni,pri,comm` (check Note#1 at the end of the post). To my understanding, the general priority `PRI=[0:139]` of a thread can be defined as follows:

considering:

The range generated by this equation is the following:
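A minimal Python sketch of one mapping that reproduces the ranges stated in this post (`RTPRIO` in [1, 99], niceness in [-20, 19], `PRI` in [0, 139] with 40 unused). It rests on an assumption that is not taken from the post: that the `pri` column of procps `ps` is `19 - NI` for normal tasks and `40 + RTPRIO` for real-time tasks. The function name `ps_pri` is hypothetical.

```python
from typing import Optional


def ps_pri(rtprio: Optional[int] = None, nice: int = 0) -> int:
    """Return the expected `pri` value for a task.

    Pass `rtprio` for SCHED_FIFO/SCHED_RR tasks, or `nice` for
    SCHED_OTHER tasks. Assumes pri = 40 + RTPRIO or pri = 19 - NI.
    """
    if rtprio is not None:
        if not 1 <= rtprio <= 99:
            raise ValueError("RTPRIO must be in [1, 99]")
        return 40 + rtprio
    if not -20 <= nice <= 19:
        raise ValueError("niceness must be in [-20, 19]")
    return 19 - nice


if __name__ == "__main__":
    print("normal tasks:", ps_pri(nice=19), "to", ps_pri(nice=-20))
    print("real-time tasks:", ps_pri(rtprio=1), "to", ps_pri(rtprio=99))
```

Under this assumption normal tasks cover 0 to 39 and real-time tasks cover 41 to 139, so a higher value always means a higher priority and the value 40 is never produced.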
Note: `PRI=40` is not allowed. See Note#1 below.

Notes
Note#1
Something is still not clear, and I hope someone will sooner or later shed light on this. `man ps` reports at its bottom an explanation of the fields that the UNIX-style option `-o` can produce. In particular:

But, looking at `ps -eo pid,rtprio,ni,pri,comm`, that column is ordered `[0, 139]` with a low values = low priority logic. You can check this yourself by grepping the irqs, pulseaudio, etc., which have a bigger `PRI`. Apparently I'm not the only one with doubts.

References:
Overview of CPU scheduling
A complete guide to Linux process scheduling
Real-Time Linux Kernel Scheduler
RT Preempt HowTO
IRQs in RT kernels