heftig / rtkit

Branched from git://git.0pointer.net/rtkit
Other
44 stars 20 forks source link

REALTIMEKIT Realtime Policy and Watchdog Daemon

GIT: https://github.com/heftig/rtkit

NOTES: RealtimeKit is a D-Bus system service that changes the scheduling policy of user processes/threads to SCHED_RR (i.e. realtime scheduling mode) on request. It is intended to be used as a secure mechanism to allow real-time scheduling to be used by normal user processes.

    RealtimeKit enforces strict policies when handing out
    real-time security to user threads:

    * Only clients with RLIMIT_RTTIME set will get RT scheduling

    * RT scheduling will only be handed out to processes with
      SCHED_RESET_ON_FORK set to guarantee that the scheduling
      settings cannot 'leak' to child processes, thus making sure
      that 'RT fork bombs' cannot be used to bypass RLIMIT_RTTIME
      and take the system down.

    * Limits are enforced on all user controllable resources, only
      a maximum number of users, processes, threads can request RT
      scheduling at the same time.

    * Only a limited number of threads may be made RT in a
      specific time frame.

    * Client authorization is verified with PolicyKit

    RealtimeKit can also be used to hand outh high priority
    scheduling (i.e. negative nice level) to user processes.

    In addition to this a-priori policy enforcement, RealtimeKit
    also provides a-posteriori policy enforcement, i.e. it
    includes a canary-based watchdog that automatically demotes
    all real-time threads to SCHED_OTHER should the system
    overload despite the logic pointed out above. For more
    information regarding canary-based RT watchdogs, see the
    Acknowledgments section below.

    In its duty to manage real-time scheduling *securely*
    RealtimeKit runs as unpriviliged user, and uses capabalities,
    resource limits and chroot() to minimize its security impact.

    RealtimeKit probably has little use in embedded or server use
    cases, use RLIMIT_RTPRIO there instead.

WHY: If processes that have real-time scheduling privileges enter a busy loop they can freeze the entire the system. To make sure such run-away processes cannot do this RLIMIT_RTTIME has been introduced. Being a per-process limit it is however easily cirumvented by combining a fork bomb with a busy loop.

    RealtimeKit hands out RT scheduling to specific threads that
    ask for it -- but only to those and due to SCHED_RESET_ON_FORK
    it can be sure that this won't 'leak'.

    In contrast to RLIMIT_RTPRIO the RealtimeKit logic makes sure
    that only a certain number of threads can be made realtime,
    per user, per process and per time interval.

CLIENTS: To be able to make use of realtime scheduling clients may request so with a small D-Bus interface that is accessible on the interface org.freedesktop.RealtimeKit1 as object /org/freedesktop/RealtimeKit1 on the service org.freedesktop.RealtimeKit1:

            void MakeThreadRealtime(u64 thread_id, u32 priority);

            void MakeThreadHighPriority(u64 thread_id, s32 priority);

    The thread IDs need to be passed as kernel tids as returned by
    gettid(), not a pthread_t!  (Please
    note that gettid() is not available in glibc, you need to
    implement that manually using syscall(). Consult the reference
    client implementation for details.)

    It is possible to promote thread in process to realtime/high
    priority from another process, that will make the DBUS call,
    using:

            void MakeThreadRealtimeWithPID(u64 process, u64 thread_id, u32 priority);

            void MakeThreadHighPriorityWithPID(u64 process, u64 thread_id, s32 priority);

    where process is the PID of the process that has thread thread_id.

    A BSD-licensed reference implementation of the client is
    available in rtkit.[ch] as part of the package. You may copy
    this into your sources if you wish. However given how simple
    the D-Bus interface is you might choose to implement your own
    client implementation.

    It is advisable to try acquiring realtime scheduling with
    sched_setsheduler() first, so that systems where RLIMIT_RTPRIO
    is set can be supported.

    Here's an example using the reference implementation. Replace
    this:

    <snip>
            struct sched_param p;
            memset(&p, 0, sizeof(p));
            p.sched_priority = 3;
            sched_setscheduler(0, SCHED_RR|SCHED_RESET_ON_FORK, &p);
    </snip>

    by this:

    <snip>
            struct sched_param p;
            memset(&p, 0, sizeof(p));
            p.sched_priority = 3;
            if (sched_setscheduler(0, SCHED_RR|SCHED_RESET_ON_FORK, &p) < 0
                    && errno == EPERM)
                    rtkit_make_realtime(system_bus, 0, p.sched_priority);
    </snip>

    But of course add more appropriate error checking! Also,
    falling back to plain SCHED_RR when SCHED_RESET_ON_FORK causes
    EINVAL migt be advisable).

DAEMON:

    The daemon is automatically started on first use via D-Bus
    system bus activation.

    Currently the daemon does not read on any configuration file,
    however it can be configured with command line parameters. You
    can edit

    /usr/share/dbus-1/system-services/org.freedesktop.RealtimeKit1.service

    to set those.

    Run

    /usr/libexec/rtkit-daemon --help

    to get a quick overview on the supported parameters and their
    defaults. Many of them should be obvious in their meaning. For
    the remaining ones see below:

    --max-realtime-priority= may be used to specify the maximum
    realtime priority a client can acquire through
    RealtimeKit. Please note that this value must be smaller than
    the value passed to --our-realtime-priority=.

    --our-realtime-priority= may be used to specify the realtime
    priority of the daemon itself. Please note that this priority
    is only used for a very short time while processing a client
    request. Normally the daemon will not be running with a
    realtime scheduling policy. The real-time priorities handed
    out to the user must be lower than this value. (see above).

    --min-nice-level= may be used to specify the minimum nice
    level a client can acquire through RealtimeKit.

    --our-nice-level= may be used to specify the nice level the
    the daemon itself uses most of the time (except when
    processing requests, see above). It is probably a good idea to
    set this to a small positive value, to make sure that if the
    system is overloaded already handing out further RT scheduling
    will be delayed a bit.

    --rttime-usec-max= may be used to control which RLIMIT_RTTIME
    value clients need to have chosen at minimum before they may
    acquire RT scheduling through RealtimeKit.

    --users-max= specifies how many users may acquire RT
    scheduling at the same time for one or multiple of their
    processes.

    --processes-per-user-max= specifies how many processes per
    user may acquire RT scheduling at the same time.

    --threads-per-user-max= specifies how many threads per user
    may acquire RT scheduling at the same time. Of course this
    value should be set higher than --process-per-user-max=.

    --actions-burst-sec= may be used to influence the rate
    limiting logic in RealtimeKit. The daemon will only pass
    realtime scheduling privileges to a maximum number of threads
    within this timeframe (see below).

    --actions-per-burst-max= may be used to influence the rate
    limiting logic in RealtimeKit. The daemon will only pass
    realtime scheduling privileges to this number of threads
    within the time frame configured via
    --actions-burst-sec=. When this limit is reached clients need
    to wait until that time passes before requesting RT scheduling
    again.

    --canary-cheep-msec= may be used to control how often the
    canary thread shall cheep.

    --canary-watchdog-msec= may be used to control how quickly the
    watchdog thread expects to receive a cheep from the canary
    thread. This value must be chosen larger than
    --canary-cheep-msec=. If the former is set 10s and the latter
    to 7s, then the canary thread can trigger and deliver the
    cheep with a maximum latency of 3s.

ACKNOWLEDGMENTS: The canary watchdog logic is inspired by previous work of Vernon Mauery, Florian Schmidt, Kjetil Matheussen:

    http://rt.wiki.kernel.org/index.php/RT_Watchdog

LICENSE: GPLv3+ for the daemon BSD for the client reference implementation

AUTHOR: Lennart Poettering

REQUIREMENTS: Linux kernel >= 2.6.31 D-Bus PolicyKit >= 0.92

OPTIONAL DEPENDENCIES: libsystemd - to let rtkit talk to systemd using the sd-daemon API