wutiejun / workspace

My workspace.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/timers/NO_HZ.txt #50

Open wutiejun opened 7 years ago

wutiejun commented 7 years ago

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/timers/NO_HZ.txt

        NO_HZ: Reducing Scheduling-Clock Ticks

This document describes Kconfig options and boot parameters that can
reduce the number of scheduling-clock interrupts, thereby improving energy
efficiency and reducing OS jitter.  Reducing OS jitter is important for
some types of computationally intensive high-performance computing (HPC)
applications and for real-time applications.

There are three main ways of managing scheduling-clock interrupts
(also known as "scheduling-clock ticks" or simply "ticks"):

1.  Never omit scheduling-clock ticks (CONFIG_HZ_PERIODIC=y or
    CONFIG_NO_HZ=n for older kernels).  You normally will -not-
    want to choose this option.

2.  Omit scheduling-clock ticks on idle CPUs (CONFIG_NO_HZ_IDLE=y or
    CONFIG_NO_HZ=y for older kernels).  This is the most common
    approach, and should be the default.

3.  Omit scheduling-clock ticks on CPUs that are either idle or that
    have only one runnable task (CONFIG_NO_HZ_FULL=y).  Unless you
    are running realtime applications or certain types of HPC
    workloads, you will normally -not- want this option.
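
As a quick way to see which of these three modes a given kernel was
built with, you can inspect its configuration.  A minimal sketch,
assuming your distribution installs the config under /boot (some
instead expose it as /proc/config.gz):

    # Older kernels use CONFIG_NO_HZ; the pattern below matches both
    # old- and new-style option names.
    grep -E 'CONFIG_HZ_PERIODIC|CONFIG_NO_HZ' /boot/config-$(uname -r)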

These three cases are described in the following three sections, followed
by a fourth section on RCU-specific considerations, a fifth section
discussing testing, and a sixth and final section listing known issues.

NEVER OMIT SCHEDULING-CLOCK TICKS

Very old versions of Linux from the 1990s and the very early 2000s
are incapable of omitting scheduling-clock ticks.  It turns out that
there are some situations where this old-school approach is still the
right approach, for example, in heavy workloads with lots of tasks
that use short bursts of CPU, where there are very frequent idle
periods, but where these idle periods are also quite short (tens or
hundreds of microseconds).  For these types of workloads, scheduling-clock
interrupts will normally be delivered anyway because there will
frequently be multiple runnable tasks per CPU.  In these cases,
attempting to turn off the scheduling-clock interrupt will have no effect
other than increasing the overhead of switching to and from idle and
transitioning between user and kernel execution.

This mode of operation can be selected using CONFIG_HZ_PERIODIC=y (or
CONFIG_NO_HZ=n for older kernels).
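
For reference, a minimal sketch of what this choice looks like as a
.config fragment (under menuconfig it typically lives in the "Timer
tick handling" choice, though menu placement varies across kernel
versions):

    CONFIG_HZ_PERIODIC=y
    # CONFIG_NO_HZ_IDLE is not set
    # CONFIG_NO_HZ_FULL is not set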

However, if you are instead running a light workload with long idle
periods, failing to omit scheduling-clock interrupts will result in
excessive power consumption.  This is especially bad on battery-powered
devices, where it results in extremely short battery lifetimes.  If you
are running light workloads, you should therefore read the following
section.

In addition, if you are running either a real-time workload or an HPC
workload with short iterations, the scheduling-clock interrupts can
degrade your application's performance.  If this describes your workload,
you should read the following two sections.

OMIT SCHEDULING-CLOCK TICKS FOR IDLE CPUs

If a CPU is idle, there is little point in sending it a scheduling-clock
interrupt.  After all, the primary purpose of a scheduling-clock interrupt
is to force a busy CPU to shift its attention among multiple duties,
and an idle CPU has no duties to shift its attention among.

The CONFIG_NO_HZ_IDLE=y Kconfig option causes the kernel to avoid sending
scheduling-clock interrupts to idle CPUs, which is critically important
both to battery-powered devices and to highly virtualized mainframes.
A battery-powered device running a CONFIG_HZ_PERIODIC=y kernel would
drain its battery very quickly, easily 2-3 times as fast as would the
same device running a CONFIG_NO_HZ_IDLE=y kernel.  A mainframe running
1,500 OS instances might find that half of its CPU time was consumed by
unnecessary scheduling-clock interrupts.  In these situations, there
is strong motivation to avoid sending scheduling-clock interrupts to
idle CPUs.  That said, dyntick-idle mode is not free:

1.  It increases the number of instructions executed on the path
    to and from the idle loop.

2.  On many architectures, dyntick-idle mode also increases the
    number of expensive clock-reprogramming operations.

Therefore, systems with aggressive real-time response constraints often
run CONFIG_HZ_PERIODIC=y kernels (or CONFIG_NO_HZ=n for older kernels)
in order to avoid degrading from-idle transition latencies.

An idle CPU that is not receiving scheduling-clock interrupts is said to
be "dyntick-idle", "in dyntick-idle mode", "in nohz mode", or "running
tickless".  The remainder of this document will use "dyntick-idle mode".

There is also a boot parameter "nohz=" that can be used to disable
dyntick-idle mode in CONFIG_NO_HZ_IDLE=y kernels by specifying "nohz=off".
By default, CONFIG_NO_HZ_IDLE=y kernels boot with "nohz=on", enabling
dyntick-idle mode.
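
As an illustrative sketch (the exact mechanism is distro-specific),
disabling dyntick-idle mode on a GRUB-based system might look like
this:

    # In /etc/default/grub, append "nohz=off" to the kernel command line:
    GRUB_CMDLINE_LINUX="... nohz=off"

    # Then regenerate the GRUB configuration and reboot:
    update-grub    # Debian/Ubuntu; e.g. grub2-mkconfig -o /boot/grub2/grub.cfg elsewhere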

OMIT SCHEDULING-CLOCK TICKS FOR CPUs WITH ONLY ONE RUNNABLE TASK

If a CPU has only one runnable task, there is little point in sending it
a scheduling-clock interrupt because there is no other task to switch to.
Note that omitting scheduling-clock ticks for CPUs with only one runnable
task implies also omitting them for idle CPUs.

The CONFIG_NO_HZ_FULL=y Kconfig option causes the kernel to avoid
sending scheduling-clock interrupts to CPUs with a single runnable task,
and such CPUs are said to be "adaptive-ticks CPUs".  This is important
for applications with aggressive real-time response constraints because
it allows them to improve their worst-case response times by the maximum
duration of a scheduling-clock interrupt.  It is also important for
computationally intensive short-iteration workloads:  If any CPU is
delayed during a given iteration, all the other CPUs will be forced to
wait idle while the delayed CPU finishes.  Thus, the delay is multiplied
by one less than the number of CPUs.  In these situations, there is
again strong motivation to avoid sending scheduling-clock interrupts.

By default, no CPU will be an adaptive-ticks CPU.  The "nohz_full="
boot parameter specifies the adaptive-ticks CPUs.  For example,
"nohz_full=1,6-8" says that CPUs 1, 6, 7, and 8 are to be adaptive-ticks
CPUs.  Note that you are prohibited from marking all of the CPUs as
adaptive-tick CPUs:  At least one non-adaptive-tick CPU must remain
online to handle timekeeping tasks in order to ensure that system
calls like gettimeofday() return accurate values on adaptive-tick CPUs.
(This is not an issue for CONFIG_NO_HZ_IDLE=y because there are no running
user processes to observe slight drifts in clock rate.)  Therefore, the
boot CPU is prohibited from entering adaptive-ticks mode.  Specifying a
"nohz_full=" mask that includes the boot CPU will result in a boot-time
error message, and the boot CPU will be removed from the mask.  Note that
this means that your system must have at least two CPUs in order for
CONFIG_NO_HZ_FULL=y to do anything for you.
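
As a sketch, booting with the example mask above and then checking it
from userspace might look like the following.  (The sysfs file shown
here is an assumption; it is exported by many, but not all, kernels
built with CONFIG_NO_HZ_FULL=y.)

    # Kernel command line fragment:
    nohz_full=1,6-8

    # After boot, on kernels that export the mask via sysfs:
    cat /sys/devices/system/cpu/nohz_full
    1,6-8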

Alternatively, the CONFIG_NO_HZ_FULL_ALL=y Kconfig parameter specifies
that all CPUs other than the boot CPU are adaptive-ticks CPUs.  This
Kconfig parameter will be overridden by the "nohz_full=" boot parameter,
so that if both the CONFIG_NO_HZ_FULL_ALL=y Kconfig parameter and
the "nohz_full=1" boot parameter is specified, the boot parameter will
prevail so that only CPU 1 will be an adaptive-ticks CPU.

Finally, adaptive-ticks CPUs must have their RCU callbacks offloaded.
This is covered in the "RCU IMPLICATIONS" section below.

Normally, a CPU remains in adaptive-ticks mode as long as possible.
In particular, transitioning to kernel mode does not automatically change
the mode.  Instead, the CPU will exit adaptive-ticks mode only if needed,
for example, if that CPU enqueues an RCU callback.

Just as with dyntick-idle mode, the benefits of adaptive-tick mode do
not come for free:

1.  CONFIG_NO_HZ_FULL selects CONFIG_NO_HZ_COMMON, so you cannot run
    adaptive ticks without also running dyntick idle.  This dependency
    extends down into the implementation, so that all of the costs
    of CONFIG_NO_HZ_IDLE are also incurred by CONFIG_NO_HZ_FULL.

2.  The user/kernel transitions are slightly more expensive due
    to the need to inform kernel subsystems (such as RCU) about
    the change in mode.

3.  POSIX CPU timers prevent CPUs from entering adaptive-tick mode.
    Real-time applications needing to take actions based on CPU time
    consumption need to use other means of doing so.

4.  If there are more perf events pending than the hardware can
    accommodate, they are normally round-robined so as to collect
    all of them over time.  Adaptive-tick mode may prevent this
    round-robining from happening.  This will likely be fixed by
    preventing CPUs with large numbers of perf events pending from
    entering adaptive-tick mode.

5.  Scheduler statistics for adaptive-tick CPUs may be computed
    slightly differently than those for non-adaptive-tick CPUs.
    This might in turn perturb load-balancing of real-time tasks.

6.  The LB_BIAS scheduler feature is disabled by adaptive ticks.

Although improvements are expected over time, adaptive ticks is quite
useful for many types of real-time and compute-intensive applications.
However, the drawbacks listed above mean that adaptive ticks should not
(yet) be enabled by default.

RCU IMPLICATIONS

There are situations in which idle CPUs cannot be permitted to
enter either dyntick-idle mode or adaptive-tick mode, the most
common being when that CPU has RCU callbacks pending.

The CONFIG_RCU_FAST_NO_HZ=y Kconfig option may be used to cause such CPUs
to enter dyntick-idle mode or adaptive-tick mode anyway.  In this case,
a timer will awaken these CPUs every four jiffies in order to ensure
that the RCU callbacks are processed in a timely fashion.

Another approach is to offload RCU callback processing to "rcuo" kthreads
using the CONFIG_RCU_NOCB_CPU=y Kconfig option.  The specific CPUs to
offload may be selected via several methods:

1.  One of three mutually exclusive Kconfig options specifies a
    build-time default for the CPUs to offload:

    a.  The CONFIG_RCU_NOCB_CPU_NONE=y Kconfig option results in
        no CPUs being offloaded.

    b.  The CONFIG_RCU_NOCB_CPU_ZERO=y Kconfig option causes
        CPU 0 to be offloaded.

    c.  The CONFIG_RCU_NOCB_CPU_ALL=y Kconfig option causes all
        CPUs to be offloaded.  Note that the callbacks will be
        offloaded to "rcuo" kthreads, and that those kthreads
        will in fact run on some CPU.  However, this approach
        gives fine-grained control on exactly which CPUs the
        callbacks run on, along with their scheduling priority
        (including the default of SCHED_OTHER), and it further
        allows this control to be varied dynamically at runtime.

2.  The "rcu_nocbs=" kernel boot parameter, which takes a comma-separated
    list of CPUs and CPU ranges, for example, "1,3-5" selects CPUs 1,
    3, 4, and 5.  The specified CPUs will be offloaded in addition to
    any CPUs specified as offloaded by CONFIG_RCU_NOCB_CPU_ZERO=y or
    CONFIG_RCU_NOCB_CPU_ALL=y.  This means that the "rcu_nocbs=" boot
    parameter has no effect for kernels built with RCU_NOCB_CPU_ALL=y.

The offloaded CPUs will never queue RCU callbacks, and therefore RCU
never prevents offloaded CPUs from entering either dyntick-idle mode
or adaptive-tick mode.  That said, note that it is up to userspace to
pin the "rcuo" kthreads to specific CPUs if desired.  Otherwise, the
scheduler will decide where to run them, which might or might not be
where you want them to run.
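
Putting the pieces together, a hedged sketch: boot with matching
"nohz_full=" and "rcu_nocbs=" masks, then pin the "rcuo" kthreads to
the housekeeping CPU from userspace (kthread names such as "rcuob/1"
vary across kernel versions, so the pattern below is an assumption):

    # Kernel command line fragment (CPUs 1 and 6-8 reserved for the app):
    nohz_full=1,6-8 rcu_nocbs=1,6-8

    # After boot, pin all rcuo kthreads to CPU 0 (requires root):
    for pid in $(pgrep '^rcuo'); do taskset -pc 0 "$pid"; done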

TESTING

So you enable all the OS-jitter features described in this document,
but do not see any change in your workload's behavior.  Is this because
your workload isn't affected that much by OS jitter, or is it because
something else is in the way?  This section helps answer this question
by providing a simple OS-jitter test suite, which is available on branch
master of the following git archive:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/dynticks-testing.git

Clone this archive and follow the instructions in the README file.
This test procedure will produce a trace that will allow you to evaluate
whether or not you have succeeded in removing OS jitter from your system.
If this trace shows that you have removed OS jitter as much as is
possible, then you can conclude that your workload is not all that
sensitive to OS jitter.
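
For example (the git:// protocol is blocked on some networks; an
https:// URL for the same kernel.org path should also work):

    git clone git://git.kernel.org/pub/scm/linux/kernel/git/frederic/dynticks-testing.git
    cd dynticks-testing
    cat README    # then follow its instructions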

Note: this test requires that your system have at least two CPUs.
We do not currently have a good way to remove OS jitter from single-CPU
systems.

KNOWN ISSUES

o   Dyntick-idle slows transitions to and from idle slightly.
    In practice, this has not been a problem except for the most
    aggressive real-time workloads, which have the option of disabling
    dyntick-idle mode, an option that most of them take.  However,
    some workloads will no doubt want to use adaptive ticks to
    eliminate scheduling-clock interrupt latencies.  Here are some
    options for these workloads:

    a.  Use PM QOS from userspace to inform the kernel of your
        latency requirements (preferred; see the sketch at the
        end of this list).

    b.  On x86 systems, use the "idle=mwait" boot parameter.

    c.  On x86 systems, use the "intel_idle.max_cstate=" boot
        parameter to limit the maximum C-state depth.

    d.  On x86 systems, use the "idle=poll" boot parameter.
        However, please note that use of this parameter can cause
        your CPU to overheat, which may cause thermal throttling
        to degrade your latencies -- and that this degradation can
        be even worse than that of dyntick-idle.  Furthermore,
        this parameter effectively disables Turbo Mode on Intel
        CPUs, which can significantly reduce maximum performance.
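
    As an illustration of option (a), a minimal shell sketch of the
    PM QOS interface:  the kernel exposes /dev/cpu_dma_latency, and a
    latency request remains in force only while the file is held open.
    (Writing an ASCII value works on recent kernels; older kernels
    expect a raw 32-bit binary value.  The workload command below is
    hypothetical.)

        # Request a 0-microsecond CPU wakeup-latency bound (root only):
        exec 3> /dev/cpu_dma_latency
        echo 0 >&3
        ./my-latency-sensitive-workload    # hypothetical command
        exec 3>&-    # closing the file releases the request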

o   Adaptive-ticks slows user/kernel transitions slightly.
    This is not expected to be a problem for computationally intensive
    workloads, which have few such transitions.  Careful benchmarking
    will be required to determine whether or not other workloads
    are significantly affected by this effect.

o   Adaptive-ticks does not do anything unless there is only one
    runnable task for a given CPU, even though there are a number
    of other situations where the scheduling-clock tick is not
    needed.  To give but one example, consider a CPU that has one
    runnable high-priority SCHED_FIFO task and an arbitrary number
    of low-priority SCHED_OTHER tasks.  In this case, the CPU is
    required to run the SCHED_FIFO task until it either blocks or
    some other higher-priority task awakens on (or is assigned to)
    this CPU, so there is no point in sending a scheduling-clock
    interrupt to this CPU.  However, the current implementation
    nevertheless sends scheduling-clock interrupts to CPUs having a
    single runnable SCHED_FIFO task and multiple runnable SCHED_OTHER
    tasks, even though these interrupts are unnecessary.

    And even when there are multiple runnable tasks on a given CPU,
    there is little point in interrupting that CPU until the current
    running task's timeslice expires, which is almost always way
    longer than the time of the next scheduling-clock interrupt.

    Better handling of these sorts of situations is future work.

o   A reboot is required to reconfigure both adaptive idle and RCU
    callback offloading.  Runtime reconfiguration could be provided
    if needed; however, due to the complexity of reconfiguring RCU at
    runtime, there would need to be an earthshakingly good reason,
    especially given that you have the straightforward option of
    simply offloading RCU callbacks from all CPUs and pinning them
    where you want them whenever you want them pinned.

o   Additional configuration is required to deal with other sources
    of OS jitter, including interrupts and system-utility tasks
    and processes.  This configuration normally involves binding
    interrupts and tasks to particular CPUs.

o   Some sources of OS jitter can currently be eliminated only by
    constraining the workload.  For example, the only way to eliminate
    OS jitter due to global TLB shootdowns is to avoid the unmapping
    operations (such as kernel module unload operations) that
    result in these shootdowns.  For another example, page faults
    and TLB misses can be reduced (and in some cases eliminated) by
    using huge pages and by constraining the amount of memory used
    by the application.  Pre-faulting the working set can also be
    helpful, especially when combined with the mlock() and mlockall()
    system calls.
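
    For instance, a minimal sketch of reserving a pool of huge pages
    at runtime (page counts and availability are system-dependent,
    and persistent reservations are usually made via boot parameters
    instead):

        # Reserve 128 default-sized huge pages (requires root):
        echo 128 > /proc/sys/vm/nr_hugepages

        # Verify the reservation:
        grep HugePages_Total /proc/meminfo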

o   Unless all CPUs are idle, at least one CPU must keep the
    scheduling-clock interrupt going in order to support accurate
    timekeeping.

o   If there might potentially be some adaptive-ticks CPUs, there
    will be at least one CPU keeping the scheduling-clock interrupt
    going, even if all CPUs are otherwise idle.

    Better handling of this situation is ongoing work.

o   Some process-handling operations still require the occasional
    scheduling-clock tick.  These operations include calculating CPU
    load, maintaining sched average, computing CFS entity vruntime,
    computing avenrun, and carrying out load balancing.  They are
    currently accommodated by a scheduling-clock tick every second
    or so.  Ongoing work will eliminate the need even for these
    infrequent scheduling-clock ticks.
wutiejun commented 7 years ago

NO_HZ: Reducing Scheduling-Clock Ticks

This document describes Kconfig options and boot parameters that can
reduce the number of scheduling-clock interrupts, thereby improving energy
efficiency and reducing OS jitter.  Reducing OS jitter is important for
some types of computationally intensive high-performance computing (HPC)
applications and for real-time applications.

There are three main ways of managing scheduling-clock interrupts (also
known as "scheduling-clock ticks" or simply "ticks"):

1.  Never omit scheduling-clock ticks (CONFIG_HZ_PERIODIC=y or
    CONFIG_NO_HZ=n for older kernels).  You normally will -not- want
    to choose this option.

2.  Omit scheduling-clock ticks on idle CPUs (CONFIG_NO_HZ_IDLE=y or
    CONFIG_NO_HZ=y for older kernels).  This is the most common
    approach, and should be the default.

3.  Omit scheduling-clock ticks on CPUs that are either idle or that
    have only one runnable task (CONFIG_NO_HZ_FULL=y).  Unless you are
    running real-time applications or certain types of HPC workloads,
    you will normally -not- want this option.

These three cases are described in the following three sections, followed
by a fourth section on RCU-specific considerations, a fifth section
discussing testing, and a sixth and final section listing known issues.

wutiejun commented 7 years ago

从来没有省略调度时钟

十九世纪九十年代初至二十世纪初的Linux的旧版本 无法省略计划时钟滴答。 事实证明 有些情况下,这种老式的办法仍然是 正确的方法,例如,在繁重的工作负载很多任务 使用短脉冲的CPU,其中非常频繁的空闲 时间段,但是这些空闲时间也很短(几十或 几百微秒)。 对于这些类型的工作负载,调度 时钟中断通常会以任何方式传送,因为在那里 每个CPU经常会有多个可运行的任务。 在这些情况下, 尝试关闭调度时钟中断将不起作用 而不是增加切换到空闲和从空闲的开销 在用户和内核执行之间进行转换。

这种操作模式可以使用CONFIG_HZ_PERIODIC = y(或 对于旧的内核,CONFIG_NO_HZ = n)。

但是,如果您改为长时间闲置的轻型工作负载 时间段,不能省略调度时钟中断将导致 功耗过大 这在电池供电方面特别糟糕 器件,导致极短的电池寿命。 如果你 正在运行轻量级的工作负载,因此您应该阅读以下内容 部分。

另外,如果您正在运行实时工作负载或HPC 工作负载短暂迭代,调度时钟中断可以 降低应用程序性能。 如果这描述你的工作量, 你应该阅读以下两节。

wutiejun commented 7 years ago

OMIT SCHEDULING-CLOCK TICKS FOR IDLE CPUs

If a CPU is idle, there is little point in sending it a scheduling-clock
interrupt.  After all, the primary purpose of a scheduling-clock interrupt
is to force a busy CPU to shift its attention among multiple duties, and
an idle CPU has no duties to shift its attention among.

The CONFIG_NO_HZ_IDLE=y Kconfig option causes the kernel to avoid sending
scheduling-clock interrupts to idle CPUs, which is critically important
both to battery-powered devices and to highly virtualized mainframes.
A battery-powered device running a CONFIG_HZ_PERIODIC=y kernel would
drain its battery very quickly, easily 2-3 times as fast as would the
same device running a CONFIG_NO_HZ_IDLE=y kernel.  A mainframe running
1,500 OS instances might find that half of its CPU time was consumed by
unnecessary scheduling-clock interrupts.  In these situations, there is
strong motivation to avoid sending scheduling-clock interrupts to idle
CPUs.  That said, dyntick-idle mode is not free:

1.  It increases the number of instructions executed on the path to
    and from the idle loop.

2.  On many architectures, dyntick-idle mode also increases the number
    of expensive clock-reprogramming operations.

Therefore, systems with aggressive real-time response constraints often
run CONFIG_HZ_PERIODIC=y kernels (or CONFIG_NO_HZ=n for older kernels)
in order to avoid degrading from-idle transition latencies.

An idle CPU that is not receiving scheduling-clock interrupts is said to
be "dyntick-idle", "in dyntick-idle mode", "in nohz mode", or "running
tickless".  The remainder of this document will use "dyntick-idle mode".

There is also a boot parameter "nohz=" that can be used to disable
dyntick-idle mode in CONFIG_NO_HZ_IDLE=y kernels by specifying "nohz=off".
By default, CONFIG_NO_HZ_IDLE=y kernels boot with "nohz=on", enabling
dyntick-idle mode.

wutiejun commented 7 years ago

OMIT SCHEDULING-CLOCK TICKS FOR CPUs WITH ONLY ONE RUNNABLE TASK

If a CPU has only one runnable task, there is little point in sending it
a scheduling-clock interrupt because there is no other task to switch to.
Note that omitting scheduling-clock ticks for CPUs with only one runnable
task implies also omitting them for idle CPUs.

The CONFIG_NO_HZ_FULL=y Kconfig option causes the kernel to avoid sending
scheduling-clock interrupts to CPUs with a single runnable task, and such
CPUs are said to be "adaptive-ticks CPUs".  This is important for
applications with aggressive real-time response constraints because it
allows them to improve their worst-case response times by the maximum
duration of a scheduling-clock interrupt.  It is also important for
computationally intensive short-iteration workloads:  If any CPU is
delayed during a given iteration, all the other CPUs will be forced to
wait idle while the delayed CPU finishes.  Thus, the delay is multiplied
by one less than the number of CPUs.  In these situations, there is again
strong motivation to avoid sending scheduling-clock interrupts.

By default, no CPU will be an adaptive-ticks CPU.  The "nohz_full=" boot
parameter specifies the adaptive-ticks CPUs.  For example,
"nohz_full=1,6-8" says that CPUs 1, 6, 7, and 8 are to be adaptive-ticks
CPUs.  Note that you are prohibited from marking all of the CPUs as
adaptive-ticks CPUs:  At least one non-adaptive-ticks CPU must remain
online to handle timekeeping tasks in order to ensure that system calls
like gettimeofday() return accurate values on adaptive-ticks CPUs.  (This
is not an issue for CONFIG_NO_HZ_IDLE=y because there are no running user
processes to observe slight drifts in clock rate.)  Therefore, the boot
CPU is prohibited from entering adaptive-ticks mode.  Specifying a
"nohz_full=" mask that includes the boot CPU will result in a boot-time
error message, and the boot CPU will be removed from the mask.  Note that
this means that your system must have at least two CPUs in order for
CONFIG_NO_HZ_FULL=y to do anything for you.

Alternatively, the CONFIG_NO_HZ_FULL_ALL=y Kconfig parameter specifies
that all CPUs other than the boot CPU are adaptive-ticks CPUs.  This
Kconfig parameter will be overridden by the "nohz_full=" boot parameter,
so that if both the CONFIG_NO_HZ_FULL_ALL=y Kconfig parameter and the
"nohz_full=1" boot parameter are specified, the boot parameter will
prevail so that only CPU 1 will be an adaptive-ticks CPU.

Finally, adaptive-ticks CPUs must have their RCU callbacks offloaded.
This is covered in the "RCU IMPLICATIONS" section below.

Normally, a CPU remains in adaptive-ticks mode as long as possible.  In
particular, transitioning to kernel mode does not automatically change
the mode.  Instead, the CPU will exit adaptive-ticks mode only if needed,
for example, if that CPU enqueues an RCU callback.

Just as with dyntick-idle mode, the benefits of adaptive-tick mode do not
come for free:

1.  CONFIG_NO_HZ_FULL selects CONFIG_NO_HZ_COMMON, so you cannot run
    adaptive ticks without also running dyntick idle.  This dependency
    extends down into the implementation, so that all of the costs of
    CONFIG_NO_HZ_IDLE are also incurred by CONFIG_NO_HZ_FULL.

2.  The user/kernel transitions are slightly more expensive due to the
    need to inform kernel subsystems (such as RCU) about the change in
    mode.

3.  POSIX CPU timers prevent CPUs from entering adaptive-tick mode.
    Real-time applications needing to take actions based on CPU time
    consumption need to use other means of doing so.

4.  If there are more perf events pending than the hardware can
    accommodate, they are normally round-robined so as to collect all
    of them over time.  Adaptive-tick mode may prevent this
    round-robining from happening.  This will likely be fixed by
    preventing CPUs with large numbers of perf events pending from
    entering adaptive-tick mode.

5.  Scheduler statistics for adaptive-tick CPUs may be computed slightly
    differently than those for non-adaptive-tick CPUs.  This might in
    turn perturb load-balancing of real-time tasks.

6.  The LB_BIAS scheduler feature is disabled by adaptive ticks.

Although improvements are expected over time, adaptive ticks is quite
useful for many types of real-time and compute-intensive applications.
However, the drawbacks listed above mean that adaptive ticks should not
(yet) be enabled by default.

wutiejun commented 7 years ago

RCU IMPLICATIONS

There are situations in which idle CPUs cannot be permitted to enter
either dyntick-idle mode or adaptive-tick mode, the most common being
when that CPU has RCU callbacks pending.

The CONFIG_RCU_FAST_NO_HZ=y Kconfig option may be used to cause such CPUs
to enter dyntick-idle mode or adaptive-tick mode anyway.  In this case, a
timer will awaken these CPUs every four jiffies in order to ensure that
the RCU callbacks are processed in a timely fashion.

Another approach is to offload RCU callback processing to "rcuo" kthreads
using the CONFIG_RCU_NOCB_CPU=y Kconfig option.  The specific CPUs to
offload may be selected via several methods:

1.  One of three mutually exclusive Kconfig options specifies a
    build-time default for the CPUs to offload:

    a.  The CONFIG_RCU_NOCB_CPU_NONE=y Kconfig option results in no
        CPUs being offloaded.

    b.  The CONFIG_RCU_NOCB_CPU_ZERO=y Kconfig option causes CPU 0 to
        be offloaded.

    c.  The CONFIG_RCU_NOCB_CPU_ALL=y Kconfig option causes all CPUs
        to be offloaded.  Note that the callbacks will be offloaded to
        "rcuo" kthreads, and that those kthreads will in fact run on
        some CPU.  However, this approach gives fine-grained control
        on exactly which CPUs the callbacks run on, along with their
        scheduling priority (including the default of SCHED_OTHER),
        and it further allows this control to be varied dynamically
        at runtime.

2.  The "rcu_nocbs=" kernel boot parameter, which takes a comma-separated
    list of CPUs and CPU ranges, for example, "1,3-5" selects CPUs 1, 3,
    4, and 5.  The specified CPUs will be offloaded in addition to any
    CPUs specified as offloaded by CONFIG_RCU_NOCB_CPU_ZERO=y or
    CONFIG_RCU_NOCB_CPU_ALL=y.  This means that the "rcu_nocbs=" boot
    parameter has no effect for kernels built with RCU_NOCB_CPU_ALL=y.

The offloaded CPUs will never queue RCU callbacks, and therefore RCU
never prevents offloaded CPUs from entering either dyntick-idle mode or
adaptive-tick mode.  That said, note that it is up to userspace to pin
the "rcuo" kthreads to specific CPUs if desired.  Otherwise, the
scheduler will decide where to run them, which might or might not be
where you want them to run.

wutiejun commented 7 years ago

TESTING

So you enable all the OS-jitter features described in this document, but
do not see any change in your workload's behavior.  Is this because your
workload isn't affected that much by OS jitter, or is it because
something else is in the way?  This section helps answer this question by
providing a simple OS-jitter test suite, which is available on branch
master of the following git archive:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/dynticks-testing.git

Clone this archive and follow the instructions in the README file.  This
test procedure will produce a trace that will allow you to evaluate
whether or not you have succeeded in removing OS jitter from your system.
If this trace shows that you have removed OS jitter as much as is
possible, then you can conclude that your workload is not all that
sensitive to OS jitter.

Note: this test requires that your system have at least two CPUs.  We do
not currently have a good way to remove OS jitter from single-CPU
systems.

wutiejun commented 7 years ago

KNOWN ISSUES

o   Dyntick-idle slows transitions to and from idle slightly.  In
    practice, this has not been a problem except for the most aggressive
    real-time workloads, which have the option of disabling dyntick-idle
    mode, an option that most of them take.  However, some workloads
    will no doubt want to use adaptive ticks to eliminate
    scheduling-clock interrupt latencies.  Here are some options for
    these workloads:

    a.  Use PM QOS from userspace to inform the kernel of your latency
        requirements (preferred).

    b.  On x86 systems, use the "idle=mwait" boot parameter.

    c.  On x86 systems, use the "intel_idle.max_cstate=" boot parameter
        to limit the maximum C-state depth.

    d.  On x86 systems, use the "idle=poll" boot parameter.  However,
        please note that use of this parameter can cause your CPU to
        overheat, which may cause thermal throttling to degrade your
        latencies -- and that this degradation can be even worse than
        that of dyntick-idle.  Furthermore, this parameter effectively
        disables Turbo Mode on Intel CPUs, which can significantly
        reduce maximum performance.

o   Adaptive-ticks slows user/kernel transitions slightly.  This is not
    expected to be a problem for computationally intensive workloads,
    which have few such transitions.  Careful benchmarking will be
    required to determine whether or not other workloads are
    significantly affected by this effect.

o   Adaptive-ticks does not do anything unless there is only one
    runnable task for a given CPU, even though there are a number of
    other situations where the scheduling-clock tick is not needed.  To
    give but one example, consider a CPU that has one runnable
    high-priority SCHED_FIFO task and an arbitrary number of
    low-priority SCHED_OTHER tasks.  In this case, the CPU is required
    to run the SCHED_FIFO task until it either blocks or some other
    higher-priority task awakens on (or is assigned to) this CPU, so
    there is no point in sending a scheduling-clock interrupt to this
    CPU.  However, the current implementation nevertheless sends
    scheduling-clock interrupts to CPUs having a single runnable
    SCHED_FIFO task and multiple runnable SCHED_OTHER tasks, even
    though these interrupts are unnecessary.

    And even when there are multiple runnable tasks on a given CPU,
    there is little point in interrupting that CPU until the current
    running task's timeslice expires, which is almost always way longer
    than the time of the next scheduling-clock interrupt.

    Better handling of these sorts of situations is future work.

o   A reboot is required to reconfigure both adaptive idle and RCU
    callback offloading.  Runtime reconfiguration could be provided if
    needed; however, due to the complexity of reconfiguring RCU at
    runtime, there would need to be an earthshakingly good reason,
    especially given that you have the straightforward option of simply
    offloading RCU callbacks from all CPUs and pinning them where you
    want them whenever you want them pinned.

o   Additional configuration is required to deal with other sources of
    OS jitter, including interrupts and system-utility tasks and
    processes.  This configuration normally involves binding interrupts
    and tasks to particular CPUs.

o   Some sources of OS jitter can currently be eliminated only by
    constraining the workload.  For example, the only way to eliminate
    OS jitter due to global TLB shootdowns is to avoid the unmapping
    operations (such as kernel module unload operations) that result in
    these shootdowns.  For another example, page faults and TLB misses
    can be reduced (and in some cases eliminated) by using huge pages
    and by constraining the amount of memory used by the application.
    Pre-faulting the working set can also be helpful, especially when
    combined with the mlock() and mlockall() system calls.

o   Unless all CPUs are idle, at least one CPU must keep the
    scheduling-clock interrupt going in order to support accurate
    timekeeping.

o   If there might potentially be some adaptive-ticks CPUs, there will
    be at least one CPU keeping the scheduling-clock interrupt going,
    even if all CPUs are otherwise idle.

    Better handling of this situation is ongoing work.

o   Some process-handling operations still require the occasional
    scheduling-clock tick.  These operations include calculating CPU
    load, maintaining sched average, computing CFS entity vruntime,
    computing avenrun, and carrying out load balancing.  They are
    currently accommodated by a scheduling-clock tick every second or
    so.  Ongoing work will eliminate the need even for these infrequent
    scheduling-clock ticks.

wutiejun commented 6 years ago

https://stackoverflow.com/questions/19719911/getting-user-space-stack-information-from-perf