pyamsoft / pstate-frequency

Easily control Intel p-state driver on Linux
https://pyamsoft.blogspot.com/
GNU General Public License v2.0

Why is Turbo Boost disabled in the performance power plan? #21

Closed ChristophHaag closed 8 years ago

ChristophHaag commented 8 years ago

I have this CPU: http://ark.intel.com/products/71670/Intel-Core-i7-3632QM-Processor-6M-Cache-up-to-3_20-GHz-BGA

Processor Base Frequency: 2.2 GHz
Max Turbo Frequency: 3.2 GHz

The gap between the non-turbo and turbo frequencies makes a huge difference in performance.

The question may be slightly unrelated, but if I use the max-performance power plan instead, is it really the best plan possible? What I mean is that i7z reports:

Max TURBO Multiplier (if Enabled) with 1/2/3/4 Cores is 32x/31x/29x/29x

If all cores are set to the highest turbo boost frequency, do I still get the largest multiplier and thus the best single core performance on at least one core?

ChristophHaag commented 8 years ago

To answer my own question: When I use the max-performance plan and start stress -c 8, then all cores use the 29x multiplier, but when I start stress -c 1, one core will go over 30x, while the others are slightly over 29x.

pyamsoft commented 8 years ago

A two-part answer.

Turbo Boost is disabled in the performance plan because that is my personal preference. You are welcome to change the default by modifying the PRESET_POWER_PLAN_PERFORMANCE flag in the config.mk file. As for why it is the default: I write the software, so I configure its defaults to my liking, as reasonably as possible.

As for the Turbo Boost question, you must first understand that pstate-frequency is not magic snake oil that suddenly cranks more performance out of your system. It is just an (arguably) simpler way to interface with the sysfs and intel_pstate configuration files. Even the max-performance plan will only give you the maximum performance that the intel_pstate driver is willing to put out. See the issues/questions section in the README for more explanation.
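
To make "interfacing with the sysfs and intel_pstate configuration files" concrete, here is a minimal Python sketch that reads the knobs the intel_pstate driver exposes. It is not taken from pstate-frequency; the helper names `read_pstate` and `describe` are hypothetical, and the assumption is that these three files are the ones behind the TURBO, CPU_MIN and CPU_MAX values the tool prints.

```python
from pathlib import Path

# Standard location of the intel_pstate knobs on Linux.
INTEL_PSTATE_DIR = Path("/sys/devices/system/cpu/intel_pstate")


def read_pstate(base_dir=INTEL_PSTATE_DIR):
    """Return the three intel_pstate knobs as integers.

    Note that no_turbo is inverted: 1 means Turbo Boost is OFF, 0 means ON.
    """
    base = Path(base_dir)
    return {
        name: int((base / name).read_text().strip())
        for name in ("no_turbo", "min_perf_pct", "max_perf_pct")
    }


def describe(state):
    """Format the values roughly the way pstate-frequency reports them."""
    turbo = "OFF" if state["no_turbo"] == 1 else "ON"
    return (
        f"TURBO -> {state['no_turbo']} [{turbo}], "
        f"MIN -> {state['min_perf_pct']}%, MAX -> {state['max_perf_pct']}%"
    )
```

On a machine that uses the intel_pstate driver, `print(describe(read_pstate()))` should show the current settings. Reading works as a normal user; writing new values to these files requires root.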

Essentially it boils down to this: if your max turbo multipliers are as you showed above, then with a single core in use, that core has a max multiplier of 32x while all the others hang somewhere around 29x. With 2 cores in use, both have a max of 31x while the others hang around 29x. You can read more about Intel's Turbo Boost technology if you are curious, but know that the handling of things like turbo multipliers is out of the scope of pstate-frequency.
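
The multiplier table can be turned into frequencies with a small Python sketch. It assumes a 100 MHz base clock (so 32x corresponds to the 3.2 GHz max turbo frequency and 22x to the 2.2 GHz base frequency) and only models the upper limit per number of active cores, not what the CPU will actually sustain. The function name `max_frequency_mhz` is hypothetical.

```python
# Assumption: 100 MHz base clock, so multiplier * 100 MHz = core frequency.
BCLK_MHZ = 100

# From the i7z line quoted in this thread:
# 32x/31x/29x/29x with 1/2/3/4 active cores.
TURBO_MULTIPLIERS = {1: 32, 2: 31, 3: 29, 4: 29}

# 2.2 GHz base frequency expressed as a multiplier.
BASE_MULTIPLIER = 22


def max_frequency_mhz(active_cores, turbo_enabled=True):
    """Upper frequency limit in MHz for a given number of busy cores."""
    if active_cores < 1:
        raise ValueError("active_cores must be at least 1")
    if not turbo_enabled:
        # Without turbo, the base frequency is the ceiling.
        return BASE_MULTIPLIER * BCLK_MHZ
    # More busy threads than physical cores still means "all cores active".
    cores = min(active_cores, max(TURBO_MULTIPLIERS))
    return TURBO_MULTIPLIERS[cores] * BCLK_MHZ


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(n, "active core(s):", max_frequency_mhz(n), "MHz")
    print("turbo off:", max_frequency_mhz(1, turbo_enabled=False), "MHz")
```

Under these assumptions, a fully loaded CPU tops out at 29x, which is consistent with the `stress -c 8` observation earlier in the thread.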

ChristophHaag commented 8 years ago

I understand that it won't give me magical extra performance, but the thing is that the CPU will run at 2.2 GHz (even though it says pstate::CPU_MAX -> 100% [3200000KHz]). 2.2 GHz is the max frequency without turbo boost. Maybe that's not much of a difference when it comes to desktop CPUs that clock at 4 GHz without and 4.2 GHz with turbo boost, but as I said: there are CPUs where the difference is quite big.

nbench is not a very modern benchmark, but it shows the basic difference: 35-40% more real-world performance with Turbo Boost enabled.

Of course, if you like it that way, it's your software. I just wanted to offer my opinion that people probably expect Turbo Boost to be enabled in a mode named "performance".

$ sudo pstate-frequency -S -p performance
pstate-frequency 2.0.2[clang]
    pstate::CPU_DRIVER   -> intel_pstate
    pstate::CPU_GOVERNOR -> powersave
    pstate::TURBO        -> 1 [OFF]
    pstate::CPU_MIN      -> 37% [1200000KHz]
    pstate::CPU_MAX      -> 100% [3200000KHz]
$ nbench

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          1000.4  :      25.66  :       8.43
STRING SORT         :          592.18  :     264.60  :      40.96
BITFIELD            :      4.1586e+08  :      71.33  :      14.90
FP EMULATION        :          249.71  :     119.82  :      27.65
FOURIER             :           28561  :      32.48  :      18.24
ASSIGNMENT          :          38.848  :     147.82  :      38.34
IDEA                :          7681.6  :     117.49  :      34.88
HUFFMAN             :          3971.3  :     110.13  :      35.17
NEURAL NET          :          54.976  :      88.32  :      37.15
LU DECOMPOSITION    :          1800.7  :      93.29  :      67.36
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 101.498
FLOATING-POINT INDEX: 64.439
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU                 : 8 CPU GenuineIntel Intel(R) Core(TM) i7-3632QM CPU @ 2.20GHz 2200MHz
L2 Cache            : 6144 KB
OS                  : Linux 4.4.0-rc7-mainline
C compiler          : 
libc                : 
MEMORY INDEX        : 28.602
INTEGER INDEX       : 23.121
FLOATING-POINT INDEX: 35.740
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.
nbench  269.92s user 0.09s system 99% cpu 4:30.03 total
$ sudo pstate-frequency -S -p max-performance
pstate-frequency 2.0.2[clang]
    pstate::CPU_DRIVER   -> intel_pstate
    pstate::CPU_GOVERNOR -> performance
    pstate::TURBO        -> 0 [ON]
    pstate::CPU_MIN      -> 99% [3168000KHz]
    pstate::CPU_MAX      -> 100% [3200000KHz]
$ nbench

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          1422.9  :      36.49  :      11.98
STRING SORT         :          846.81  :     378.38  :      58.57
BITFIELD            :      5.6828e+08  :      97.48  :      20.36
FP EMULATION        :          338.74  :     162.54  :      37.51
FOURIER             :           39835  :      45.30  :      25.45
ASSIGNMENT          :          54.117  :     205.93  :      53.41
IDEA                :           10401  :     159.09  :      47.23
HUFFMAN             :          5410.9  :     150.04  :      47.91
NEURAL NET          :          75.261  :     120.90  :      50.86
LU DECOMPOSITION    :          2509.4  :     130.00  :      93.87
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 140.399
FLOATING-POINT INDEX: 89.293
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU                 : 8 CPU GenuineIntel Intel(R) Core(TM) i7-3632QM CPU @ 2.20GHz 2900MHz
L2 Cache            : 6144 KB
OS                  : Linux 4.4.0-rc7-mainline
C compiler          :
libc                :
MEMORY INDEX        : 39.936
INTEGER INDEX       : 31.758
FLOATING-POINT INDEX: 49.526
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.
nbench  264.16s user 0.08s system 99% cpu 4:24.25 total
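
The 35-40% figure can be checked against the index values from the two nbench runs with a short Python sketch. The numbers are copied from the output above; the helper name `gain_percent` is hypothetical. Keep in mind that the two runs also differ in governor and minimum frequency, so not all of the gain can necessarily be attributed to Turbo Boost alone.

```python
# Index values copied from the two nbench runs: (performance, max-performance).
RESULTS = {
    "ORIGINAL INTEGER INDEX": (101.498, 140.399),
    "ORIGINAL FLOATING-POINT INDEX": (64.439, 89.293),
    "MEMORY INDEX": (28.602, 39.936),
    "INTEGER INDEX": (23.121, 31.758),
    "FLOATING-POINT INDEX": (35.740, 49.526),
}


def gain_percent(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after / before - 1.0) * 100.0


if __name__ == "__main__":
    for name, (before, after) in RESULTS.items():
        print(f"{name:<30} {gain_percent(before, after):5.1f}%")
```

Every index lands inside the 35-40% range quoted above.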

pyamsoft commented 8 years ago

In the latest dev branch I renamed the 'performance' plan to 'balanced', to convey that it does not trade everything else for all-out performance. Similarly, the 'max-performance' plan is now simply 'performance'. Hopefully this makes the default settings a bit less confusing.