intel / PerfSpect

System performance characterization tool based on Linux perf
https://intel.github.io/PerfSpect/
BSD 3-Clause "New" or "Revised" License

If Cstate C1 is disabled, will the final result be affected? #28

Closed jasonneverstop closed 1 year ago

jasonneverstop commented 1 year ago

If C-state C1 is disabled, will the final result be affected? Which metrics are not affected?

hilldani commented 1 year ago

If you disable C-states completely, you'll see a higher metric_CPU operating frequency (in GHz), since the CPU won't go into sleep states. It will likely affect other metrics as well, since no time is lost coming out of deeper C-states. It really depends on the workload (spiky or stable, stressing the system or not, etc.).
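
To see what this metric measures, it helps to know how such a frequency is usually derived from perf counters. The sketch below assumes the metric is computed as cpu-cycles / ref-cycles scaled by the TSC frequency (an assumption here; the exact PerfSpect formula is not quoted in this thread). Because both counters only advance while a core is unhalted, the result is an average frequency over busy (C0) time only. The function name and sample numbers are hypothetical.

```python
def operating_frequency_ghz(cpu_cycles: float, ref_cycles: float, tsc_freq_hz: float) -> float:
    """Average core frequency while unhalted, in GHz (hypothetical helper).

    Assumed formula: cpu-cycles tick at the actual core clock and ref-cycles
    tick at the TSC rate, and both only count in C0, so their ratio scales
    the TSC frequency to the effective frequency during busy time.
    """
    if ref_cycles <= 0:
        raise ValueError("ref_cycles must be positive")
    return cpu_cycles / ref_cycles * tsc_freq_hz / 1e9


# Hypothetical sample: 3e9 core cycles vs 2e9 reference cycles on a 2.0 GHz TSC.
print(operating_frequency_ghz(3e9, 2e9, 2.0e9))
```

Time a core spends halted in a C-state contributes to neither counter, so it does not drag this average down.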

jasonneverstop commented 1 year ago

I found that when C1 is disabled, the value of 'CPU utilization' is 100%, which is definitely abnormal. Would you mind telling me whether disabling C1 also makes other metrics abnormal, and if so, which ones? I am using version 1.12.

hilldani commented 1 year ago

I am not sure I follow your question, but I recommend using the latest release of PerfSpect. Version 1.2.9 included a bug fix that improves the accuracy of some metrics (like cpu-utilization), which had started to show errors at high CPU utilization due to multiplexing overhead. 1.2.9 fixes this by normalizing the time intervals: https://github.com/intel/PerfSpect/releases/tag/1.2.9 Let me know if this helps.
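
For background on multiplexing: when more events are requested than there are hardware counters, perf time-slices them and reports, for each event, the raw count plus how long the event was enabled and how long it was actually running on a counter. The sketch below shows the generic extrapolation perf applies to such events. It is not necessarily the exact normalization added in 1.2.9, and the function name is hypothetical.

```python
from typing import Optional


def scale_multiplexed(raw_count: int, time_enabled_ns: int, time_running_ns: int) -> Optional[float]:
    """Estimate an event's count over its whole enabled window.

    perf only counts a multiplexed event while it is scheduled on a hardware
    counter (time_running), so the raw count is extrapolated by the ratio
    time_enabled / time_running. Returns None if the event never ran,
    which perf reports as "<not counted>".
    """
    if time_running_ns == 0:
        return None
    return raw_count * time_enabled_ns / time_running_ns


# An event that was on a counter for a quarter of a 1 s window.
print(scale_multiplexed(500_000_000, 1_000_000_000, 250_000_000))
```

The extrapolation assumes the event rate is steady across the window, so with a bursty workload the scaled estimate can be noticeably off.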

jasonneverstop commented 1 year ago

Thank you for your prompt response. I will use the latest version if there is an opportunity. Thank you again very much.

jasonneverstop commented 1 year ago

I conducted a test using version 1.2.10, and even after disabling the C1 state using the 'cpupower idle-set -d 1' command, the 'CPU utilization' result was still incorrect and close to 100%. Could you please help me identify the reasons behind this issue and provide possible solutions? Thank you very much.

hilldani commented 1 year ago

I ran the following tests on an Icelake bare-metal system running Ubuntu 22.04 to try to reproduce your issue, but could not:

sudo ./perf-collect -a "stress-ng --cpu 0 --cpu-load 70 --timeout 10s"
./perf-postprocess # gave 68% cpu utilization

sudo cpupower idle-set -d 1

sudo ./perf-collect -a "stress-ng --cpu 0 --cpu-load 70 --timeout 10s"
./perf-postprocess # gave 69% cpu utilization
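
For reference, `cpupower idle-set -d N` works through the kernel's cpuidle sysfs interface, where each idle state of each CPU has a `disable` file under /sys/devices/system/cpu/cpuX/cpuidle/stateN/. The sketch below reads and writes those files directly, which makes it easy to confirm per CPU which states are actually disabled. The function names are hypothetical, writing requires root on a real system, and it only approximates what cpupower does.

```python
from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")  # standard Linux cpuidle sysfs location


def idle_states(cpu_root: Path = CPU_ROOT, cpu: int = 0) -> list[tuple[int, str, bool]]:
    """Return (index, name, disabled) for each cpuidle state of one CPU."""
    base = cpu_root / f"cpu{cpu}" / "cpuidle"
    states = []
    # Sort numerically so state10 comes after state9; "state" is 5 characters.
    for state_dir in sorted(base.glob("state[0-9]*"), key=lambda p: int(p.name[5:])):
        name = (state_dir / "name").read_text().strip()
        disabled = (state_dir / "disable").read_text().strip() == "1"
        states.append((int(state_dir.name[5:]), name, disabled))
    return states


def set_idle_state_disabled(index: int, disabled: bool = True, cpu_root: Path = CPU_ROOT) -> int:
    """Write the 'disable' file of idle state `index` on every CPU.

    Returns the number of CPUs updated.
    """
    updated = 0
    for disable_file in cpu_root.glob(f"cpu[0-9]*/cpuidle/state{index}/disable"):
        disable_file.write_text("1" if disabled else "0")
        updated += 1
    return updated
```

Disabling states 1 to 3 this way (for example with `for i in (1, 2, 3): set_idle_state_disabled(i)`) mirrors the three `cpupower idle-set` commands in the reproduction steps below, after which `idle_states()` should report every state except POLL as disabled.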

Could you provide reproducible steps for your issue?

jasonneverstop commented 1 year ago

Okay, thank you for your reply. My system is a Skylake bare-metal server running CentOS 7.2, and all C-states need to be disabled. The steps to reproduce are as follows:

[root@jd perfspect-1.2.9]# cpupower idle-info
CPUidle driver: intel_idle
CPUidle governor: menu

Analyzing CPU 0:
Number of idle states: 4
Available idle states: POLL C1-SKX C1E-SKX C6-SKX
POLL:
Flags/Description: CPUIDLE CORE POLL IDLE
Latency: 0
Usage: 696753805
Duration: 399377260105
C1-SKX:
Flags/Description: MWAIT 0x00
Latency: 2
Usage: 327278273
Duration: 27340952024
C1E-SKX:
Flags/Description: MWAIT 0x01
Latency: 10
Usage: 1369947359
Duration: 371900137394
C6-SKX:
Flags/Description: MWAIT 0x20
Latency: 133
Usage: 251876414
Duration: 6801607018565

./perf-collect -t 10
./perf-postprocess -r perfstat.csv  # gave 3.05% cpu utilization

cpupower idle-set -d 1
cpupower idle-set -d 2
cpupower idle-set -d 3

[root@jd perfspect-1.2.9]# cpupower idle-info
CPUidle driver: intel_idle
CPUidle governor: menu

Analyzing CPU 0:
Number of idle states: 4
Available idle states: POLL C1-SKX C1E-SKX C6-SKX
POLL:
Flags/Description: CPUIDLE CORE POLL IDLE
Latency: 0
Usage: 696800925
Duration: 399395641860
C1-SKX (DISABLED) :
Flags/Description: MWAIT 0x00
Latency: 2
Usage: 327352507
Duration: 27344618309
C1E-SKX (DISABLED) :
Flags/Description: MWAIT 0x01
Latency: 10
Usage: 1370478750
Duration: 371952973917
C6-SKX (DISABLED) :
Flags/Description: MWAIT 0x20
Latency: 133
Usage: 251954842
Duration: 6801872688711

./perf-collect -t 10
./perf-postprocess -r perfstat.csv  # gave 99.67% cpu utilization
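
When comparing runs, it can help to record the idle-state configuration alongside the PerfSpect results. The sketch below parses `cpupower idle-info` text like the output above (state headers such as `C1-SKX (DISABLED) :` followed by `Latency:` lines) and flags the case where only POLL is left enabled, which is the configuration that produced the ~100% reading. It assumes the output layout matches this cpupower version and only covers the CPU that idle-info analyzed (CPU 0 here); the function names are hypothetical.

```python
import re

# Matches "POLL:" or "C1-SKX (DISABLED) :"; group 2 is set only for disabled states.
HEADER = re.compile(r"^(\S+?)( \(DISABLED\))?\s*:$")
LATENCY = re.compile(r"^Latency:\s*(\d+)$")


def parse_idle_info(text: str) -> dict:
    """Map each idle state name to {"disabled": bool, "latency": int or None}."""
    states = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            states[current] = {"disabled": header.group(2) is not None, "latency": None}
            continue
        latency = LATENCY.match(line)
        if latency and current is not None:
            states[current]["latency"] = int(latency.group(1))
    return states


def only_poll_enabled(states: dict) -> bool:
    """True when every state except POLL is disabled, so idle CPUs busy-wait in C0."""
    enabled = [name for name, info in states.items() if not info["disabled"]]
    return enabled == ["POLL"]
```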

hilldani commented 1 year ago

Thanks for the clarification. Yes, if all C-states are disabled, CPU utilization will be 100% in PerfSpect, because the equation for CPU utilization is:

100 * [ref-cycles] / [TSC]

Think of TSC as the full potential of the system: the total cycles per second across all threads, cores, and sockets. ref-cycles counts the cycles when a CPU is in a non-halted state (i.e. C0). If all deeper C-states are disabled, an idle CPU falls back to the POLL state, which busy-waits in C0 instead of halting, so ref-cycles will be almost identical to TSC (two different methods of counting arriving at about the same number). Any other equation that contains ref-cycles will be affected the same way. I am curious why you need to disable all C-states, though. Often, disabling just the deeper C-states and P-states is enough to improve performance and reduce the extra latency of coming out of those states.
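
To make the equation concrete, the sketch below computes utilization from the two counters, taking the TSC side as TSC frequency × interval × number of logical CPUs, following the description of TSC as the total cycles across all threads, cores, and sockets. The helper names and all sample numbers are hypothetical, chosen only to mirror the two readings in this thread (about 3% with C-states enabled, about 99.7% with only POLL left).

```python
def tsc_cycles(tsc_freq_hz: float, interval_s: float, logical_cpus: int) -> float:
    """Total TSC cycles available in the interval across all logical CPUs."""
    return tsc_freq_hz * interval_s * logical_cpus


def cpu_utilization_pct(ref_cycles: float, tsc_total: float) -> float:
    """100 * ref-cycles / TSC: share of the window the CPUs spent unhalted (C0)."""
    if tsc_total <= 0:
        raise ValueError("tsc_total must be positive")
    return 100.0 * ref_cycles / tsc_total


# Hypothetical 1 s window on 4 logical CPUs with a 2.0 GHz TSC.
budget = tsc_cycles(2.0e9, 1.0, 4)

# Mostly idle with C-states enabled: halted CPUs stop counting ref-cycles.
print(cpu_utilization_pct(0.244e9, budget))

# Only POLL enabled: idle CPUs spin in C0, so ref-cycles approach the TSC total.
print(cpu_utilization_pct(7.9736e9, budget))
```

The same shift applies to any metric built on ref-cycles: with only POLL enabled, those counters include time spent spinning as well as real work.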

jasonneverstop commented 1 year ago

Thank you for your explanation. Completely disabling C-states is part of our current setup, but I am trying to change that.

hilldani commented 1 year ago

@jasonneverstop if you have other server configuration questions, I highly recommend this other tool we're working on: https://github.com/intel/svr-info. It contains many best-practice configuration recommendations (e.g. ideal frequency governor, DIMM population, etc.).