intel / pcm

Intel® Performance Counter Monitor (Intel® PCM)
BSD 3-Clause "New" or "Revised" License

pcm problem: Can not access server uncore PCI configuration space #781

Closed: Ahnyeohn closed this issue 4 months ago

Ahnyeohn commented 4 months ago

Hello, I'm having a problem running pcm-memory when PCM is compiled with the -DPCM_USE_PCI_MM_LINUX option.

This is my server information (a physical server, not a VM):

Hardware

Intel(R) Xeon(R) W-2223 CPU

OS & Kernel

Ubuntu 18.04 with kernel 5.2

Terminal:

When I run the sudo ./pcm-memory command, I get the following message:

 Intel(r) Performance Counter Monitor: Memory Bandwidth Monitoring Utility ($Format:%ci ID=%h$)

 This utility measures memory bandwidth per channel or per DIMM rank in real-time

=====  Processor information  =====
Linux arch_perfmon flag  : yes
Hybrid processor         : no
IBRS and IBPB supported  : yes
STIBP supported          : yes
Spec arch caps supported : yes
Max CPUID level          : 22
CPU model number         : 85
Number of physical cores: 4
Number of logical cores: 8
Number of online logical cores: 8
Threads (logical cores) per physical core: 2
Num sockets: 1
Physical cores per socket: 4
Last level cache slices per socket: 6
Core PMU (perfmon) version: 4
Number of core PMU generic (programmable) counters: 4
Width of generic (programmable) counters: 48 bits
Number of core PMU fixed counters: 3
Width of fixed counters: 48 bits
Nominal core frequency: 3600000000 Hz
IBRS enabled in the kernel   : yes
STIBP enabled in the kernel  : no
The processor is not susceptible to Rogue Data Cache Load: yes
The processor supports enhanced IBRS                     : yes
Package thermal spec power: 120 Watt; Package minimum power: 16 Watt; Package maximum power: 300 Watt;

INFO: Linux perf interface to program uncore PMUs is present
mmap failed: errno is 22
Can not access server uncore PCI configuration space. Access to uncore counters (memory and QPI bandwidth) is disabled.
You must be root to access server uncore counters in PCM.
Socket 0: 1 PCU units detected. 6 IIO units detected. 6 IRP units detected. 6 CHA/CBO units detected. 0 MDF units detected. 1 UBOX units detected. 0 CXL units detected. 0 PCIE_GEN5x16 units detected. 0 PCIE_GEN5x8 units detected.
Initializing RMIDs

Detected Intel(R) Xeon(R) W-2223 CPU @ 3.60GHz "Intel(r) microarchitecture codename Cascade Lake-SP" stepping 7 microcode level 0x5003303
Access to Intel(r) Performance Counter Monitor has denied (no MSR or PCI CFG space access).
Cleaning up
 Closed perf event handles
 Zeroed uncore PMU registers
 Freeing up all RMIDs

Why am I getting this error even though I am already running as root? I can open /dev/mem without problems (the open("/dev/mem", O_RDWR) call returns no error), so it doesn't seem to be a sudo permissions issue. I have also already tried booting with the iomem=relaxed option described in the docs.
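A note on the "mmap failed: errno is 22" line in the output above: on Linux, errno 22 is EINVAL ("Invalid argument"), whereas a permissions failure would normally be reported as EACCES (13) or EPERM (1). That is consistent with open() on /dev/mem succeeding as root. Below is a minimal Python sketch for decoding the number; describe_errno is a hypothetical helper for illustration and is not part of PCM.

```python
import errno
import os

def describe_errno(code):
    """Return 'NAME (message)' for a numeric errno value."""
    # errno.errorcode maps numeric values to symbolic names on this platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({os.strerror(code)})"

# The value printed by pcm-memory in the log above.
print(describe_errno(22))
```

The decoded name only says that the kernel rejected the arguments of the mmap call; it does not by itself say which check rejected them.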

rdementi commented 4 months ago

There could be some Linux security mechanism that forbids accessing certain MMIO ranges directly. Could you avoid using the PCM_USE_PCI_MM_LINUX option?
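One thing that can be checked quickly is whether the iomem=relaxed boot option mentioned in the original post actually reached the running kernel. The sketch below reads /proc/cmdline and looks for the option as a whole token. The helper names are hypothetical, and the thread does not establish that this option is sufficient to allow the mapping PCM attempts.

```python
def has_boot_option(cmdline, option):
    """Return True if `option` appears as a whole token in a kernel command line."""
    return option in cmdline.split()

def running_kernel_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with (Linux only)."""
    with open(path) as f:
        return f.read().strip()

if __name__ == "__main__":
    try:
        print(has_boot_option(running_kernel_cmdline(), "iomem=relaxed"))
    except OSError as exc:
        # /proc/cmdline does not exist on non-Linux systems.
        print(f"could not read kernel command line: {exc}")
```

Matching whole tokens avoids a false positive from a longer parameter that merely contains the same text.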

Ahnyeohn commented 4 months ago

It works fine when I don't use the PCM_USE_PCI_MM_LINUX option, but the program I want to test compiles pcm with that option. Is there any way to work around the problem while using the PCM_USE_PCI_MM_LINUX option?

rdementi commented 4 months ago

Could you try setting this environment variable:

export PCM_USE_UNCORE_PERF=1

pcm-memory

Let me know if it works, and please share the output.

Ahnyeohn commented 4 months ago

It still doesn't work.

Output:

 Intel(r) Performance Counter Monitor: Memory Bandwidth Monitoring Utility ($Format:%ci ID=%h$)

 This utility measures memory bandwidth per channel or per DIMM rank in real-time

=====  Processor information  =====
Linux arch_perfmon flag  : yes
Hybrid processor         : no
IBRS and IBPB supported  : yes
STIBP supported          : yes
Spec arch caps supported : yes
Max CPUID level          : 22
CPU model number         : 85
Number of physical cores: 4
Number of logical cores: 8
Number of online logical cores: 8
Threads (logical cores) per physical core: 2
Num sockets: 1
Physical cores per socket: 4
Last level cache slices per socket: 6
Core PMU (perfmon) version: 4
Number of core PMU generic (programmable) counters: 4
Width of generic (programmable) counters: 48 bits
Number of core PMU fixed counters: 3
Width of fixed counters: 48 bits
Nominal core frequency: 3600000000 Hz
IBRS enabled in the kernel   : yes
STIBP enabled in the kernel  : no
The processor is not susceptible to Rogue Data Cache Load: yes
The processor supports enhanced IBRS                     : yes
Package thermal spec power: 120 Watt; Package minimum power: 16 Watt; Package maximum power: 300 Watt;

INFO: Linux perf interface to program uncore PMUs is present
mmap failed: errno is 22
Can not access server uncore PCI configuration space. Access to uncore counters (memory and QPI bandwidth) is disabled.
You must be root to access server uncore counters in PCM.
Socket 0: 1 PCU units detected. 6 IIO units detected. 6 IRP units detected. 6 CHA/CBO units detected. 0 MDF units detected. 1 UBOX units detected. 0 CXL units detected. 0 PCIE_GEN5x16 units detected. 0 PCIE_GEN5x8 units detected.
Initializing RMIDs

Detected Intel(R) Xeon(R) W-2223 CPU @ 3.60GHz "Intel(r) microarchitecture codename Cascade Lake-SP" stepping 7 microcode level 0x5003303
Access to Intel(r) Performance Counter Monitor has denied (no MSR or PCI CFG space access).
Cleaning up
 Closed perf event handles
 Zeroed uncore PMU registers
 Freeing up all RMIDs

rdementi commented 4 months ago

Your output suggests that the variable PCM_USE_UNCORE_PERF is not set. PCM prints "INFO: using Linux perf interface to program uncore PMUs because env variable PCM_USE_UNCORE_PERF=1" if the variable is set: https://github.com/intel/pcm/blob/dd672b2bc76ae630fd7e9317af829cb1622c57a5/src/cpucounters.cpp#L7210
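The check described here can be done mechanically on a captured log. A minimal sketch, assuming the output has been saved as text: it looks for the phrase "using Linux perf interface to program uncore PMUs" from the message quoted above. The function name is hypothetical and the sample text is copied from the log earlier in this thread.

```python
# Phrase taken from the confirmation message quoted in the comment above.
MARKER = "using Linux perf interface to program uncore PMUs"

def perf_path_confirmed(pcm_output):
    """Return True if any output line contains the confirmation phrase."""
    return any(MARKER in line for line in pcm_output.splitlines())

# Two lines from the output posted earlier in the thread.
captured = """INFO: Linux perf interface to program uncore PMUs is present
mmap failed: errno is 22"""
print(perf_path_confirmed(captured))
```

Note that the "is present" line in the posted output does not contain the phrase, so it is not mistaken for the confirmation.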

Ahnyeohn commented 4 months ago

The error was caused by sudo not passing the environment variable through to pcm-memory. I fixed the problem, thank you very much. Could you please tell me, if you know, why I get the error when I don't set PCM_USE_UNCORE_PERF?
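The underlying behaviour is that a child process only sees the variables its launcher hands down. sudo can be configured to reset the environment, in which case a variable exported in the calling shell does not reach the command. Two commonly used ways around this are `sudo -E` and putting the assignment on the sudo command line (`sudo PCM_USE_UNCORE_PERF=1 ./pcm-memory`), both subject to the local sudoers policy. The thread does not say which fix was used. The sketch below illustrates the effect without sudo, using a child Python process as a stand-in; child_sees is a hypothetical helper.

```python
import os
import subprocess
import sys

VAR = "PCM_USE_UNCORE_PERF"

def child_sees(env):
    """Run a child process with the given environment and return the value
    it sees for VAR, or None when the variable is absent or empty."""
    code = f"import os; print(os.environ.get({VAR!r}, ''))"
    result = subprocess.run([sys.executable, "-c", code], env=env,
                            capture_output=True, text=True, check=True)
    return result.stdout.strip() or None

# Environment where the variable was exported, as with `export PCM_USE_UNCORE_PERF=1`.
exported = dict(os.environ, **{VAR: "1"})
# Same environment with the variable removed, standing in for a launcher that drops it.
stripped = {k: v for k, v in os.environ.items() if k != VAR}

print(child_sees(exported))
print(child_sees(stripped))
```

The same idea applies to any launcher that builds a fresh environment for the command it starts.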

rdementi commented 4 months ago

It seems your kernel is blocking direct MMIO access to some of the PMU registers. Access through drivers (pcicfg or perf_events) worked for you.

Ahnyeohn commented 4 months ago

I understand, thank you.