tud-zih-energy / lo2s

Linux OTF2 Sampling - A Lightweight Node-Level Performance Monitoring Tool
https://tu-dresden.de/zih/forschung/projekte/lo2s?set_language=en
GNU General Public License v3.0

Support Topdown measurements in lo2s like perf #213

Open s9105947 opened 2 years ago

s9105947 commented 2 years ago

Dear maintainers,

I tried to perform Top-Down Microarchitecture Analysis on an Alder Lake processor using lo2s. However, lo2s does not find all required counters.

Steps to reproduce

Local test system kallisto:

$ lscpu
[...]
Vendor ID:               GenuineIntel
  Model name:            12th Gen Intel(R) Core(TM) i9-12900K
    CPU family:          6
    Model:               151
[...]

Show available counters:

$ ls -lh /sys/devices/cpu_core/events/topdown*
-r--r--r-- 1 root root 4.0K Jul  8 13:30 /sys/devices/cpu_core/events/topdown-bad-spec
-r--r--r-- 1 root root 4.0K Jul  8 13:30 /sys/devices/cpu_core/events/topdown-be-bound
-r--r--r-- 1 root root 4.0K Jul  8 13:30 /sys/devices/cpu_core/events/topdown-br-mispredict
-r--r--r-- 1 root root 4.0K Jul  8 13:30 /sys/devices/cpu_core/events/topdown-fe-bound
-r--r--r-- 1 root root 4.0K Jul  8 13:30 /sys/devices/cpu_core/events/topdown-fetch-lat
-r--r--r-- 1 root root 4.0K Jul  8 13:30 /sys/devices/cpu_core/events/topdown-heavy-ops
-r--r--r-- 1 root root 4.0K Jul  8 13:30 /sys/devices/cpu_core/events/topdown-mem-bound
-r--r--r-- 1 root root 4.0K Jul  8 13:30 /sys/devices/cpu_core/events/topdown-retiring
$ perf list | grep topdown
  topdown-bad-spec OR cpu_atom/topdown-bad-spec/     [Kernel PMU event]
  topdown-bad-spec OR cpu_core/topdown-bad-spec/     [Kernel PMU event]
  topdown-be-bound OR cpu_atom/topdown-be-bound/     [Kernel PMU event]
  topdown-be-bound OR cpu_core/topdown-be-bound/     [Kernel PMU event]
  topdown-br-mispredict OR cpu_core/topdown-br-mispredict/ [Kernel PMU event]
  topdown-fe-bound OR cpu_atom/topdown-fe-bound/     [Kernel PMU event]
  topdown-fe-bound OR cpu_core/topdown-fe-bound/     [Kernel PMU event]
  topdown-fetch-lat OR cpu_core/topdown-fetch-lat/   [Kernel PMU event]
  topdown-heavy-ops OR cpu_core/topdown-heavy-ops/   [Kernel PMU event]
  topdown-mem-bound OR cpu_core/topdown-mem-bound/   [Kernel PMU event]
  topdown-retiring OR cpu_atom/topdown-retiring/     [Kernel PMU event]
  topdown-retiring OR cpu_core/topdown-retiring/     [Kernel PMU event]

Now list all counters found by lo2s with lo2s --list-events | grep topdown.

Expected Result

All counters above are displayed. (8 for cpu_core, 4 for cpu_atom)

Actual Result

Only the cpu_atom (E-Core) counters are found

$ lo2s --list-events | grep topdown
  cpu_atom/topdown-bad-spec/ *
  cpu_atom/topdown-be-bound/ *
  cpu_atom/topdown-fe-bound/ *
  cpu_atom/topdown-retiring/ *

Additional Notes

This might be caused by the hybrid architecture of Alder Lake, which has "P-cores" and "E-cores". As a result, the sysfs files that typically reside in /sys/devices/cpu are split into /sys/devices/cpu_core (P-cores) and /sys/devices/cpu_atom (E-cores). Perhaps lo2s does not find all required files? (Notably, other cpu_core events are found, e.g. cpu_core/slots/.)
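To check which PMU directories actually expose topdown events, independent of perf and lo2s, the sysfs tree can be walked directly. The following is only a minimal sketch: the function name list_pmu_events and its parameters are made up for illustration, and it assumes the usual <pmu>/events/<name> layout shown in the ls output above. It is not how lo2s discovers events.

```python
import os


def list_pmu_events(sysfs_root="/sys/devices", prefix="topdown"):
    """Map each PMU directory name to its sorted event names starting with `prefix`.

    Hypothetical helper for illustration; assumes the <pmu>/events/<name> layout.
    """
    found = {}
    if not os.path.isdir(sysfs_root):
        return found  # no sysfs available (e.g. not running on Linux)
    for pmu in sorted(os.listdir(sysfs_root)):
        events_dir = os.path.join(sysfs_root, pmu, "events")
        if not os.path.isdir(events_dir):
            continue  # not a PMU, or a PMU without named events
        names = [
            name
            for name in os.listdir(events_dir)
            # skip auxiliary files such as "<event>.scale" and "<event>.unit"
            if name.startswith(prefix) and "." not in name
        ]
        if names:
            found[pmu] = sorted(names)
    return found
```

On a hybrid system one would expect both cpu_core and cpu_atom to appear as keys of the returned dictionary, while a non-hybrid system would show a single cpu key.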

cvonelm commented 2 years ago

From now on, this issue is only concerned with the how and why of topdown metrics in lo2s (with a focus on Alder Lake). Alder Lake support in general has been moved to another issue.

The two possibilities to explore here are (besides not implementing at all due to cost):

cvonelm commented 2 years ago

I have looked at the example code for Alder Lake and the documentation at tools/perf/Documentation/topdown.txt, and the approach is practically identical to how topdown events are read on Ice Lake.

There seem to be two ways to access topdown metrics:

Fixed events (available on Skylake and up)

In this access mode, a normal perf event exists for every topdown metric, such as cpu/topdown-fetch-bubbles.

Support for this is available on every architecture from Skylake up, including Alder Lake. (However, Alder Lake itself currently suffers from lo2s not being hybrid-aware.)
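For illustration, a named sysfs event like this is typically described by a term string (in <pmu>/events/<name>) plus bit-range descriptions (in <pmu>/format/<term>), which together yield the raw config value for perf_event_open. The sketch below shows that translation under stated assumptions: the helper names are hypothetical, the term string and format strings in the usage note are made-up example values rather than data from the system in this issue, and this is not taken from the lo2s sources.

```python
def parse_event_terms(text):
    """Parse a term string like 'event=0x00,umask=0x81' into a dict of ints."""
    terms = {}
    for term in text.strip().split(","):
        name, _, value = term.partition("=")
        # a bare term without '=' is treated as a flag set to 1
        terms[name] = int(value, 0) if value else 1
    return terms


def parse_format(text):
    """Parse a format string like 'config:0-7,32-35' into ('config', [(0, 7), (32, 35)])."""
    field, _, ranges = text.strip().partition(":")
    bits = []
    for part in ranges.split(","):
        lo, _, hi = part.partition("-")
        bits.append((int(lo), int(hi) if hi else int(lo)))
    return field, bits


def encode(terms, formats):
    """Place each term's value into the bit ranges named by its format entry."""
    attr = {}
    for name, value in terms.items():
        field, bits = formats[name]
        for lo, hi in bits:
            width = hi - lo + 1
            chunk = value & ((1 << width) - 1)  # lowest `width` bits of the value
            attr[field] = attr.get(field, 0) | (chunk << lo)
            value >>= width  # remaining bits go into the next range
    return attr
```

With the example formats event = parse_format("config:0-7") and umask = parse_format("config:8-15"), calling encode(parse_event_terms("event=0x00,umask=0x81"), {"event": event, "umask": umask}) produces a single config field with the umask in bits 8 to 15.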

Generic event using rdpmc (available on Ice Lake and up)

This is what the topdown code from your Alder Lake example and tools/perf/Documentation/topdown.txt do.

With Ice Lake, a new way of reading topdown metrics came to be, which utilizes the rdpmc instruction from userspace.

The documentation states that this is faster. On the other hand, it only works on Intel CPUs from Ice Lake up and requires a custom metric set-up.
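To give an idea of what such a custom set-up has to do after the raw value has been read, here is a minimal decoding sketch. It assumes the layout as I understand it from tools/perf/Documentation/topdown.txt: one 8-bit field per metric, each interpreted as a fraction of 0xff. The byte order below and the function name decode_topdown are assumptions that should be checked against that document; the code does not execute rdpmc itself and only works on an already-read value.

```python
# Assumed byte index of each metric inside the raw metrics value.
LEVEL1 = {
    "topdown-retiring": 0,
    "topdown-bad-spec": 1,
    "topdown-fe-bound": 2,
    "topdown-be-bound": 3,
}
# Level-2 fields; assumed to be present only on newer cores.
LEVEL2 = {
    "topdown-heavy-ops": 4,
    "topdown-br-mispredict": 5,
    "topdown-fetch-lat": 6,
    "topdown-mem-bound": 7,
}


def decode_topdown(raw, level2=False):
    """Split a raw 64-bit metrics value into per-metric fractions in [0, 1]."""
    layout = dict(LEVEL1)
    if level2:
        layout.update(LEVEL2)
    return {
        name: ((raw >> (8 * index)) & 0xFF) / 0xFF
        for name, index in layout.items()
    }
```

Multiplying each fraction by the slots count read in the same group would give the number of slots attributed to that category.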

cvonelm commented 2 years ago

As discovered in the PR for Alder Lake support, the individual topdown metrics are not available as simple perf events on Alder Lake.