tud-zih-energy / lo2s

Linux OTF2 Sampling - A Lightweight Node-Level Performance Monitoring Tool
https://tu-dresden.de/zih/forschung/projekte/lo2s?set_language=en
GNU General Public License v3.0

Optimize --list-events #312

Closed cvonelm closed 1 month ago

cvonelm commented 7 months ago

On recent Intel systems there can be over 890 PMU events, making --list-events take one and a half minutes to complete, which is really unacceptable.

Hopefully there is some way to make it fast.

cvonelm commented 5 months ago

Apparently, this is caused by the kernel. Every time perf_event_open is called (which happens a lot for --list-events, as we check whether every PMU event can be perf_event_open'ed), the kernel has to reallocate the memory used for asynchronous event recording via PEBS. Due to what looks like a livelock/deadlock situation, this memory allocation can take almost a second, which adds up quickly for hundreds of events and hundreds of perf_event_open calls.
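To observe the per-call cost described above on a given kernel, one can time a single perf_event_open probe. The following is a minimal Python sketch, not lo2s code (lo2s itself is C++). It calls the raw syscall through ctypes because glibc has no wrapper, fills in only the first 64 bytes of struct perf_event_attr (PERF_ATTR_SIZE_VER0), and assumes the x86_64 and aarch64 syscall numbers in the table. The helper name probe_open is hypothetical.

```python
import ctypes
import os
import platform
import time

# perf_event_open(2) has no glibc wrapper, so it is invoked via syscall(2).
# Syscall numbers differ per architecture; only these two are assumed here.
SYS_PERF_EVENT_OPEN = {"x86_64": 298, "aarch64": 241}

PERF_TYPE_HARDWARE = 0
PERF_COUNT_HW_CPU_CYCLES = 0
PERF_ATTR_SIZE_VER0 = 64      # size of the first published perf_event_attr layout
ATTR_FLAG_DISABLED = 1 << 0   # "disabled" is bit 0 of the flags bitfield


class PerfEventAttr(ctypes.Structure):
    # Only the first 64 bytes (PERF_ATTR_SIZE_VER0) of struct perf_event_attr.
    _fields_ = [
        ("type", ctypes.c_uint32),
        ("size", ctypes.c_uint32),
        ("config", ctypes.c_uint64),
        ("sample_period", ctypes.c_uint64),
        ("sample_type", ctypes.c_uint64),
        ("read_format", ctypes.c_uint64),
        ("flags", ctypes.c_uint64),
        ("wakeup_events", ctypes.c_uint32),
        ("bp_type", ctypes.c_uint32),
        ("config1", ctypes.c_uint64),
    ]


libc = ctypes.CDLL(None, use_errno=True)
libc.syscall.restype = ctypes.c_long


def probe_open(event_type, config, pid=0, cpu=-1):
    """Try to open one event; return (openable, errno, seconds spent)."""
    nr = SYS_PERF_EVENT_OPEN.get(platform.machine())
    if nr is None:
        raise OSError("perf_event_open syscall number unknown for this architecture")
    attr = PerfEventAttr(type=event_type, size=PERF_ATTR_SIZE_VER0,
                         config=config, flags=ATTR_FLAG_DISABLED)
    start = time.perf_counter()
    # pid=0, cpu=-1: measure the calling process on any CPU; group_fd=-1, flags=0.
    fd = libc.syscall(ctypes.c_long(nr), ctypes.byref(attr), ctypes.c_long(pid),
                      ctypes.c_long(cpu), ctypes.c_long(-1), ctypes.c_ulong(0))
    elapsed = time.perf_counter() - start
    if fd < 0:
        return False, ctypes.get_errno(), elapsed
    os.close(fd)  # the probe only checks openability, so release the event
    return True, 0, elapsed


if __name__ == "__main__":
    ok, err, secs = probe_open(PERF_TYPE_HARDWARE, PERF_COUNT_HW_CPU_CYCLES)
    print(f"openable={ok} errno={err} took {secs * 1000:.3f} ms")
```

Running such a probe once per event and summing the elapsed times should indicate whether the opens account for most of the --list-events runtime on that kernel.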

I personally think we can do away with the perf_event_open() checks, as I think they are overly paranoid:

  1. Whether events can only be opened per process depends solely on the value of perf_event_paranoid: if it is greater than 0, only per-process measurements are allowed.
  2. The CPUs an event can be opened on can be read from the cpus or cpumask files in /sys/bus/event_source.
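The two rules above can be sketched in Python as follows. These are hypothetical helpers, not lo2s's implementation: they read /proc/sys/kernel/perf_event_paranoid, parse the kernel's CPU-list format (e.g. `0-3,8`) from a PMU's cpus or cpumask file under /sys/bus/event_source/devices, and, as an assumption, treat a PMU with neither file as usable on every online CPU. Like rule 1, the sketch ignores CAP_PERFMON and CAP_SYS_ADMIN, which let privileged users bypass perf_event_paranoid.

```python
from pathlib import Path

PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")
PMU_ROOT = Path("/sys/bus/event_source/devices")
ONLINE_PATH = Path("/sys/devices/system/cpu/online")


def parse_cpu_list(text):
    """Parse a kernel CPU list such as "0-3,8,10-11" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def read_paranoid():
    return int(PARANOID_PATH.read_text())


def online_cpus():
    return parse_cpu_list(ONLINE_PATH.read_text())


def system_wide_allowed(paranoid):
    # Rule 1: a value greater than 0 means per-process measurements only
    # (capabilities are deliberately ignored in this sketch).
    return paranoid <= 0


def pmu_cpus(pmu, online):
    """CPUs a PMU's events can be opened on, according to sysfs (rule 2)."""
    for name in ("cpus", "cpumask"):
        path = PMU_ROOT / pmu / name
        if path.exists():
            return parse_cpu_list(path.read_text())
    # Assumption: no cpus/cpumask file means the PMU works on all online CPUs.
    return set(online)
```

Because these checks only read a handful of small files, they avoid calling perf_event_open at all, which is the point of replacing the per-event probes.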

This should, of course, be tested by comparing the openability that cpus, cpumask, and perf_event_paranoid predict with where the events can actually be opened.
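Such a test could compare, per event, the set of CPUs predicted from sysfs with the set of CPUs where perf_event_open actually succeeds. In this sketch, compare_openability and the try_open callback are hypothetical; try_open stands in for whatever code performs the real perf_event_open for one event on one CPU.

```python
def compare_openability(predicted, online, try_open):
    """Compare sysfs/paranoid predictions with real perf_event_open results.

    predicted: CPUs the sysfs-based rules say the event can be opened on
    online:    iterable of online CPU ids to probe
    try_open:  hypothetical callback, try_open(cpu) -> bool, that really
               opens (and closes) the event on that CPU
    Returns (missed, extra): CPUs that opened but were not predicted, and
    CPUs that were predicted but failed to open.
    """
    predicted = set(predicted)
    actual = {cpu for cpu in online if try_open(cpu)}
    return actual - predicted, predicted - actual


# Example with a fake probe: sysfs predicts CPUs 0-3, but CPU 3 refuses.
missed, extra = compare_openability({0, 1, 2, 3}, range(4), lambda cpu: cpu != 3)
print(missed, extra)  # set() {3}
```

If both returned sets are empty for every event on a range of machines, the sysfs-based rules could replace the probes without changing what --list-events reports.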

cvonelm commented 1 month ago

I cannot reproduce this on an Intel Xeon Max 9468 or a Core i9-12900K anymore, so I presume the underlying cause of the perf_event_open slowness has been fixed in subsequent kernel releases. Closing the issue for now.