Closed cvonelm closed 1 month ago
Apparently, this is caused by the kernel. Every time perf_event_open is called (which happens a lot for --list-events, as we check whether we can perf_event_open every PMU event), the kernel has to reallocate the memory for asynchronous event recording using PEBS. Due to what looks like a livelock/deadlock situation, this memory allocation can take almost a second, which adds up considerably across hundreds of events and hundreds of perf_event_open calls.
I personally think we can do away with the perf_event_open()-ing checks, as I think they are overly paranoid:
cpus or cpumask files in /sys/bus/event_source
This should of course be tested by comparing what cpus, cpumask and perf_event_paranoid report for event openability and where the events can actually be opened.
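For the comparison suggested above, the sysfs side could be read roughly like this. This is only a sketch, not lo2s code: the helper names (parse_cpu_list, pmu_cpus, read_paranoid) are made up for illustration, and it assumes the cpus/cpumask files use the usual kernel CPU-list format (e.g. "0-3,8") and that perf_event_paranoid lives at /proc/sys/kernel/perf_event_paranoid.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse a kernel CPU list such as "0-3,8" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty file or stray comma
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def pmu_cpus(pmu_dir):
    """Return the CPUs a PMU directory advertises via "cpus" or "cpumask".

    pmu_dir is assumed to be something like
    /sys/bus/event_source/devices/<pmu>. Returns None if neither file exists.
    """
    for name in ("cpus", "cpumask"):
        path = Path(pmu_dir) / name
        if path.is_file():
            return parse_cpu_list(path.read_text())
    return None


def read_paranoid(path="/proc/sys/kernel/perf_event_paranoid"):
    """Read the current perf_event_paranoid level as an integer."""
    return int(Path(path).read_text().strip())
```

The result of pmu_cpus() for each PMU, together with read_paranoid(), would then be compared against the CPUs on which perf_event_open actually succeeds, to see whether the static information is enough to drop the per-event checks.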
I cannot reproduce this on an Intel Xeon Max 9468 or a Core i9-12900K anymore, so I presume that the underlying cause of the perf_event_open slowness has been fixed in subsequent kernel releases. Closing the issue for now.
On recent Intel systems there can be over 890 PMU events, making --list-events take one and a half minutes to load, which is unacceptable.
There should hopefully be some way to make it fast.
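To get a feeling for how many events a given machine exposes, the sysfs event files can be enumerated without calling perf_event_open at all. The following is a rough sketch: list_pmu_events is a hypothetical helper, and it assumes events are listed under /sys/bus/event_source/devices/<pmu>/events/, with auxiliary files such as <event>.scale and <event>.unit sitting next to them. Whether this matches the set of events that --list-events checks is not confirmed here.

```python
from pathlib import Path

# Auxiliary per-event files that are not events themselves (assumed list).
AUX_SUFFIXES = (".scale", ".unit", ".per-pkg", ".snapshot")


def list_pmu_events(devices_root="/sys/bus/event_source/devices"):
    """Map each PMU name to the sorted event names found in its events/ dir."""
    events = {}
    root = Path(devices_root)
    if not root.is_dir():
        return events
    for pmu in sorted(root.iterdir()):
        events_dir = pmu / "events"
        if not events_dir.is_dir():
            continue  # PMU without named events
        names = sorted(
            entry.name
            for entry in events_dir.iterdir()
            if not entry.name.endswith(AUX_SUFFIXES)
        )
        if names:
            events[pmu.name] = names
    return events


if __name__ == "__main__":
    found = list_pmu_events()
    print(sum(len(names) for names in found.values()), "events in", len(found), "PMUs")
```

Since this only reads directory listings, it gives a baseline for how quickly the events can be listed when none of them are opened.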