Use "perf list" information rather than a separate perfevent.conf for the perfevent pmda

wcohen commented 6 years ago

The perfevent pmda has a configuration file to describe the events available for each processor implementation. Trying to keep this up-to-date is going to be labor intensive and likely to lag behind the kernel. Recently the perf tool has tables in it describing the specific events (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/pmu-events/arch/x86) which can be listed out with "perf list". In pcp it might be good to make the perfevent pmda use these tables as a source of information to minimize this duplication in effort.

natoscott commented 6 years ago

fyi @hkshaw1990 @jpwhite4

hkshaw1990 commented 6 years ago

> The perfevent pmda has a configuration file to describe the events available for each processor implementation. Trying to keep this up-to-date is going to be labor intensive and likely to lag behind the kernel.

Agreed, that's why we moved to an approach where the perfevent pmda now reads events off of the /sys/bus/event_source/devices//... directly. Basically, if the events are supported and exported by the kernel, then pcp's perfevent pmda can read and support them. For these dynamic events, the perfevent.conf file comes in handy: its [dynamic] section allows only a specific set of events to be enabled. This is a workaround for a problem we faced when enabling all the supported events across all cpus, where we ran out of the maximum number of open file descriptors allowed (by default), but I digress.
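For readers unfamiliar with the sysfs layout, here is a minimal Python sketch of that kind of discovery. It is not the pmda's actual C code. It assumes each PMU directory holds a `type` file with an integer and an `events` directory whose files contain term strings such as `event=0x3c,umask=0x00`; the function name `list_sysfs_events` and the returned dictionary shape are made up for illustration.

```python
import os

# Companion files that sit next to an event file and are not events themselves.
_COMPANION_SUFFIXES = (".scale", ".unit", ".per-pkg", ".snapshot")

def list_sysfs_events(root="/sys/bus/event_source/devices"):
    """Return {pmu_name: {"type": int, "events": {event_name: term_string}}}.

    Hypothetical helper: PMUs without a "type" file or an "events"
    directory are skipped.
    """
    pmus = {}
    if not os.path.isdir(root):
        return pmus
    for pmu in sorted(os.listdir(root)):
        pmu_dir = os.path.join(root, pmu)
        type_path = os.path.join(pmu_dir, "type")
        events_dir = os.path.join(pmu_dir, "events")
        if not os.path.isfile(type_path) or not os.path.isdir(events_dir):
            continue  # this PMU exports no named events
        with open(type_path) as f:
            pmu_type = int(f.read().strip())
        events = {}
        for name in sorted(os.listdir(events_dir)):
            if name.endswith(_COMPANION_SUFFIXES):
                continue
            with open(os.path.join(events_dir, name)) as f:
                events[name] = f.read().strip()
        pmus[pmu] = {"type": pmu_type, "events": events}
    return pmus
```

Taking the root directory as a parameter keeps the sketch testable against a fake directory tree.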

> Recently the perf tool has tables in it describing the specific events (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/pmu-events/arch/x86) which can be listed out with "perf list". In pcp it might be good to make the perfevent pmda use these tables as a source of information to minimize this duplication in effort.

Yes, that's one of the items on my todo list for the perfevent pmda (I'm not getting enough time), i.e., to import the event list and modify the current perfevent parser to parse all the events and their info. Reading the event list from the "perf list" output probably won't help much, since we need the event codes and the PMU types for the perf_event_open() syscall. Anyway, there are definitely some pros to this:

All this information won't have to be duplicated.
Help text is readily available from these files, which would be a huge improvement over the current design.
Events which are supported on a system but not exported through the kernel can also be monitored using the pmda.

There are some issues with the proposed design as well. The domain of these events isn't very clear to me from https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/pmu-events/arch/x86/haswellx. I mean, assume that some events are per-thread (cpu), some are per-core and, in some cases, per-chip. How do we distinguish between them? With the current design, we look at the cpumask provided by the kernel driver in /sys/bus/event_source/devices//cpumask. I will have to look at the source code of perf to find out how it actually takes care of this. @wcohen, if you have a PR handy, I will be happy to test it out and give feedback.
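To illustrate the cpumask lookup, a rough Python sketch follows. It assumes the `cpumask` file holds a CPU list such as `0,18` or `0-3`, and that a PMU without that file counts on every CPU; both are assumptions of this sketch, and `parse_cpulist` and `pmu_cpus` are hypothetical names rather than functions from the pmda.

```python
import os

def parse_cpulist(text):
    """Expand a CPU list like "0,18" or "0-3,8" into a list of CPU numbers."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus

def pmu_cpus(pmu_dir):
    """Return the CPUs listed in <pmu_dir>/cpumask, or None if the file is absent.

    None is this sketch's way of saying "no restriction, open on every CPU".
    """
    path = os.path.join(pmu_dir, "cpumask")
    if not os.path.isfile(path):
        return None
    with open(path) as f:
        return parse_cpulist(f.read())
```

A per-chip PMU would then need one counter per CPU in the returned list instead of one per online CPU, which is how the domain information shrinks the number of descriptors to open.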

wcohen commented 6 years ago

I missed that there was already some logic in the perfevent pmda to read available perf events, added in commit 3224a0e9f8a508712df09afe2853a19750d19e2e. That is definitely a step in the right direction.

Limiting the list of probed events because perfevent runs out of file descriptors is a problem, and it is likely to become worse as the number of available cores in machines increases. Isn't the discovery of the available events separate from the actual use of the perfevent event? Or is the problem that pcp expects the metric to be set up and readable (pmprobe)?
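The scaling is easy to see with a little arithmetic. System-wide counting with perf_event_open() takes one file descriptor per (event, cpu) pair, so the total is the product of the two. The Python sketch below is only an estimate under that assumption; the `reserve` value and the function names are invented for illustration.

```python
import resource

def fds_needed(num_events, num_cpus):
    """One descriptor per (event, cpu) pair for system-wide counting."""
    return num_events * num_cpus

def fits_in_limit(num_events, num_cpus, soft_limit=None, reserve=64):
    """Check the estimate against the soft RLIMIT_NOFILE.

    `reserve` is an arbitrary allowance for descriptors the process
    already holds (sockets, logs, and so on).
    """
    if soft_limit is None:
        soft_limit, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        if soft_limit == resource.RLIM_INFINITY:
            return True
    return fds_needed(num_events, num_cpus) + reserve <= soft_limit
```

For example, 200 events on a 128-CPU machine would need 25600 descriptors, far beyond a soft limit of 1024, while 4 events on 8 CPUs need only 32.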

The lack of a definitive event information dictionary has been a problem. Each of the tools using the performance monitoring hardware has its own. Kernel perf, papi, libpfm, and oprofile all have their own list of available events. Also as mentioned there are various pieces of information missing (the raw event number and which domain the event monitors (per logical processor thread/per cpu/per socket)). Some of that information might be encoded internally in the lists, but isn't made available outside the tool.

"perf list --details" provides the cpu/umask and event number information that would be needed by the perfevent pmda. The "-v" option provides the more detailed event description. It might still be troublesome to extract information from the unstructured text. Also, we would have to be careful handling cases where /sys/bus/event_source/devices describes the same events with slightly different names.

hkshaw1990 commented 6 years ago

> Limiting the list of probed events because perfevent runs out of file descriptors is a problem, and it is likely to become worse as the number of available cores in machines increases. Isn't the discovery of the available events separate from the actual use of the perfevent event?

Well, it was. I am guilty of making it a problem of the perfevent agent :). You see, dependencies on other libraries made this agent hugely coupled with those event databases. And, as you already mentioned, adding a new event becomes a maintenance issue and is labour intensive.

> Or is the problem that pcp expects the metric to be set up and readable (pmprobe)?

I have to explore pmprobe, but as the manpage suggests, offloading the perfevent metrics' discovery to pmprobe could be a good alternative.

> "perf list --details" provides the cpu/umask and event number information that would be needed by the perfevent pmda. The "-v" option provides the more detailed event description. It might still be troublesome to extract information from the unstructured text.

Yeah, so I would still not recommend parsing the text output of perf list [--details]. I agree that we could avoid duplicating code (the code that reads the event info from the json files). But depending on another command (which might fail for some reason) doesn't seem like a good idea. And if the output format of perf list [--details] changes tomorrow, the parser will have to change too. I would go with taking the event info from Linux's json files and following perf's approach. Though, I could be wrong.
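As a rough idea of what reading those json files could look like, here is a Python sketch. It assumes each file is a JSON array of objects carrying string fields such as "EventName", "EventCode", "UMask", "BriefDescription" and "PublicDescription", and it builds the raw config the usual x86 core-PMU way (event select in bits 0-7, unit mask in bits 8-15). It ignores every other modifier and any uncore specifics, and `load_json_events` is a hypothetical name.

```python
import json

def load_json_events(text):
    """Map event name -> {"config": int, "help": str} from one JSON event file."""
    events = {}
    for entry in json.loads(text):
        name = entry.get("EventName")
        code = entry.get("EventCode")
        if not name or not code:
            continue  # entry does not describe a plain coded event
        # Some entries list alternative codes ("0xB7, 0xBB"); this sketch
        # simply takes the first one.
        event = int(code.split(",")[0].strip(), 16)
        umask = int(entry.get("UMask", "0x0"), 16)
        events[name] = {
            "config": event | (umask << 8),
            "help": entry.get("PublicDescription")
                    or entry.get("BriefDescription", ""),
        }
    return events
```

The help text comes along for free, which is the second of the pros listed earlier in the thread.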

> Also, we would have to be careful handling cases where /sys/bus/event_source/devices describes the same events with slightly different names.

True, we can probably avoid this issue by indexing the events using their event codes (/sys/bus/event_source/devices//events/).
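A small Python sketch of that indexing idea, under a few assumptions: event files hold comma-separated `name=value` terms, a term without a value means 1, and a term whose value is 0 can be treated the same as a missing term. `spec_key` and `index_by_code` are hypothetical helpers, not existing pmda functions.

```python
def _parse_value(value):
    """Parse a term value; keep it as text if it is not a number (e.g. "?")."""
    try:
        return int(value, 0)
    except ValueError:
        return value

def spec_key(pmu_type, spec):
    """Build a name-independent key from a PMU type and an event term string."""
    fields = {}
    for term in spec.split(","):
        name, _, value = term.strip().partition("=")
        if not name:
            continue
        fields[name] = _parse_value(value) if value else 1
    # Assumption: a zero-valued term encodes the same thing as an absent one.
    nonzero = {k: v for k, v in fields.items() if v != 0}
    return (pmu_type, tuple(sorted(nonzero.items())))

def index_by_code(entries):
    """Group (event_name, pmu_type, spec) tuples by their encoded event."""
    index = {}
    for name, pmu_type, spec in entries:
        index.setdefault(spec_key(pmu_type, spec), []).append(name)
    return index
```

Two names that land in the same bucket describe the same encoded event on the same PMU, so only one metric would need to be exported for them.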

performancecopilot / pcp

Use "perf list" information rather than a separate perfevent.conf for the perfevent pmda #411