RFC-16: New capability for the PMU

lsf37 commented 4 months ago

lsf37 commented 2 months ago

Status update from the 2024-07-25 TSC meeting: awaiting update from Gernot; likely to be split into two RFCs, one for global and one for thread-local PMU access.

Kswin01 commented 2 months ago

@Indanz Hi Indan, I'm finally getting back to working on the RFC and had a question about a comment you made: that a system call interface for accessing the PMU would not be useful for timing-sensitive register accesses. What would be some examples of these applications, where ~1000 cycles of delay would be noticeably detrimental (outside of profiling)?

Indanz commented 2 months ago

> What would be some examples of these applications, where ~1000 cycles of delay would be noticeably detrimental (outside of profiling)?

The whole point of Performance Monitor Units is profiling, so outside of profiling: None.

Using the PMU for thermal and energy management is an unusual use case. That said, having profiling enabled in production is unusual too, and for benchmarking, enabling KernelArmExportPMUUser is probably sufficient on ARM. It would be nice if it could be enabled and disabled at runtime via a capability, but I'm not sure more than that is needed.

Your proposed API seems very tailored to your needs, but it's unclear whether it's the best way to give access to the PMU in general, or whether using the PMU is the best solution for your problem. It's also unclear how it will interact with virtualisation.

On the upside, a syscall approach avoids any context-switch slowdowns. But for a syscall-based solution, I would prefer to keep it as simple as possible: limit it to reading and writing PMU registers, and do the rest in user space if possible.
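To make the "read and write registers only" idea concrete, here is a minimal sketch in C. It is not the RFC's API and not an existing seL4 interface: `pmu_model_t`, `pmu_read_reg`, `pmu_write_reg` and the register count are all hypothetical, and the register file is simulated in memory so the example can run anywhere. The point is only that the kernel-side surface could be two bounds-checked operations, with event selection and bookkeeping left to user space.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed number of PMU registers exposed; purely illustrative. */
#define PMU_NUM_REGS 8u

/* Simulated register file standing in for the real hardware registers. */
typedef struct {
    uint64_t regs[PMU_NUM_REGS];
} pmu_model_t;

/* Hypothetical "read register" operation: succeeds only for a valid index. */
static bool pmu_read_reg(const pmu_model_t *pmu, unsigned idx, uint64_t *out)
{
    if (idx >= PMU_NUM_REGS) {
        return false; /* reject out-of-range register numbers */
    }
    *out = pmu->regs[idx];
    return true;
}

/* Hypothetical "write register" operation: succeeds only for a valid index. */
static bool pmu_write_reg(pmu_model_t *pmu, unsigned idx, uint64_t value)
{
    if (idx >= PMU_NUM_REGS) {
        return false;
    }
    pmu->regs[idx] = value;
    return true;
}
```

Everything else, such as which event a counter tracks or how overflows are accumulated, would then be ordinary user-space code built on these two calls.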

The PMU gives timing info about all running processes, so making access fine-grained on a per-event-type basis via badging doesn't add security. Managing events can also be done in user space; the added value of doing it in the kernel is low.

Other than CPU utilisation, what are you using the PMU for? You could collect total CPU utilisation with a low-priority task instead of the PMU. Does your platform have thermal PMU events? It seems like it does, and that's why you want badging, so that you can mix critical thermal PMU events with performance profiling.
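As a rough sketch of the low-priority-task approach: the lowest-priority thread spins incrementing a counter, and utilisation is derived by comparing how far it got in a sampling window against a calibrated maximum measured on an otherwise idle system. The function name and the calibration scheme below are assumptions for illustration, not something taken from the RFC.

```c
#include <stdint.h>

/*
 * Hypothetical helper: CPU utilisation in percent over one sampling window.
 *
 * idle_count: iterations the lowest-priority task completed in the window.
 * idle_max:   iterations it completes in a window of the same length when
 *             nothing else is running (calibration value, assumed known).
 *
 * Assumes the counts are small enough that the multiplication by 100
 * does not overflow 64 bits.
 */
static unsigned cpu_utilisation_percent(uint64_t idle_count, uint64_t idle_max)
{
    if (idle_max == 0) {
        return 0; /* not calibrated yet: report no load rather than divide by zero */
    }
    if (idle_count > idle_max) {
        idle_count = idle_max; /* clamp measurement jitter */
    }
    return (unsigned)(((idle_max - idle_count) * 100u) / idle_max);
}
```

For example, if the idle task manages 250 iterations in a window where it could have done 1000, the rest of the system used about 75% of the CPU.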
