Request for clarification: Behaviour of PMU overflow interrupts when subset of counters assigned to guest

Consider a scenario where hardware performance monitoring (HPM) counters are shared between the hypervisor and a guest VM:

1. Configure hcounteren to assign a subset of HPM counters to a guest virtual machine.
2. Delegate overflow interrupt handling to the guest VM by writing 1 to bit 13 of hideleg. At this point, the LCOFI will be delivered to vsip.
3. Configure counters to count events in both the hypervisor and the guest.

Section 18.2.2 describes the behaviour of vsip and vsie when the interrupt is delegated. But it is not clear what happens when the counters assigned to the hypervisor overflow.

For correctness, the interrupt should be delivered to the hypervisor where it can take appropriate action.

Maybe I am missing something obvious but I am not able to find the relevant behaviour described in the specification. Please help clarify the behaviour in the described scenario.

With just the Sscofpmf extension and the underlying base counter-related architectural functionality, either all LCOFI interrupts are delegated down or none are. To do selective delegation, one would have all LCOFI interrupts go to the hypervisor, which would then delegate selected LCOFIs down to a given guest using the AIA's interrupt filtering features.

Thank you @gfavor for the reference to the AIA's interrupt filtering features.

From what I had seen earlier, it is possible to have bit 13 of hideleg be 0 and then use hvien and hvip registers to pass on the interrupts to the guest.

Taking this approach would mean that, for every HPM overflow interrupt:

1. The interrupt traps to the hypervisor (context switch).
2. The hypervisor needs to disambiguate whether the interrupt targets a counter that belongs to the guest or to itself (scountovf vs. hcounteren for the guest).
3. The hypervisor delegates the interrupt if it targets a guest HPM counter.
4. Execution continues in the guest.
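
The disambiguation step in this flow can be sketched as a small software model. This is only an illustration, not hypervisor code: `handle_lcofi` and its parameters are hypothetical names, the register values are plain integers, and it assumes the AIA setup described above (hideleg bit 13 clear, hvien bit 13 set) so that setting bit 13 of hvip is what makes the interrupt pending for the guest.

```python
from typing import Tuple

LCOFI_IRQ = 13  # local counter-overflow interrupt number (Sscofpmf)


def handle_lcofi(scountovf: int, hcounteren: int, hvip: int) -> Tuple[int, bool]:
    """Hypothetical model of the hypervisor's LCOFI trap handler.

    scountovf:  bitmap of overflowed HPM counters (bit N = counter N)
    hcounteren: bitmap of counters exposed to the guest
    hvip:       current hvip value

    Returns the updated hvip value and a flag telling the hypervisor
    whether one of its own counters overflowed.
    """
    guest_hits = scountovf & hcounteren   # overflowed counters owned by the guest
    host_hits = scountovf & ~hcounteren   # overflowed counters kept by the hypervisor

    if guest_hits:
        # Assumption: with hvien[13] = 1 and hideleg[13] = 0, setting
        # hvip[13] injects the overflow interrupt into the guest.
        hvip |= 1 << LCOFI_IRQ

    return hvip, bool(host_hits)
```

For example, with counter 3 given to the guest and counter 4 kept by the hypervisor, an overflow of both would set hvip bit 13 and also report a hypervisor-owned overflow. Every such decision still costs a trap into the hypervisor, which is the overhead in question.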

This flow incurs significant overhead for guest HPM usage, especially when used for profiling. For example, perf defaults to a sampling frequency of 4 kHz.

To reduce the performance overhead in mixed scenarios where subsets of counters are used by the hypervisor and the guest, would it be possible to have the LCOFI delivered to the appropriate mode? The implementation has all the relevant information available via hcounteren and scountovf. Delivering the interrupt to the owning component may give a significant benefit on multi-tenant hypervisor platforms.

Hi Punit,

On Tue, Jul 9, 2024 at 10:28 AM Punit Agrawal @.***> wrote:

> Thank you @gfavor for the reference to the AIA's interrupt filtering features.
>
> From what I had seen earlier, it is possible to have bit 13 of hideleg be 0 and then use hvien and hvip registers to pass on the interrupts to the guest.
>
> Taking this approach would mean that, for every HPM overflow interrupt:
>
> 1. The interrupt traps to the hypervisor (context switch).
> 2. The hypervisor needs to disambiguate whether the interrupt targets a counter that belongs to the guest or to itself (scountovf vs. hcounteren for the guest).
> 3. The hypervisor delegates the interrupt if it targets a guest HPM counter.
> 4. Execution continues in the guest.
>
> This flow incurs significant overhead for guest HPM usage, especially when used for profiling. For example, perf defaults to a sampling frequency of 4 kHz.

To minimize the overhead, we have added additional paravirt operations such as SBI PMU snapshot which allows us to do bulk access without trapping. Here are the patches that got merged recently.

@.***/

However, the LCOFI has to be delivered to the host as you pointed out due to the shared usage.

> To reduce the performance overhead in mixed scenarios where subsets of counters are used by the hypervisor and the guest, would it be possible to have the LCOFI delivered to the appropriate mode? The implementation has all the relevant information available via hcounteren and scountovf. Delivering the interrupt to the owning component may give a significant benefit on multi-tenant hypervisor platforms.

That would restrict use cases where the host wants to use HPM counters to monitor itself or the guests, as well as guest migration. The counter virtualization approach is pretty much standard in all other major architectures (ARM64, x86) today. There are recent talks about a passthrough vPMU in x86 to minimize the overhead you are talking about [2]. It's still early for the series to be merged. The obvious downside is that you lose access to HPM usage in the host and to migration. That's why the passthrough implementation is a choice that CSPs can provide to their customers if required.

However, the x86 approach has a lot more overhead than the RISC-V approach, as it has to trap on every HPM access in addition to the hypervisor injecting the interrupt into the guest.

[2] @.***/

-- Regards, Atish

Thank you for following up on the query @atishp04. Apologies for the delay in getting back; I needed some time to get familiar with some of the things you refer to.

Incidentally, GitHub has obfuscated the patch references in your message for some reason. I hope I managed to find the correct references.

> To minimize the overhead, we have added additional paravirt operations such as SBI PMU snapshot which allows us to do bulk access without trapping. Here are the patches that got merged recently.

I believe you are referring to this patchset.

IIUC, with this patchset the overflow interrupt will be taken in the host. The host / hypervisor will then update the shared memory with the state of all the counters being used by the guest.

> To reduce the performance overhead in mixed scenarios where subsets of counters are used by the hypervisor and the guest, would it be possible to have the LCOFI delivered to the appropriate mode? The implementation has all the relevant information available via hcounteren and scountovf. Delivering the interrupt to the owning component may give a significant benefit on multi-tenant hypervisor platforms.

> That would restrict use cases where the host wants to use HPM counters to monitor itself or the guests, as well as guest migration.

I don't understand the part about the host not being able to use HPMs.

For the discussion, let me outline what I'm suggesting:

1. Based on hcounteren, the system already knows which counters are assigned to the guest.
2. When a counter overflows, the interrupt is delivered according to the counter's assignment in hcounteren, instead of being delivered unconditionally to either the host or the guest.
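
To make the suggestion concrete, here is a rough model of the proposed routing. It describes behaviour I am proposing, not anything in the current specification; `Target`, `route_overflow` and `route_all` are made-up names, and the counters are modelled as bit positions 3 to 31 of plain integers.

```python
from enum import Flag, auto


class Target(Flag):
    """Who would see the overflow interrupt under the proposed scheme."""
    NONE = 0
    HOST = auto()
    GUEST = auto()


def route_overflow(counter: int, hcounteren: int) -> Target:
    """Route a single counter's overflow by its hcounteren bit (hypothetical)."""
    if (hcounteren >> counter) & 1:
        return Target.GUEST   # counter is assigned to the guest
    return Target.HOST        # counter is retained by the hypervisor


def route_all(overflowed: int, hcounteren: int) -> Target:
    """Combine the routing decision for every overflowed HPM counter (3..31)."""
    targets = Target.NONE
    for counter in range(3, 32):
        if (overflowed >> counter) & 1:
            targets |= route_overflow(counter, hcounteren)
    return targets
```

In this model, a guest-owned counter overflowing on its own never involves the hypervisor, while simultaneous overflows of guest and host counters raise an interrupt at both levels.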

With this, it is possible to avoid all SBI calls related to guest PMU usage. From a software point of view, do you see any problems with this scheme?

> The counter virtualization approach is pretty much standard in all other major architectures (ARM64, x86) today.

> There are recent talks about a passthrough vPMU in x86 to minimize the overhead you are talking about [2]. It's still early for the series to be merged. The obvious downside is that you lose access to HPM usage in the host and to migration. That's why the passthrough implementation is a choice that CSPs can provide to their customers if required.

Sorry, your reference got obfuscated by GitHub. Are you referring to this patchset?

riscv / riscv-isa-manual

Request for clarification: Behaviour of PMU overflow interrupts when subset of counters assigned to guest #1508