riscv-non-isa / riscv-sbi-doc

Documentation for the RISC-V Supervisor Binary Interface
https://jira.riscv.org/browse/RVG-49
Creative Commons Attribution 4.0 International
337 stars 88 forks source link

Unspecified interactions between HSM and other extensions #98

Closed smaeul closed 1 year ago

smaeul commented 2 years ago

The SBI spec should clarify how stopping/suspending one or all harts is allowed to affect the behavior of other extensions, as well as other architectural state.

For example, the timer extension:

Some further discussion (correct me if I'm wrong): https://lore.kernel.org/linux-riscv/bca6d76d-1708-fede-8008-d2588a14b656@mail.ru/T/

What about the PMU extension? Is hart suspend allowed to reset the PMU state?

avpatel commented 1 year ago

Whether the timer interrupt will fire during non-retentive suspend or not is a platform/implementation-specific detail which needs to be discovered from DT/ACPI (similar to what ARM has done). The SBI specification only defines mechanism to enter suspend states.

I had sent out a Linux patch to add HART DT property for this. Maybe we can move in that direction? (Refer, https://lore.kernel.org/all/20220727114302.302201-1-apatel@ventanamicro.com/)

Regards, Anup

smaeul commented 1 year ago

Not making any guarantees is fine with me, as long as everyone is okay with the implications. Namely, that in the absence of DT/ACPI properties, S-mode software must be able to work with the worst-case platform/implementation.

I assume then that it's okay to lose PMU counter and interrupt state in suspend as well. Who is responsible for saving/restoring that state, S-mode or M-mode?

palmer-dabbelt commented 1 year ago

this came up on IRC and LKML, but just to reflect things here: IMO it's fine if this is the desired behavior in the spec, we'll just need some DT properties so hardware can Ack into using non-retentive suspend from Linux.

smaeul commented 1 year ago

The non-retentive suspend states already need to be added to the DT by the firmware at runtime. So the presence of the idle-states nodes is already the hardware/firmware acking in to non-retentive suspend. (The availability of specific state encodings, and their latency properties, will vary between SBI implementations, so it would be inappropriate to include them in the static devicetree.)

Consider these three scenarios: 1) S-mode software does not use the SBI timer extension at all. It has a native driver for timer hardware that can wake the CPU from idle states. 2) S-mode software uses the SBI timer extension for its tick timer. The SBI implementation has a driver for hardware timer that can wake the CPU from idle states. 3) S-mode software uses the SBI timer extension for its tick timer. The SBI implementation has a driver for hardware timer that can not wake the CPU from idle states.

Given the existence of scenario 1, it would be wrong to omit the CPU idle states based on the behavior of the SBI timer extension. The property must allow S-mode software to distinguish between scenarios 2 and 3, so the OS can decide whether or not to use the CPU idle states advertised to it.

I just want to be clear that "does non-retentive suspend work?" is a very different question from "can the SBI timer extension be used in conjunction with non-retentive suspend?".

palmer-dabbelt commented 1 year ago

Sorry "does not-retentive suspend work" was just the wrong thing to say here: whether or interrupts are received is implementation-defined behavior, so it works either way (ie, it's defined that interrupts wake up the hart or arrival but implementation defined whether interrupts ever arrive). IIUC this applies to every interrupt source, so we'd need to define whether or not the interrupt arrives for case 1 as well -- presumably it's not the same as the when the SBI is interposing, as SBI implementations could convent non-retentive suspend states into retentive states when they see a timer is armed.

Though I guess that also leaves it a bit unclear whether or not interrupts must arrive during retentive suspend. I don't see anything in the spec that would require that, so presumably it's also implementation defined?

On Mon, Nov 7, 2022 at 3:44 PM Samuel Holland @.***> wrote:

The non-retentive suspend states already need to be added to the DT by the firmware at runtime. So the presence of the idle-states nodes is already the hardware/firmware acking in to non-retentive suspend. (The availability of specific state encodings, and their latency properties, will vary between SBI implementations, so it would be inappropriate to include them in the static devicetree.)

Consider these three scenarios:

  1. S-mode software does not use the SBI timer extension at all. It has a native driver for timer hardware that can wake the CPU from idle states.
  2. S-mode software uses the SBI timer extension for its tick timer. The SBI implementation has a driver for hardware timer that can wake the CPU from idle states.
  3. S-mode software uses the SBI timer extension for its tick timer. The SBI implementation has a driver for hardware timer that can not wake the CPU from idle states.

Given the existence of scenario 1, it would be wrong to omit the CPU idle states based on the behavior of the SBI timer extension. The property must allow S-mode software to distinguish between scenarios 2 and 3, so the OS can decide whether or not to use the CPU idle states advertised to it.

I just want to be clear that "does non-retentive suspend work?" is a very different question from "can the SBI timer extension be used in conjunction with non-retentive suspend?".

— Reply to this email directly, view it on GitHub https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98#issuecomment-1306380995, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAKRUXXE5YU4SGFAYY77WFDWHGH57ANCNFSM5ZVQ5OYA . You are receiving this because you commented.Message ID: @.***>

avpatel commented 1 year ago

@palmer-dabbelt Yes, for retentive suspend the SBI spec is very clear that HART state must be preserved by HW so interrupt will be observed by HART correctly in retentive suspend state.

Only the non-retentive suspend state gives HW flexibility to discard certain portions of HART state which in-turn allows more power saving.

In addition to timer (SBI timer or Sstc or platform specific), some platform might also lose state of per-HART IMSICs so this property also needs to be discovered from DT or ACPI (just like ARM world).

ConchuOD commented 1 year ago

I'm a little behind on this, so I am gonna do a little recap to make sure that I am on the right track... Things being split over different places hasn't helped.

The way the D1 is implemented, it does not get timer events during suspend. Samuel submitted a patch for Linux so that the clocksource driver would not assume that events are received during suspend. That then broke things for existing, SiFive based stuff where the implementation does actually receive events during suspend - eg the RCU stall detection not kicking in that I reported. The proposed solution to keep both parties ~happy~ functioning is a dt property. In the absence of this property, we fall back to covering for the worst case - that being events do not arrive during suspend. That would make the property "riscv,timer-can-wake-cpu" or whatever that gets bike-shedded to.

Anup made the point on his proposal:

This is a one-of-case with RISC-V DeviceTree where we are living with the fact that there is no timer DT node. If we add a timer DT node now then we have to deal with compatibility for existing platforms.

I think that this is a bit moot though, since surely we are already dealing with a compatibility issue for existing platforms? There are existing platforms, without this "riscv,timer-can-wake-cpu" property in their DT that are affected as is. Those users are going to have to add the property to continue functioning as-is (or more accurately, as-was) right? Have I missed something there?

That's all a long-winded way of asking what to do with the revert that I posted. It seems to me, although I am obviously biased, that we should actually apply the revert since it seems like every upstream user was okay with how things were prior to 32ccac1bd9b ("clocksource/drivers/riscv: Events are stopped during CPU suspend"). When Anup's resolved the concerns of the dt folks and the property has been added, then we can add back in CLOCK_EVT_FEAT_ONESHOT as a default.

Is that a fair summary/proposal?

palmer-dabbelt commented 1 year ago

I'm a little behind on this, so I am gonna do a little recap to make sure that I am on the right track... Things being split over different places hasn't helped.

The way the D1 is implemented, it does not get timer events during suspend. Samuel submitted a patch for Linux so that the clocksource driver would not assume that events are received during suspend. That then broke things for existing, SiFive based stuff where the implementation does actually receive events during suspend - eg the RCU stall detection not kicking in that I reported. The proposed solution to keep both parties ~happy~ functioning is a dt property. In the absence of this property, we fall back to covering for the worst case - that being events do not arrive during suspend. That would make the property "riscv,timer-can-wake-cpu" or whatever that gets bike-shedded to.

Anup made the point on his proposal:

This is a one-of-case with RISC-V DeviceTree where we are living with the fact that there is no timer DT node. If we add a timer DT node now then we have to deal with compatibility for existing platforms.

I think that this is a bit moot though, since surely we are already dealing with a compatibility issue for existing platforms? There are existing platforms, without this "riscv,timer-can-wake-cpu" property in their DT that are affected as is. Those users are going to have to add the property to continue functioning as-is (or more accurately, as-was) right? Have I missed something there?

That's all a long-winded way of asking what to do with the revert that I posted. It seems to me, although I am obviously biased, that we should actually apply the revert since it seems like every upstream user was okay with how things were prior to 32ccac1bd9b ("clocksource/drivers/riscv: Events are stopped during CPU suspend"). When Anup's resolved the concerns of the dt folks and the property has been added, then we can add back in CLOCK_EVT_FEAT_ONESHOT as a default.

Is that a fair summary/proposal?

That's really a Linux question. It's kind of independent to the spec discussions: we'd need to deal with the breakage either way, the question here is just what the spec requires and thus how we describe this in the code/dt (ie, do we say this is a D1 errata or do we say everyone else has custom extensions to make it work).

ConchuOD commented 1 year ago

ie, do we say this is a D1 errata or do we say everyone else has custom extensions to make it work

Right. In a roundabout way I was pretty much trying to make sure that that was the dividing line. Linux is just my frame of reference for it.

All other things being equal, I'd say that assuming you do not get events in suspend is the "safer" route given we have implementations out there that do both?

palmer-dabbelt commented 1 year ago

ie, do we say this is a D1 errata or do we say everyone else has custom extensions to make it work

Right. In a roundabout way I was pretty much trying to make sure that that was the dividing line. Linux is just my frame of reference for it.

All other things being equal, I'd say that assuming you do not get events in suspend is the "safer" route given we have implementations out there that do both?

I'd argue we should default to what the specs say, but it's the same conclusion here.

ConchuOD commented 1 year ago

I'd argue we should default to what the specs say, but it's the same conclusion here.

Aye, although from the earlier comments here & from my quick read of this repo there does not seem to be anything more specific than: "The hart will automatically come out of suspended state and resume normal execution when it receives an interrupt or platform specific hardware event" (emphasis mine).

Are there other, related, spec docs that may cover whether interrupts must arrive during suspend? Nothing in the PLIC spec from my cursory check.

ConchuOD commented 1 year ago

So, uh.. For this to actually go anywhere what needs to happen? As I said above, I don't see anything covering this "problem" in the specs & I've had reports come in about stuff like clock_nanosleep() being reduced to jiffy accuracy etc.

A v2 of Anup's stuff might make things work kernel-side, but this should really be clarified, right?

palmer-dabbelt commented 1 year ago

I don't think we need to do anything here, we just need to revert that patch in Linux. IIRC it's targeted at the clock folks and I already Ack'd it, but LMK I missed something.

ConchuOD commented 1 year ago

I don't think we need to do anything here, we just need to revert that patch in Linux. IIRC it's targeted at the clock folks and I already Ack'd it, but LMK I missed something.

Riiight - but restoring the old behaviour doesn't do Samuel & other D1 users any favours? Whether that's reverted or not, we still need a way to differentiate between those who do get clock events in suspend and those who do not, since the spec doesn't seem to describe whether or not an interrupt actually makes it to the hart?

palmer-dabbelt commented 1 year ago

I don't think we need to do anything here, we just need to revert that patch in Linux. IIRC it's targeted at the clock folks and I already Ack'd it, but LMK I missed something.

Riiight - but restoring the old behaviour doesn't do Samuel & other D1 users any favours? Whether that's reverted or not, we still need a way to differentiate between those who do get clock events in suspend and those who do not, since the spec doesn't seem to describe whether or not an interrupt actually makes it to the hart?

I think this should do it: https://lore.kernel.org/r/20221121205647.23343-1-palmer@rivosinc.com/

smaeul commented 1 year ago

I don't think we need to do anything here, we just need to revert that patch in Linux. IIRC it's targeted at the clock folks and I already Ack'd it, but LMK I missed something.

What? This directly contradicts your previous suggestion. If we "default to what the specs say", then 232ccac1bd9b is correct, and the DT/ACPI property goes on top of that to remove the flag.

Riiight - but restoring the old behaviour doesn't do Samuel & other D1 users any favours? Whether that's reverted or not, we still need a way to differentiate between those who do get clock events in suspend and those who do not, since the spec doesn't seem to describe whether or not an interrupt actually makes it to the hart?

I think restoring the old behavior is wrong, because it makes assumptions that the spec does not support, but it doesn't actually break suspend on the D1. Linux on D1 uses drivers/clocksource/timer-sun4i.c in preference to SBI timers for clock events, so it falls under "scenario 1" as described above.

I think this should do it: https://lore.kernel.org/r/20221121205647.23343-1-palmer@rivosinc.com/

That patch completely breaks a feature which currently works on "scenario 1" platforms, including D1, and would continue to work regardless of how we changed the RISC-V timer driver (modulo changing the rating such that it started being chosen over other drivers).

palmer-dabbelt commented 1 year ago

I don't think we need to do anything here, we just need to revert that patch in Linux. IIRC it's targeted at the clock folks and I already Ack'd it, but LMK I missed something.

What? This directly contradicts your previous suggestion. If we "default to what the specs say", then 232ccac1bd9b is correct, and the DT/ACPI property goes on top of that to remove the flag.

Sorry, I think I'd gotten lost in which patch was which. We can't just revert the patch, we need some other way of disabling non-retentive suspend until we can sort out how to support it. That's what I was trying to do with the patch.

Riiight - but restoring the old behaviour doesn't do Samuel & other D1 users any favours? Whether that's reverted or not, we still need a way to differentiate between those who do get clock events in suspend and those who do not, since the spec doesn't seem to describe whether or not an interrupt actually makes it to the hart?

I think restoring the old behavior is wrong, because it makes assumptions that the spec does not support, but it doesn't actually break suspend on the D1. Linux on D1 uses drivers/clocksource/timer-sun4i.c in preference to SBI timers for clock events, so it falls under "scenario 1" as described above.

I guess I don't understand what's actually broken, then? If the D1 (which I thought was what broke here?) isn't using the SBI timer driver then why is it breaking things? Having C3STOP unconditionally enabled breaks other platforms and doesn't seem like a sane way to go in general, given that it's aimed at a legacy x86 feature.

Maybe it's better to talk about this on LKML? IMO it's not really a spec question any more.

I think this should do it: https://lore.kernel.org/r/20221121205647.23343-1-palmer@rivosinc.com/

That patch completely breaks a feature which currently works on "scenario 1" platforms, including D1, and would continue to work regardless of how we changed the RISC-V timer driver (modulo changing the rating such that it started being chosen over other drivers).

The feature it breaks is retentive suspend? I agree with that, the patch just disables it until we have some way to make sure it works. If non-retentive suspend works on the D1 when not using the SBI timer then we can always add some sort of hook to check what platform we're on, but we shouldn't be triggering non-retentive suspend without knowing some platform specifics.

avpatel commented 1 year ago

Better to disable/remove non-retentive suspend states from DT on the platform where it is broken instead of disabling non-retentive suspend for all RISC-V platforms.

palmer-dabbelt commented 1 year ago

Better to disable/remove non-retentive suspend states from DT on the platform where it is broken instead of disabling non-retentive suspend for all RISC-V platforms.

Sorry, but what do you mean by "broken" here?

avpatel commented 1 year ago

I meant the remove suspend state DT entry for the platform where non-retentive suspend does not work currently.

palmer-dabbelt commented 1 year ago

I meant the remove suspend state DT entry for the platform where non-retentive suspend does not work currently.

In what way does non-retentive suspend not work on these systems? IIUC the spec says that interrupts might not arrive, so the platform is correct here. It's just Linux that's doing the wrong thing by expecting interrupts to be delivered, the C3STOP patch handles that for the timer interrupts (but breaks other stuff in timer land and IMO should still be reverted) but we don't have anything to tell Linux that other interrupts might not arrive.

So IMO it's really the Linux suspend driver that's broken here, not the platform.

avpatel commented 1 year ago

I agree, Linux suspend flow is broken but it is not the Linux suspend driver but the Linux timer driver which is broken so we need to fix the Linux timer driver.

The non-retentive suspend works fine on Allwinner D1 and QEMU virt machine but does not work on Microchip platform due to C3STOP flag in the RISC-V timer driver.

avpatel commented 1 year ago

BTW, the Linux suspend flow involves various drivers (timer, interrupt controller, cpuidle, etc) and various subsystems to collectively work. The Linux suspend flow can break if one of the things in Linux suspend flow does not work a particular platform but that does not mean the entire Linux suspend flow is broken for all platforms.

In our case, the RISC-V timer driver does not do the right thing for Microchip platform so we need to fix the RISC-V timer driver. The solution happens to be updating the DT bindings for discovering platform specific behavior of timer interrupts.

ConchuOD commented 1 year ago

The non-retentive suspend works fine on Allwinner D1 and QEMU virt machine but does not work on Microchip platform due to C3STOP flag in the RISC-V timer driver.

Better to disable/remove non-retentive suspend states from DT on the platform where it is broken instead of disabling non-retentive suspend for all RISC-V platforms.

FWIW, our DT has nothing in it related to cpu-idle, we're seeing problems with the timers working properly during normal operation.

Maybe it's better to talk about this on LKML? IMO it's not really a spec question any more.

Good idea.

atishp04 commented 1 year ago

AFAIK, this was resolved in lkml discussions resulting a dt property. Closing this issue.