Open dpgeorge opened 1 month ago
Regarding the ldaexb/strexb pair setting the processor event flag: I tested these instructions (the exact SW_SPIN_LOCK_LOCK code from this repo) on:
* a different Cortex-M33 MCU (an STM32H5xx)
* a multi-core Cortex-M55 (an Alif E7)
On both these MCUs the ldaexb/strexb pair do not set the processor event flag on the processor executing these instructions.
So it seems to be a quirk of the RP2350 -- which has the processor event flags cross-wired between the CPUs -- that the ldaexb/strexb pair set the event flag on the current processor.
Speaking just to the monitor behaviour:
> Regarding the ldaexb/strexb pair setting the processor event flag: I tested these instructions (the exact SW_SPIN_LOCK_LOCK code from this repo) on:
> * a different Cortex-M33 MCU (an STM32H5xx)
> * a multi-core Cortex-M55 (an Alif E7)
>
> On both these MCUs the ldaexb/strexb pair do not set the processor event flag on the processor executing these instructions.
This is specified behaviour on Armv8-M, see DDI0553B.y section B9.3.1 (page 275):
We implemented this (useless) behaviour because it is strictly required by the spec. I haven't looked into the intricacies of the global monitor implementations on the systems you mentioned. It's possible this is their bug, and they missed that part of the spec. It's also possible they don't retire reservations on a PE's exclusive store to its own reservation, which is implementation-defined as per B9.3.2:
We do take this arc, as per 2.1.6.1 "Implementation-defined Monitor Behaviour" in our datasheet:
I think the intended use here is to leave a reservation open and get pinged when someone writes to it, but it causes unnecessary events on RMW.
Background
We are trying to achieve good idle power consumption on RP2350 in MicroPython. Idle here means sitting at the REPL (with either UART or USB serial) doing nothing. In this case the CPU should be able to gate its clock via WFI/WFE to reduce power consumption.
Eg on a standard Pico with RP2040, connected to a USB port and with a terminal program open, the Pico draws about 15.2mA (at 5V). Running a simple while-True busy loop increases that to about 18mA. This is expected behaviour.
Summary of problems
Things seem to have changed in SDK 2.0.0 (the tag 2.0.0). There are a few issues that seem to be related:
* best_effort_wfe_or_timeout() now seems to return immediately, on both RP2040 and RP2350
* best_effort_wfe_or_timeout() calls __sev(); __wfe() to clear any existing event, which potentially misses events
* The ldaexb and strexb exclusive-access instructions seem to set the event flag (equivalent to __sev()), meaning that spin_lock_blocking sets the event
* Functions such as hardware_alarm_set_target set the event flag, making these functions unusable for making a timer to wake from WFE

Details
best_effort_wfe_or_timeout() now seems to return immediately

Running the following on RP2040:
With pico-sdk 1.5.1 that will consume about 15.2mA on a Pico board. With pico-sdk 2.0.0 that code consumes about 18mA on a Pico board.
Timing how long the best_effort_wfe_or_timeout function lasts, on pico-sdk 2.0.0 it always returns pretty much immediately (eg within 5us), so the above loop is effectively a busy loop, hence the higher power consumption.

This looks like a regression with pico-sdk 2.0.0 on RP2040.
best_effort_wfe_or_timeout() calls __sev(); __wfe()
Inspecting the code for best_effort_wfe_or_timeout in pico-sdk 2.0.0, there's a new bit:

This sev/wfe pair will clear any existing event flag. But what if that event was from a user interrupt, and was the event the user was waiting for? Eg:
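The concern can be modelled on a host machine with a single boolean standing in for the core's event register. This is only an illustrative sketch: model_sev, model_wfe and model_wait_sleeps are hypothetical names, not pico-sdk APIs, and WFE is simplified to "clear the flag and return if it is set, otherwise sleep".

```c
#include <stdbool.h>

/* Host-side model of one core's event register (assumption: not SDK code). */
static bool event_register = false;

/* Models SEV: latch an event. */
static void model_sev(void) { event_register = true; }

/* Models WFE: consume a pending event and return at once, otherwise sleep.
 * Returns true if the core would have gone to sleep. */
static bool model_wfe(void) {
    if (event_register) {
        event_register = false;
        return false;
    }
    return true;
}

/* Hypothetical user interrupt handler that signals the waiter. */
static void my_irq_handler(void) { model_sev(); }

/* The IRQ fires before the wait starts. If clear_first is true, a
 * sev/wfe pair runs before the real wait, as in the pattern described
 * above. Returns true if the real wait would sleep. */
static bool model_wait_sleeps(bool clear_first) {
    event_register = false;
    my_irq_handler();      /* the event the user is waiting for */
    if (clear_first) {
        model_sev();       /* set the flag ...                    */
        (void)model_wfe(); /* ... then consume it, losing the IRQ's event */
    }
    return model_wfe();    /* the real wait */
}
```

In this model, model_wait_sleeps(false) returns false (the pending event wakes the wait at once), while model_wait_sleeps(true) returns true (the event has been discarded and the wait sleeps).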
In principle (I don't have code to show this behaviour) the __sev() from my_irq_handler will be cleared by best_effort_wfe_or_timeout, and that latter function will wait the entire 10ms if no other IRQs come in.

ldaexb and strexb set the event flag

According to the datasheet for RP2350:
From my testing it seems that executing a pair of ldaexb and strexb instructions on RP2350 does indeed do an effective __sev(). This means spin_lock_blocking() sets the event flag, and so does any function that calls it.

It would be great to instead use the hardware spin locks on RP2350. According to the errata it's still possible to use some of them, those which don't have aliases for writable registers.
hardware_alarm_set_target sets the event flag

In MicroPython we implement low power idle by setting up a callback using hardware_alarm_set_target(<id>, <timeout>) and then executing __wfe() to wait for either an event or the timeout. But this does not work on RP2350 because hardware_alarm_set_target() sets the event flag, meaning that the subsequent __wfe() wakes immediately.

It's unclear how to use __wfe() effectively in the pico-sdk because of this issue of the event flag being set in many locations.

Final thoughts
Ideally we'd be able to use __wfe() to implement low-power idle on RP2350. If anyone has any pointers on how to do this that would be much appreciated.

Regardless, I think there are a few bugs with best_effort_wfe_or_timeout as mentioned above.