rust-embedded / embedded-hal

A Hardware Abstraction Layer (HAL) for embedded systems

Using `Delay` via shared references #435

tgross35 commented 1 year ago

This is mentioned in https://github.com/rust-embedded/book/issues/246: what is the recommended way to share a `Delay`?

Currently all the `Delay` methods take a mutable self reference, e.g. `Delay::delay_x(&mut self, val: u32)`. This makes it pretty tricky to use a delay in more than one place, since you only get one instance (at least in atsamd-hal). A couple of possible solutions I see:

  1. Wrap the single `Delay` in something like a `RefCell` and hand out shared references to it.
  2. Change the traits to take `&self` instead of `&mut self`.

So I'm just curious what the recommendation is.

adamgreig commented 1 year ago

Thanks for opening this issue. We discussed it a bit in today's meeting (here).

The traits take a mutable reference because some implementations may need to ensure they have unique access, or otherwise mutate stored data, to implement the trait functions. Ideally, HALs should offer some kind of cloneable or otherwise multiply-obtainable `Delay`. In some frameworks like embassy there's a continuously running timer, and all the delays can simply check against it (it's also used for power saving, by configuring interrupts at the right wake-up times). Most HALs aren't designed to provide something like that, but they could do something similar: consume a hardware timer, configure it as free-running, and then generate as many `Delay` instances off it as required.

We discussed possibly having cortex-m provide some examples for this using the SysTick peripheral, and perhaps even the `asm::delay` method, so that there's some source of multiple `Delay`s (at least on Cortex-M platforms).

For right now, either the `RefCell` wrapper or writing your own `Delay` source using a HAL-provided timer is probably your best immediate option. I don't think we'd want to change the traits to take `&self`, so in practice I think encouraging HALs to make multiple `Delay`s obtainable is the best bet long term.
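
For illustration, here's a minimal sketch of the `RefCell` approach, written against the `DelayNs` trait from embedded-hal 1.0 (the trait shape was still different in the 0.2 era). `SharedDelay`, `SharedDelayHandle`, and `share()` are made-up names for this sketch, not HAL API:

```rust
use core::cell::RefCell;
use embedded_hal::delay::DelayNs;

/// Hypothetical wrapper: hands out many cheap handles over one delay.
/// RefCell is not Sync, so this only works within a single context.
struct SharedDelay<D> {
    inner: RefCell<D>,
}

impl<D: DelayNs> SharedDelay<D> {
    fn new(delay: D) -> Self {
        Self { inner: RefCell::new(delay) }
    }

    /// Create another handle; call as many times as needed.
    fn share(&self) -> SharedDelayHandle<'_, D> {
        SharedDelayHandle { parent: self }
    }
}

struct SharedDelayHandle<'a, D> {
    parent: &'a SharedDelay<D>,
}

impl<D: DelayNs> DelayNs for SharedDelayHandle<'_, D> {
    fn delay_ns(&mut self, ns: u32) {
        // Borrow the underlying delay only for the duration of the
        // blocking wait; panics if a borrow is somehow already active.
        self.parent.inner.borrow_mut().delay_ns(ns);
    }
}
```

Each driver can then take its own `SharedDelayHandle` by value, and all of them end up sharing the single underlying timer.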

tgross35 commented 1 year ago

Thanks for the detailed response, the transcript was a good read. I'll bring up the issue on the HAL I use (atsamd). It seems like, in general, the better option for drivers is to take an `impl CountDown` instead, to allow for potential nonblocking operation - is that accurate?

`CountDown` does run into a similar problem, at least in my case, since there are only two `TimerCounter`s on my chip - so by default I'm limited to two peripherals that take a `CountDown` and one that takes a `Delay`. I'm not sure of the best way to address that, since it's not SysTick-based.

adamgreig commented 1 year ago

`CountDown` won't be in embedded-hal 1.0 at first, because most of the traits that deal with time have been removed until we can design a better abstraction for them. Also, most of the nb-based traits are moving to embedded-hal-nb, as we expect most people will eventually want either blocking or async, and it turns out nb doesn't really play well with async (hence the new embedded-hal-async).

My guess is that most drivers that just want to wait a millisecond here or there in generally-blocking operations will keep taking `Delay`, which is perfect for that, and other drivers might instead go async and use the async `Delay` from embedded-hal-async. So it probably doesn't make sense to swap to `CountDown` right now, I think.

tgross35 commented 1 year ago

That all makes sense, thanks again for the details.

tgross35 commented 1 year ago

@adamgreig do you by chance have an outline of how you would plan to do this on the Cortex-M side?

I brought up the idea to the atsamd repository https://github.com/atsamd-rs/atsamd/issues/670 but there was no clear way forward, so I'm curious what a good solution might look like.

Current implementation of delay on atsamd_hal: https://github.com/atsamd-rs/atsamd/blob/7d85efd1f0c8ce136a9bdcb96cc4d0fe672bed6b/hal/src/delay.rs#L52-L74

adamgreig commented 1 year ago

I had three thoughts in mind:

  1. Make a new `Delay` provider that just uses a nop loop under the hood, like `asm::delay` does. You can create this as many times as you want, and it would take some scaling parameter to account for your particular hardware. It loses accuracy if interrupts take away some of the time.
  2. A variant on 1 that uses CYCCNT on hardware that supports it. It isn't affected by interrupts taking time unless the counter wraps completely, and you only need to know the CPU clock frequency. The more I think about it the more I like it, but not all hardware has CYCCNT.
  3. A struct that consumes SysTick (maybe with a `release()` method), configures it to run at some known rate, and then lets you create as many new `Delay` objects as you want, which only read the SysTick counter to work out how much time has passed.

Maybe one or two or all of them could be provided; any of them should let you have multiple `Delay`s, with varying tradeoffs. For a HAL, something like option 3 but using an actual hardware timer is best, I think (see the sketch below). It "uses up" the timer to ensure it's always running at a known frequency, then each of the `Delay`s just watches until enough counts have passed on the timer (handling wraparound as required).
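
To make option 3's shape concrete, here's a minimal sketch assuming a periodic interrupt (e.g. the SysTick exception) increments a tick counter at a known rate. `TICKS`, `TICK_HZ`, and `TickDelay` are made-up names for this sketch, and the timer/interrupt setup is omitted:

```rust
use core::sync::atomic::{AtomicU32, Ordering};
use embedded_hal::delay::DelayNs;

/// Free-running tick count, assumed to be incremented by a periodic
/// timer interrupt (setup omitted from this sketch).
static TICKS: AtomicU32 = AtomicU32::new(0);

/// Assumed tick rate for this sketch: 1 kHz, i.e. one tick per ms.
const TICK_HZ: u32 = 1_000;
const NS_PER_TICK: u32 = 1_000_000_000 / TICK_HZ;

/// Zero-sized handle; create as many as you want.
#[derive(Clone, Copy)]
struct TickDelay;

impl DelayNs for TickDelay {
    fn delay_ns(&mut self, ns: u32) {
        // Round up to whole ticks so we never wait less than requested.
        let ticks = ns.div_ceil(NS_PER_TICK).max(1);
        let start = TICKS.load(Ordering::Relaxed);
        // wrapping_sub keeps the comparison correct across counter wraparound.
        while TICKS.load(Ordering::Relaxed).wrapping_sub(start) < ticks {}
    }
}
```

A HAL version would consume a real hardware timer in the constructor and could hand out `TickDelay`-like handles freely afterwards.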

tgross35 commented 1 year ago

Great, thank you for the direction. I will pass it on.

david-sawatzke commented 1 year ago

> Make a new `Delay` provider that just uses a nop loop under the hood, like `asm::delay` does. You can create this as many times as you want, and it would take some scaling parameter to account for your particular hardware. It loses accuracy if interrupts take away some of the time.

I don't think it's a good idea to provide such an implementation, due to how brittle it is. There are issues with flash prefetch on various MCUs, where the timing depends on page alignment, the flash region (as you mentioned), and other factors. An assembly-only loop running from RAM is better, but still iffy.

adamgreig commented 1 year ago

Yea, but this isn't something a driver should be using - it's something the end application pulls in in lieu of anything better. With some docs suggesting what scaling factor you might want on different hardware, and pointing out these troublesome aspects, I think it could still be a useful tool. Plus, the higher-end platforms that tend to have more variability in how long the loop takes are exactly the ones likely to have CYCCNT available, which is immune to those problems and could be used instead.
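
For completeness, option 1 would look roughly like this: a hypothetical busy-wait provider built on `cortex_m::asm::delay`, where `cycles_per_us` is the scaling parameter mentioned above (`SpinDelay` is a made-up name, and calibration is left to the application):

```rust
use embedded_hal::delay::DelayNs;

/// Hypothetical busy-wait delay; freely copyable, so an application can
/// hold as many instances as it likes. Accuracy suffers under interrupts
/// and depends on flash wait states, as discussed above.
#[derive(Clone, Copy)]
struct SpinDelay {
    /// CPU cycles per microsecond, calibrated by the application for the
    /// current clock configuration.
    cycles_per_us: u32,
}

impl DelayNs for SpinDelay {
    fn delay_ns(&mut self, ns: u32) {
        // Round up so we never undershoot the requested delay.
        let us = ns.div_ceil(1_000).max(1);
        // asm::delay blocks for at least the given number of CPU cycles.
        cortex_m::asm::delay(us.saturating_mul(self.cycles_per_us));
    }
}
```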