rust-embedded / embedded-hal

A Hardware Abstraction Layer (HAL) for embedded systems

[Discussion] Short delays < 1us #63

Open · kunerd opened this issue 6 years ago

kunerd commented 6 years ago

I came to a point where I need some ns delays, which can't be achieved using the Delay trait. So I have done a bit of research and found that I am not the only one with this requirement. My exact use case is timing GPIO pins from within an embedded-hal driver.

The user nagisa suggested using DMA to control the pins on #rust-embedded. Unfortunately, though, not every MCU comes with DMA support.

Another idea is to use an implementation similar to the one in the Linux kernel. It basically works by determining loops_per_jiffy at start-up: that value describes how many times a given "do nothing" instruction can be executed in a given amount of time, and it is then used to calibrate the ndelay function.

I don't know how reliable such a solution would be, but I know that it comes with some limitations: it is only approximate, and it should only be used for very small delays because of overflow.
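
Roughly, what I have in mind is a sketch like this (everything here is hypothetical and not embedded-hal API; the now_ticks timebase would come from e.g. SysTick or DWT):

```rust
// Hypothetical sketch of a calibrated busy-wait, in the spirit of the
// Linux loops_per_jiffy; none of these names exist in embedded-hal.
use core::sync::atomic::{AtomicU32, Ordering};

/// Loop iterations per microsecond, measured once at start-up.
static LOOPS_PER_US: AtomicU32 = AtomicU32::new(1);

/// A "do nothing" loop the compiler must not optimize away.
#[inline(never)]
fn delay_loop(iterations: u32) {
    for _ in 0..iterations {
        // The volatile read keeps the loop body from being removed.
        unsafe { core::ptr::read_volatile(&0u8) };
    }
}

/// Calibrate against an assumed monotonic tick source.
fn calibrate(ticks_per_us: u32, now_ticks: impl Fn() -> u32) {
    const PROBE: u32 = 100_000;
    let start = now_ticks();
    delay_loop(PROBE);
    let elapsed_us = (now_ticks().wrapping_sub(start) / ticks_per_us).max(1);
    LOOPS_PER_US.store((PROBE / elapsed_us).max(1), Ordering::Relaxed);
}

/// Delay for *at least* `ns` nanoseconds; approximate, rounded up.
fn ndelay(ns: u32) {
    // u64 math sidesteps the overflow problem for larger arguments.
    let loops = (ns as u64 * LOOPS_PER_US.load(Ordering::Relaxed) as u64 + 999) / 1000;
    delay_loop(loops as u32);
}
```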

What do you think, would such an implementation make sense in embedded-hal? Do you know of better solutions for this kind of problem?

therealprof commented 6 years ago

Huh? Why DMA? DMA is "make it fast and don't bother me with piecemeal stuff". For precise timing one would use interrupts, which is very much incompatible with ndelay; taken together, these aspects are one reason why a RasPi with Linux sucks for timing-critical work, or "hard realtime" as we'd call it...

If you want it simple (and your Cortex-M MCU supports it), you can easily set up a SysTick timer that calls an interrupt handler at a specific rate. Unfortunately, setting up interrupts is an annoying and tedious task that embedded-hal will not help you with, but it certainly can be done, cf. https://github.com/therealprof/stm32f042-hal/blob/master/examples/flash_systick.rs
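
The SysTick part of that example boils down to roughly this (assuming an 8 MHz core clock and the cortex-m / cortex-m-rt crates):

```rust
#![no_main]
#![no_std]

use cortex_m::peripheral::syst::SystClkSource;
use cortex_m_rt::{entry, exception};
use panic_halt as _; // panic handler

#[entry]
fn main() -> ! {
    let core = cortex_m::Peripherals::take().unwrap();
    let mut syst = core.SYST;

    // Fire the SysTick exception every millisecond, assuming an 8 MHz core clock.
    syst.set_clock_source(SystClkSource::Core);
    syst.set_reload(8_000 - 1);
    syst.clear_current();
    syst.enable_counter();
    syst.enable_interrupt();

    loop {
        // Sleep between interrupts.
        cortex_m::asm::wfi();
    }
}

#[exception]
fn SysTick() {
    // Toggle a pin, clock out the next protocol step, etc.
}
```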

austinglaser commented 6 years ago

@therealprof It sounds like some bit-banged protocol. One way to do that with precise timing is DMAing data to a GPIO output register, triggering each transfer with a timer.

I'm of the view that there's a place for extremely short (micro-, maybe nanosecond) polled delays in embedded-hal. However, @kunerd, keep in mind that many MCUs don't run at a high enough frequency to get real nanosecond-scale resolution with that kind of implementation.

Whether the delay loop is calibrated at each start-up or calculated from the MCU's known clock frequency is a question for individual implementations, I think -- the question that's salient for this crate is what the interface might look like.
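
To make that concrete, the interface could be as minimal as the following sketch (the DelayNs name and method are hypothetical, by analogy with the existing DelayMs/DelayUs traits):

```rust
/// Hypothetical nanosecond-scale blocking delay; not part of embedded-hal.
pub trait DelayNs {
    /// Pause execution for *at least* `ns` nanoseconds.
    ///
    /// Implementations round up to their minimal resolution, so this is a
    /// lower bound, not an exact delay.
    fn delay_ns(&mut self, ns: u32);
}
```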

therealprof commented 6 years ago

@austinglaser

It sounds like some bit-banged protocol. One way to do that with precise timing is DMAing data to a GPIO output register, triggering each transfer with a timer.

GPIOs usually don't support DMA at all, since the minimum data size a DMA engine will handle is (best case) 8 bits. Sure, you might come up with clever tricks to make it work, e.g. by setting up some serial protocol that works similarly to your data bus and then shapeshifting your data into the protocol words and bulk-dumping it somewhere for the DMA engine, or by doing memory-to-memory DMA transfers into a bit-banded memory area (if the MCU supports bit-banding and DMAing into that region, that is), but that's not THE use case for DMA.
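
For reference, the bit-banding trick relies on the standard Cortex-M3/M4 alias mapping, where every bit in the peripheral region gets its own word-sized alias address:

```rust
/// Cortex-M3/M4 bit-band alias for a bit in the peripheral region:
/// writing a word of 0 or 1 to the alias address flips exactly that bit,
/// which is what a memory-to-memory DMA transfer would target.
fn peripheral_bit_band_alias(reg_addr: u32, bit: u32) -> u32 {
    const BASE: u32 = 0x4000_0000; // peripheral region
    const ALIAS: u32 = 0x4200_0000; // its bit-band alias region
    ALIAS + (reg_addr - BASE) * 32 + bit * 4
}
```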

The easiest and most precise way is to simply use a timer and interrupts; that's exactly what they're there for. Another way would be a timer, events (if your MCU supports events and the data you want to measure is eventable), and interrupts.

kunerd commented 6 years ago

@kunerd, keep in mind that many MCUs don't run at a high enough frequency to get real nanosecond-scale resolution with that kind of implementation.

I know; the implementation I have in mind would delay for *at least* the given amount of time, i.e. you provide it a lower bound. Basically, you need to round up based on the minimal resolution the MCU's timing allows.
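
As a sketch of that rounding (taking one CPU cycle at `hz` as the minimal resolution; the helper is hypothetical):

```rust
/// Round a requested delay up to whole CPU cycles at `hz`; the result is a
/// lower bound in cycles, never shorter than requested.
fn ns_to_cycles(ns: u32, hz: u32) -> u32 {
    // ceil(ns * hz / 1e9), computed in u64 to avoid intermediate overflow
    ((ns as u64 * hz as u64 + 999_999_999) / 1_000_000_000) as u32
}
```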

@therealprof But I can't set up interrupts from within a driver. So what's the current solution for such problems using embedded-hal? Do I have to provide some kind of tick interface?

therealprof commented 6 years ago

@kunerd True, interrupts always need to be set up by the application, and the driver cannot make any assumptions about the availability or use of interrupts. The best we could do is add interfaces to simplify the setup.

I'm not quite sure what you need, but having a driver rely on hard-realtime behaviour sounds like a recipe for disaster. What I can imagine is that the driver mandates that the user call a method within a certain time window, and lets the user figure out how to meet those timing requirements.

austinglaser commented 6 years ago

@therealprof

GPIOs usually don't support DMA at all, since the minimum data size a DMA engine will handle is (best case) 8 bits. Sure, you might come up with clever tricks to make it work ...

Certainly. With that said, I reject the idea that it's only worthwhile to talk about the primary use-case for DMA, and some of these "clever tricks" (on the STM32 specifically, you can use the BSRR GPIO registers to manipulate individual pins) are exactly what the OP was asking about.
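
Roughly, the BSRR trick looks like this (the GPIOA address is illustrative for an assumed STM32F0; check the reference manual for your part, and note this is not portable):

```rust
/// Illustrative GPIOA_BSRR address for an assumed STM32F0 part
/// (GPIOA base 0x4800_0000, BSRR offset 0x18).
const GPIOA_BSRR: *mut u32 = 0x4800_0018 as *mut u32;

/// One precomputed "frame": bits 0..15 set pins, bits 16..31 reset them,
/// atomically. Here: set PA0 and reset PA1 in a single word write.
const FRAME: u32 = (1 << 0) | (1 << (16 + 1));

fn emit_frame() {
    // A timer-triggered DMA channel would perform exactly this word write,
    // streaming a precomputed FRAME sequence at a fixed rate.
    unsafe { core::ptr::write_volatile(GPIOA_BSRR, FRAME) };
}
```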

OTOH, these approaches will have to be tailored on a per-MCU basis, and won't be portable, so I suppose it's not worth worrying about in any case.

therealprof commented 6 years ago

@austinglaser

Certainly. With that said, I reject the idea that it's only worthwhile to talk about the primary use-case for DMA, and some of these "clever tricks" (on the STM32 specifically, you can use the BSRR GPIO registers to manipulate individual pins) are exactly what the OP was asking about.

Sorry, I don't follow. I neither see what changing individual GPIOs (which all of the available stm32 hal implementations already do) has to do with the quest to precisely time the changes, nor what this has to do with DMA.

OTOH, these approaches will have to be tailored on a per-MCU basis, and won't be portable, so I suppose it's not worth worrying about in any case.

Yes, that's what I said: drivers are supposed to be generic, so one cannot rely on hardware specifics.

kunerd commented 6 years ago

@therealprof I'm working on an HD44780 display driver, which uses a parallel connection. I am not sure how common similar interfaces and protocols are in the embedded world. Maybe I just need to re-think and re-define the border between my driver and the concrete hardware-dependent implementation.

therealprof commented 6 years ago

@kunerd Parallel interfaces are indeed not that common anymore, but I don't see any blocker to implementing that interface.

#30 seems useful to have for that use case.

I just briefly checked the datasheet, so I might have missed something, but why do you need precise high-resolution timing to drive this display? From my reading, all of the data is buffered and cached internally, specifically to allow interfacing with any lowly MCU, which makes this an almost perfect use case for timer-independent operation.

kunerd commented 6 years ago

@therealprof There are some minimum timing bounds between 5 and 1000 ns described in the figures and tables at the end of the datasheet. I know that most (if not all) of them are on a 'wait at least' basis, but I was curious whether there would be a better solution for such small delays (in most cases only a couple of cycles).

therealprof commented 6 years ago

I thought those were all "wait for busy signal to change"? You can just busy wait for that.
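
I.e. something along these lines (`read_busy_flag` is a hypothetical helper reading D7 with RS = 0 and R/W = 1):

```rust
/// Hypothetical helper: read the HD44780 busy flag (D7 during a status
/// read with RS = 0, R/W = 1). Stubbed here for illustration.
fn read_busy_flag() -> bool {
    false
}

/// Spin until the controller has finished the previous instruction.
fn wait_ready() {
    while read_busy_flag() {}
}
```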

kunerd commented 6 years ago

@therealprof: The busy flag only works for whole instructions; at least in 4-bit mode you need to send two chunks of data for one instruction, and thus you also have to hold timings such as the enable pulse width (> 450 ns). I think I need to try these things out first and see if that's really a problem. Anyway, thanks for your help.

Reference: https://www.sparkfun.com/datasheets/LCD/HD44780.pdf p. 33
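
A rough sketch of that two-nibble transfer, with all pin helpers left hypothetical:

```rust
// Hypothetical pin helpers; a real driver would wrap embedded-hal OutputPins.
fn set_data_nibble(_n: u8) { /* drive D4..D7 */ }
fn enable_high() { /* raise E */ }
fn enable_low() { /* lower E */ }
fn delay_ns(_ns: u32) { /* the short delay under discussion */ }

/// Send one byte as two nibbles in 4-bit mode. The enable pulse width
/// (> 450 ns per the datasheet) must be held for each nibble, and the busy
/// flag only becomes meaningful after the full instruction.
fn write_byte(byte: u8) {
    for nibble in [byte >> 4, byte & 0x0F] {
        set_data_nibble(nibble);
        enable_high();
        delay_ns(450); // hold E for at least the enable pulse width
        enable_low();
    }
}
```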