LPSPI Rework Open Discussion Points

Finomnis commented 6 months ago

[ ] Architectural questions
- [ ] Split into low-level and high-level driver? If so, naming?
- [ ] Reuse existing driver as low-level driver? If so, how to deal with the fact that e-h 1.0 doesn't use the CS pin?
- [ ] Build high-level driver from a read and a write backend, which can be Polling/DMA/Interrupt based? If so, then:
- [ ] Should the DMA backend be an extension of the Interrupt backend, or should DMA work without interrupts? (Personally leaning towards DMA+Interrupts)
- [ ] Which data types should be allowed for DMA? u32 works always, u16 works never, and u8 works sometimes, depending on alignment (even if we align with slice::align_to. The read and write buffer could be misaligned to each other, making it impossible to DMA.)
- [ ] How should interrupts be handled? If interrupts are enabled, then interrupts get fired continuously until the interrupting flags are cleared. Should:
- we clear the flags and store their state elsewhere, then wake the waker?
- we disable interrupts inside the interrupt handler, and enable it dynamically whenever we go to sleep? Then all the interrupt handler does is to wake the waker`? (Kind of leaning to towards this one)
[ ] How should embedded-hal and embedded-hal-async be implemented? Should embedded-hal simply be a cassette poll wrapper around the async version or be its own thing? (problem: implementation complexity, which reduces confidence in correctness)
- Leaning towards yes, use cassette. Reason: Even in the blocking version, async makes it trivially easy to listen to the write & read side simultaneously, because it will simply poll both sides of the operation alternatingly. Without async, the read and write operations whould have to be interleaved somehow. That is, if we manage to get rid of the critical section in the poll function; this one is kind of a problem for the sync version.
[ ] Should critical sections be allowed inside of the interrupt handler? Should critical sections be held as short as possible?
[ ] How to combine u8/u16/u32 without having to provide a separate implementation for each?
- Current approach: convert them all to * u8 and store alignment information separately. Then split them into 6 parts:
- 1 to 3 bytes single word read&write transfer
- u32-aligned longer read&write transfer
- 1 to 3 bytes single word read&write transfer
- 1 to 3 bytes single word read or write transfer (depending on whether read or write data was longer
- u32-aligned longer read or write transfer
- 1 to 3 bytes single word read or write transfer
- This allows us to implement all other functions independent of the actual data type, and also automatically allows efficient u32 based u8/u16 transfer.
[ ] Should the implementation be reference or raw pointer based?
- Advantage of reference based: no unsafe required, and hence increased confidence in correctness
- Drawback of reference based: in-place transfer (required by eh1) becomes infinitely harder to implement.

Finomnis commented 6 months ago

@mciantyre In case you have some time, I'd appreciate your though on this, or if you have additional open questions I missed. I'm sure I missed plenty that will pop up later.

mciantyre commented 6 months ago

Split into low-level and high-level driver? If so, naming?

I have some ideas for naming the async components here. These names depends on the design, so if that design doesn't work, we'll think of something different.

Other ways to re-use the Lpspi name for different things include

putting them in different modules within imxrt-hal.
putting them in different crates.

Regarding 2: it would be OK with me to consider an imxrt-hal-async package. That package could depend on imxrt-hal for its drivers and configuration modules. This would be symmetric to the interface packages: we have embedded-hal and embedded-hal-async, and embedded-hal-async depends on embedded-hal.

Reuse existing driver as low-level driver? If so, how to deal with the fact that e-h 1.0 doesn't use the CS pin?

After removing the CS pin from the interface, I believe today's LPSPI driver can be a foundation for an async driver. It needs tweaks and extensions, but its lower-level API tries to support this use case. If the existing driver doesn't work, then reach for the RAL.

We'll need to remove the chip select pin from the driver, turning it into something supporting SpiBus. Users who want software-managed chip selects can reach for embedded-hal-bus. Users who want hardware-managed chip selects will need to wait for an LPSPI-capable SpiDevice solution.

Build high-level driver from a read and a write backend, which can be Polling/DMA/Interrupt based?

DMA + interrupts should work. Only supporting u32 is also fine; that's all we support today.

How should interrupts be handled?

Some status flags, like RDF and TDR, are read-only and cannot be cleared. Disabling the interrupt enable bits should work for these bits, and should also work for all other events. This allows a custom future to check and maintain the status register in its poll() implementation.

How should embedded-hal and embedded-hal-async be implemented? Should embedded-hal simply be a cassette poll wrapper around the async version or be its own thing?

I'm not a fan of "use cassette to go from async to blocking" for two reasons:

We have a blocking implementation that isn't based on async, and it can be tweaked to meet 1.0 embedded-hal requirements. Taking this approach excludes maintainers / contributors who only care about blocking I/O and aren't prepared for async Rust. I'm willing to maintain a blocking I/O implementation that's distinct from its async implementation.
It's a general design pattern baked into a specific HAL. If folks think this is a useful design pattern, how about doing something like embedded-spinny, which might solve the same problem for other HALs?

Should the implementation be reference or raw pointer based?

Raw pointers are OK. We can build our own / seek out existing safe abstractions.

Any considerations for introducing an intermediary (ring) buffer within an async LPSPI driver? Let the user decide the buffer size when they create their async LPSPI driver. We use it as another FIFO for the interrupt case, and as the source / destination of DMA transfers. We incur extra copying and buffering in the driver, and the user pays in extra program memory.

The extra buffer might change where / how we interact with the hardware FIFO, interrupt flags, status registers; not sure if that change is helpful. And if we require that the user supply us with a u32 buffer, we can then implement u8-based DMA without worrying about alignment.

Finomnis commented 6 months ago

Any considerations for introducing an intermediary (ring) buffer within an async LPSPI driver?

What would be the advantage of this, opposed to streaming into the hardware fifos directly?

mciantyre commented 6 months ago

For DMA use-case: If we require that the user supply us with a u32 buffer, we can then implement u8-based DMA without worrying about buffer alignment.

For interrupts, we can reduce the number of executor wake-ups (different than the number of activating interrupts). We achieve this by moving hardware FIFO management into the interrupt handler. The user yields at await points when they need space in their flexibly-sized ring buffer; they're not awaiting space in a fixed-sized hardware FIFO. Thread-safe, lock-free queues / buffers from heapless / bbqueue might help us out.

Reducing the number of executor wake-ups might avoid cooperative multitasking issues. If we're managing the hardware FIFOs through code paths in the executor, we could over- or under-run a FIFO while another task isn't yielding. This might be less of a concern; I think executors like RTIC and embassy have ways to express task priorities. It's also not a perfect solution, since avoiding the problem still depends on the ring buffer's size (but at least the user controls that size).

And for both cases, we have a consistent way to realize interrupt and DMA I/O. In the transmit path, we're filling a ring buffer, then triggering some state machine to empty that buffer. In the receive path, we're triggering some state machine to fill a ring buffer, then emptying that buffer. The distinction of interrupt and DMA is the answer to "what's that state machine on the other side of this buffer?" Once we establish the design, it naturally translates into an async LPUART driver implementation, supporting a peripheral that only has a four byte hardware FIFO.

Finomnis commented 6 months ago

Uff, that's a lot to think about :D I might have to take a step back for a couple of weeks, if someone wants to try himself on this, please go ahead :)

imxrt-rs / imxrt-hal

LPSPI Rework Open Discussion Points #147