imxrt-rs / imxrt-hal

Rust for NXP i.MX RT
Apache License 2.0
122 stars 29 forks source link

Refactor LPSPI with word packing, futures #157

Closed mciantyre closed 1 month ago

mciantyre commented 2 months ago

Instead of using LPSPI continuous transactions, we pack the user's data into u32 words. The primitives for translating the user's data to / from the data FIFOs should help us as we consider async LPSPI drivers. And, in order to implement embedded-hal 1.0 traits, we'll need something like the dummy transmit / receive helpers, since we need to handle differing transmit / receive buffer sizes. The included unit tests don't trigger an error in Miri, and they try to simulate how we'd use the primitives in firmware. (This is an "it's not obviously wrong" test, not an "it's correct" test; help me review here.)

The commit introduces spinning futures into the LPSPI driver. By combining and spinning on these futures, we can realize in-place transfers, read-only transactions, write-only transactions, etc.

We no longer return the Busy error; we'll wait for transmit FIFO space. We also never return the NoData error, instead returning success when there's no I/O to do. Since this commit is a non-breaking change, the two errors are still available in the error enum. I'll remove them later.

I'm moving the blocking SPI example into RTIC and rewriting the driver test. The tests demonstrate overlapping writes, writes with flushes, and in-place transfers with a physical loopback. There's also tests that show how word sizes and bit orders interact. I'd appreciate if folks could test these changes in their system, since it affects how the embedded-hal implementations behave. I'm only testing this commit on a 1170EVK with the new example.

mciantyre commented 1 month ago

Thanks for the feedback. One thing I noticed about this change: without a spin on the busy flag, this driver reduces the delay between transfers, or the time that chip select is deasserted between subsequent transactions. Given a 1MHz CLK, I see a reduction from 1430ns to 530ns for that delay.

If your workaround includes a spin on the busy flag, if you're bursting write(...)s, and if you were previous at your SPI display's minimum delay between transfers, then this could affect you.


There's some refactoring we could do to reclaim code space. I pushed a commit to show those changes. I agree that it's still not optimal; in particular, defining transfer with a try_join combinator doesn't get close to what might be ideal.

It's convenient (and fun) to define I/O like this, but I'm happy to drop it if folks need that space. mciantyre/imxrt-hal@ab053df4fb77b1a95d422c9e81b41c71ad99951b isn't applied here, but it drops the futures from this branch.

Florob commented 1 month ago

Thanks for the feedback. One thing I noticed about this change: without a spin on the busy flag, this driver reduces the delay between transfers, or the time that chip select is deasserted between subsequent transactions. Given a 1MHz CLK, I see a reduction from 1430ns to 530ns for that delay.

That's interesting. I had tested the EVK yesterday and thought I saw the opposite result. So I re-tested and it turns out that inter-transfer gaps can be really strange at higher clock frequencies (I had 11 MHz). There are sometimes gaps in the order of multiple microseconds. Toggling a GPIO between each write() call indicates that in these cases there is actually more time spend in the write() call. However, that is AFAICT not a regression, and there are interactions with I-cache that might cause this, so it's probably fine. The implication being that this is absolutely fine on i.MX RT 1021 and at least as far as bus performance goes does not appear to have a negative impact. Rather spinning on FIFO space instead of Busy seems to be an improvement.

If your workaround includes a spin on the busy flag, if you're bursting write(...)s, and if you were previous at your SPI display's minimum delay between transfers, then this could affect you.

Fortunately (because that would have been virtually impossible to fix) it's likely something else. For one the minimum allowed transfer gap is specified as 40ns, but I've also realized something else: SPI MIPI interfaces typically have a separate D/C (data/command) pin, which is used to indicate what the data on the SPI bus means. This requires the SPI transfers to be blocking, or at least the ability to flush. That does not entirely explain why spinning for Busy on subsequent write() calls worked, but fundamentally brings us back to a variation on the original bug report: Users of e-h 0.2 expect the trait's methods to block. e-h 1.0 allows not blocking for the SpiBus trait, adding flush(). The SpiDevice trait's methods are however still blocking (implicitly flush).

There's some refactoring we could do to reclaim code space. I pushed a commit to show those changes. I agree that it's still not optimal; in particular, defining transfer with a try_join combinator doesn't get close to what might be ideal.

Interesting, that having two spin_on() (one via wait_for_fifo_space()) calls per operation reduces code size. I assume that is partially due to the inliner refusing to inline the wait_for_fifo_space().

It's convenient (and fun) to define I/O like this, but I'm happy to drop it if folks need that space. mciantyre/imxrt-hal@ab053df isn't applied here, but it drops the futures from this branch.

It's probably better to worry about this later. Having the bug fixed and e-h 1.0 support is more important than code size at least to me. If this turns out to be a real problem we can still try to optimize further down the line.

mciantyre commented 1 month ago

SPI MIPI interfaces typically have a separate D/C (data/command) pin, which is used to indicate what the data on the SPI bus means. This requires the SPI transfers to be blocking, or at least the ability to flush. That does not entirely explain why spinning for Busy on subsequent write() calls worked, but fundamentally brings us back to a variation on the original bug report: Users of e-h 0.2 expect the trait's methods to block.

I keep forgetting folks want to synchronize external components with the LPSPI through these eh02 interfaces. The latest commit adds a flush at the end of writes and transfers, striving for approach 1 in #136. I'm also not sure why a busy spin in the subsequent write worked, but a concluding flush should provide better guarantees.

Any other thoughts on these changes? Otherwise, I plan to release this into the 0.5 series for more user testing.

Florob commented 1 month ago

I keep forgetting folks want to synchronize external components with the LPSPI through these eh02 interfaces.

Obviously so do I, even though I'm technically one of these folks…

Any other thoughts on these changes? Otherwise, I plan to release this into the 0.5 series for more user testing.

Sounds and looks good to me.

mciantyre commented 1 month ago

Thanks for the reviews!

I guess if you move to an async API some form of completion notification (interrupt) would be needed to mark a Future<> complete is that right? Is there an interrupt for FIFO empty/FIFO watermark/FIFO full events? I'd guess so?

Yup, the LPSPI supports watermarks and interrupt events for both FIFOs. We could use those to drive async I/O.