SPI: split into device/bus, allows sharing buses with automatic CS

rust-embedded / embedded-hal

A Hardware Abstraction Layer (HAL) for embedded systems

Apache License 2.0

2.01k stars 202 forks source link

SPI: split into device/bus, allows sharing buses with automatic CS #351

Closed Dirbaio closed 2 years ago

Dirbaio commented 2 years ago

Because why not?

This separates the concepts of "SPI bus" and "SPI device". I got the idea of having traits for SPI devices from #299. It has an excellent breakdown of the issues of the status quo, so go read that first.

The difference from #299 is that proposes "repurposing" the existing traits to represent devices, while this PR proposes specifying the existing traits represent buses, and introduces a new trait for representing devices on top of that.

HALs impl the bus traits, just like now. No extra boilerplate.
Crates like shared-bus add the "bus mutexing" logic, to impl SpiDevice on top of SpiBus (+ OutputPin for CS). There's a wide variety of possible impls, all can satisfy the SpiDevice requirements:
- Exclusive single-device, that just owns the entire bus
- Single-thread, on top of RefCell
- Multithreaded std, using std Mutex
- Multithreaded embedded, using some AtomicBool taken flag.
- Async embedded, using an async mutex (this'd require an async version of SpiDevice).
Drivers take T: SpiDevice (or SpiDeviceRead, SpiDeviceWrite), then they start a .transaction(), then use it to read/write/transfer data. The transaction ends automatically when returning. Example:

fn read_register(&mut self, addr: u8) -> Result<u8, Error> {
    let mut buf = [0; 1];
    self.spi.transaction(|bus| {
        bus.write(&[addr])?;
        bus.read(&mut buf)?;
    })?;
    Ok(buf[0])
}

No Transactional needed

Previous proposals boiled down to: for each method call (read/write/transfer), it automatically assets CS before, deasserts it after. This means two writes are really two transactions, which might not be what you want. To counter this, Transactional was added, so that you can do multiple operations in a single CS assert/deassert.

With this proposal, you have an explicit "Transaction" object, so you can easily do multiple reads/writes on it in a single transaction. This means Transactional is no longer needed.

The result is more flexible than Transactional, even. Take an imaginary device that responds to "read" operations with a status code, and the data only arrives on success. With this proposal you can easily read data then take conditional action. With Transactional there's no way, best you can do is read the data unconditionally, then discard it on error.

It allows for smaller memory usage as well. With Transactional you had to allocate buffer space for all operations upfront, with this you can do them as you go.

fn read_data(&mut self, addr: u8, dest: &mut [u8]) -> Result<(), Error> {
    self.spi.transaction(|bus| {
        bus.write(&[addr])?;

        // Read status code, 0x00 indicates success
        let mut buf = [0; 1];
        bus.read(&mut buf)?;
        if buf[0] != 0x00 {
            return Err(Error::ErrorCode(buf[0]));
        }

        // If the operation is successful, actually read the data, still in the same transaction
        bus.read(dest)
    })
}

rust-highfive commented 2 years ago

r? @eldruin

(rust-highfive has picked a reviewer for you, use r? to override)

ryankurte commented 2 years ago

hey another interesting option!

Because why not?

🤣

Drivers take T: SpiDevice (or SpiReadonlyDevice, SpiWriteonlyDevice), then they start a .transaction(), then use it to read/write/transfer data. The transaction ends automatically when dropped.

i guess you'd also have a fn done(mut self) to make it obvious how to explicitly close the transaction, or you'd have to document .drop().

With this proposal, you have an explicit "Transaction" object, so you can easily do multiple reads/writes on it in a single transaction. This means Transactional is no longer needed.

i believe this is achieved by both ManagedCS proposals, though it is still useful to have a mechanism for representing an arbitrary sequence of operations.

An alternative would be using a closure like in #350, but I think the "end transaction on drop" proposed here is better:

with this approach the CS is effectively scoped to the parent ~~function~~ scope rather than the transaction and requires explicit overriding with manual drop to maintain ordering, which would seem to make the API easier to misuse / opens opportunities for bugs like not closing and re-opening the transaction or not closing the transaction prior to moving on.

/// Code to do some SPI stuff then wait for 100ms
/// note that the missing `drop` means the delay will occur within the CS assertion rather than between CS assertions
/// which is not super clear from the code
fn something(&mut self) {
  let t = spi.transaction();
  // do some SPI stuff

  // Delay for 100ms to let the device get ready for the next operation
  delay_ms(100);
}

i prefer the closure because it make it obvious when the end of the transaction is, rather than whenever the compiler chooses to drop the object. the timing of when a mutex is dropped / reopened doesn't typically matter because it's just a software concept, but CS assertion and deassertion very much does.

Dirbaio commented 2 years ago

with this approach the CS is effectively scoped to the parent function rather than the transaction and requires explicit overriding with manual drop to maintain ordering, which would seem to make the API easier to misuse / opens opportunities for bugs like not closing and re-opening the transaction or not closing the transaction prior to moving on.

They're scoped to the parent block, not function. You can also add explicit {} braces to force dropping earlier, instead of drop(x). Also, many times the block scope naturally aligns with what you want to do, for example drivers often have read_reg, write_reg fns that do just that, so the transaction gets dropped at the right place.

While I like the idea of closing the transaction when falling out of scope, do you have any plan on how to inform the driver/what to do about a CS pin deasertion error on drop() (if not using a done(mut self) method like @ryankurte suggested)?

Oh, this is a problem indeed. done would work, but there's no way to enforce the user calls it, so you still have to do something on drop, probably panicking. Not great. Maybe it's not that bad because fallible GPIO is very uncommon...? Still, I don't like that there are two ways to accomplish the same thing (done and dropping) with non-obvious pros and cons.

Perhaps the closure is better after all. The analogy with Mutex/RefCell breaks down because dropping Transaction causes I/O side effects which are fallible, while dropping Mutex/RefCell guards doesn't...

Any reason why a similar approach would not work for I2C?

It'd work too, I guess? Same for the closure version. i2c has the same problems outlined in #299 so the device/bus split would help a lot there too. Same for the closure version if we decide to go with that. It'd be nice if i2c+spi transactions were consistent. i2c is a bit more picky with the repeated-start stuff etc, that might make it more complicated.

Ensure that implementations for &mut T do not introduce problems like those highlighted in #350

I would not do impl SpiBus for T where T: SpiDevice impls (which is the logical equivalent of the ones conflicting in #350). If you have an SpiDevice, you have to start a transaction to use it. Reasons:

It's conceptually wrong: Devices are not buses. If they were, you could give a device to a bus mutexer as if it was a bus, and have it mutexify that. You end up with two layers of mutexing, with the "inner" CS being auto-toggled at each read/write op, which makes no sense whatsoever.
It's relatively common to do multiple ops in the same transaction. Most devices are some form of "Write addr + read data, write addr+write data". Having it autostart the transaction if you want to do a single write is not that valuable IMO.
A driver might bound on SpiBus and do their own manual CS management. By doing so, it declares it wants exclusive ownership of the whole bus. The driver knows its manual CS management is correct thanks to owning the entire bus. If we had these impls, you'd be able to feed it an SpiDevice on a shared bus, which could cause transactions from other drivers to get interleaved in between without the driver knowing. This is the issue at https://github.com/Rahix/shared-bus/issues/23 .
The impls do conflict as well, hehe :see_no_evil:

Ensure it is not possible to call transaction twice before the first one is dropped.

For async, ensure it is not possible to call transaction again while a transaction is still ongoing but an operation is being awaited.

These are already enforced by the 'a lifetime in Transaction: it borrows self, so you get a borrow checker error if you try.

Wait for stable GATs :coffee:

Yep... :sleeping:

Dirbaio commented 2 years ago

Pushed a new version!

Changed to use closures.
- Removes the requirement on nightly-only GATs.
- Solves the issue of propagating errors when deasserting CS.
Removed ManagedCs trait. After thinking about it more, I think the "device" abstraction is enough, and having an additional ManagedCs trait would be counterproductive / too restrictive.
- For example: a driver for a chip with CS will require ManagedCs. However, the user might want to tie CS low on the PCB if they're not planning on sharing the bus. The CS pin is there, just not accessible from the MCU. The user would be unable to use the driver, unless doing hacks like "fake" ManagedCs impls that don't actually manage CS.
Changed SpiDevice, SpiDeviceRead, SpiDeviceWrite from traits to trait aliases. The problem with them being traits is the Device impls must explicitly impl them. We don't want that, we want them to be just a "pre-made set of trait bounds", which is exactly what trait aliases are. They're nightly-only, so perhaps it's best to not include them for now, and have drivers add bounds like this instead: where T: SpiDeviceBase, T::Bus: SpiBus.

I also wrote a working end-to-end test case here. Targets the Raspberry Pi Pico RP2040 with the Pico-ResTouch-LCD-2.8. It's the perfect use case because it has an SPI display and an SPI resistive touch sensor on the same SPI bus with separate CS pins.

The HAL impl stays nice and small, it only cares about the Bus part. It doesn't have to worry about Device stuff.
mod shared_spi impls SpiDevice on top of SpiBus+OutputPin, in a HAL-independent way. This is something that could go in crates like shared_bus.
mod touch implements the touch driver, generic on a SpiDevice. This shows how a small simple driver would look like. The code is simplified thanks to Device dealing with CS for us. It also shows doing multiple transfers in the same transaction with the new API (this would be done with the Transactional trait before, which this PR removes because it's no longer needed.)
mod my_display_interface contains the display interface impl, generic on a SpiDeviceWrite. Of note, this is a use case of the write-only SPI traits.
the main code ties everything together: Creates the SPI bus with the HAL, creates the two Devices with their CSs, and gives them to the drivers.

It all seems to work nicely together! :D

Open questions:

Do we want to add a "CS" variant to ErrorKind?
Do we want to add pre-made SpiDevice impls? Probably yes, at least for the "exclusive" case (no sharing), with and without CS.

ryankurte commented 2 years ago

approach looks pretty good to me! though still a couple of questions...

i don't think the device vs bus nomenclature is super clear, i think the biggest complaint is that we're going to end up with situations where we're talking about (a driver for) an spi device (eg. a radio) with bounds on SpiDevice.

i think we can resolve this with better module and per-function documentation wrt. the SpiDeviceX and SpiBusX functions to describe their purpose / use, and i'd be in favour of s/SpiDevice/Spi/g in the trait names since that's more consistent with other e-h traits and what most people will see.

Changed SpiDevice, SpiDeviceRead, SpiDeviceWrite from traits to trait aliases. The problem with them being traits is the Device impls must explicitly impl them.

agreed that (for now at least) we're going to need to drop the trait aliases... we could replicate this behaviour with marker traits and bounded default implementations, though they conflict with &mut T impls and i'm not sure whether it would be breaking to swap to aliases later. again i think documentation is probably the solution (show people exactly how to use em).

For example: a driver for a chip with CS will require ManagedCs. However, the user might want to tie CS low on the PCB if they're not planning on sharing the bus. The CS pin is there, just not accessible from the MCU. The user would be unable to use the driver, unless doing hacks like "fake" ManagedCs impls that don't actually manage CS.

i am pretty sure this is still necessary. the idea of ManagedCS is for drivers (hal consumers) to signal that they expect CS to be managed at a level below them, the only way to write that's abstract over software or hardware CS is to push that requirement out of the driver.

Dirbaio commented 2 years ago

i think we can resolve this with better module and per-function documentation wrt. the SpiDeviceX and SpiBusX functions to describe their purpose / use

Agreed, will add more docs.

and i'd be in favour of s/SpiDevice/Spi/g in the trait names since that's more consistent with other e-h traits and what most people will see.

I think "SpiDevice, SpiBus" is the clearest. At least clearer than "Spi, SpiBus". What's "a SPI"?

If we do the Device/Bus thing for i2c as well, then there's no more consistency issues: the "bus" concept only applies to i2c and spi.

I don't agree SpiDevice is what most people will use:

HALs will impl SpiBus, and rarely touch SpiDevice.
There's still use cases for drivers using SpiBus (see below).

agreed that (for now at least) we're going to need to drop the trait aliases... we could replicate this behaviour with marker traits and bounded default implementations, though they conflict with &mut T impls and i'm not sure whether it would be breaking to swap to aliases later. again i think documentation is probably the solution (show people exactly how to use em).

Blanket impls don't work, yeah... I'll remove them, then add more docs.

i am pretty sure this is still necessary. the idea of ManagedCS is for drivers (hal consumers) to signal that they expect CS to be managed at a level below them, the only way to write that's abstract over software or hardware CS is to push that requirement out of the driver.

I would argue SpiDevice already very strongly implies there is a CS pin. A SpiDevice impl on a shared bus without a CS pin is broken. Perhaps we can simply document that SpiDevice MUST have a CS pin? For edge cases like "there's a CS but it's tied down in the PCB" the user can impl their own as an escape hatch, I don't think it's that common..?

Drivers that want an SpiDevice intentionally without CS shouldn't be a thing. These kinds of drivers should use SpiBus instead:

Drivers that manually manage CS. Manually managing CS on a shared bus is impossible to do correctly.
Drivers for chips that don't have a CS at all (do these exist?). It's impossible to put them on a shared bus at all.
Drivers that want to abuse SPI for "cursed" stuff like WS2812B waveform generation, weird bitbangings, logic analyzers (sampling the MISO pin at a fixed interval fast)

ryankurte commented 2 years ago

I don't agree SpiDevice is what most people will use:

sure, but the number of HALs << the number of drivers << the number of applications. HAL authors are already deep in the technical side, and i would expect orders of magnitude more drivers and applications, so it seems worth weighting our solutions towards that end.

I would argue SpiDevice already very strongly implies there is a CS pin. A SpiDevice impl on a shared bus without a CS pin is broken. Perhaps we can simply document that SpiDevice MUST have a CS pin? For edge cases like "there's a CS but it's tied down in the PCB" the user can impl their own as an escape hatch, I don't think it's that common..?

i think we are looking at this from the opposite direction. so your expectation is that HALs will provide Spi::with_sw_cs and Spi::with_hw_cs(..., pin: Pin) functions so they cannot be constructed without some CS control?

because in many cases you will choose whether to use the hw implementation based on the functionality you require, so can't be required to use the hw approach. and building on that what happens if you had a cursed situation like CS via GPIO expander, how would you model this?

Drivers that manually manage CS. Manually managing CS on a shared bus is impossible to do correctly.

this is the whole point of SpiDevice and transaction... once you're in the closure you have exclusive access to the bus to do what you will with CS (or any other IOs)?

Dirbaio commented 2 years ago

Pushed new version

Removed trait aliases
Renamed SpiDeviceBase to SpiDevice
Added ExclusiveDevice. Impls a SpiDevice out of a SpiBus and a OutputPin, with no bus sharing.
Added docs.

@ryankurte hopefully the added docs clarifies the confusion. The TLDR is:

SpiBus = exclusive ownership of the whole bus. NEVER manages CS. Implemented in HALs
SpiDevice = ownership of a single device in a possibly-shared bus. ALWAYS manages CS. Implemented in either:
- Using Hardware CS, in the HALs.
- Using software CS, in either
  - embedded-hal, the ExclusiveDevice
  - Third-party crates, like shared-bus.

Since SpiDevice ALWAYS manages CS, there's no need for a ManagedCs marker trait. There's no such thing as a "SpiDevice without CS", you just use SpiBus instead.

Dirbaio commented 2 years ago

ugh, MSRV is 1.46+ but #[doc = include_str!(...)] is 1.54+ ... What do we do?

ryankurte commented 2 years ago

Since SpiDevice ALWAYS manages CS, there's no need for a ManagedCs marker trait. There's no such thing as a "SpiDevice without CS", you just use SpiBus instead.

alright i think that's clearer. and if your hardware does CS but you don't use it you just ignore it?

ugh, MSRV is 1.46+ but #[doc = include_str!(...)] is 1.54+ ... What do we do?

would it work to chuck the images here then link to them instead of embedding for now?

Dirbaio commented 2 years ago

alright i think that's clearer. and if your hardware does CS but you don't use it you just ignore it?

Yep, The HAL would only enable hardware-CS if you ask for it. You can just use the SpiBus part.

would it work to chuck the images here then link to them instead of embedding for now?

:+1:

Dirbaio commented 2 years ago

Pushed new version:

Link image from github (note: it'll be broken until merged because it points to git master... maybe there's a better way?)
Added changelog.

Updated embassy-nrf, embassy-stm32, embassy-rp here, no issues found.

Dirbaio commented 2 years ago

Pushed new version:

Applied suggestions from review
Added ErrorKind::ChipSelectFault
Bumped MSRV to 1.54
Changed the svg back to include_str!

A thing to clarify would be the timing behavior (https://github.com/rust-embedded/embedded-hal/issues/264) as we discussed. Somehow SPIDevice implementations must be able to ensure that CS is only deasserted after SCK is idle.

Agreed this is needed for correct SpiDevice operation. I was planning on sending a follow-up PR, as this PR is quite big already. (probably adding a .flush() method, which AFAICT should work fine with the SpiDevice design)

Dirbaio commented 2 years ago

Pushed new version.

Changed examples to use error handling best practices (error enum instead of unwrap).
Changed transaction closure from FnOnce(&mut Self::Bus) -> R to FnOnce(&mut Self::Bus) -> Result<R, <Self::Bus as ErrorType>::Error>.

The reason for the closure change is that SpiDevice::Error and SpiDevice::Bus::Error are different types (and they can't be the same type because SpiDevice has to be able to add the CS errors). This forces drivers to deal with both errors, which is quite ugly:

enum Enc28j60Error<SPI>
where
    SPI: SpiDevice,
    SPI::Bus: SpiBus,
{
    SpiDeviceError(SPI::Error),
    SpiBusError(<SPI::Bus as ErrorType>::Error),
}

struct Enc28j60<SPI> {
    spi: SPI,
}

impl<SPI> Enc28j60<SPI>
where
    SPI: SpiDevice,
    SPI::Bus: SpiBus,
{
    pub fn do_it(&mut self) -> Result<(), Enc28j60Error<SPI>> {
        self.spi
            .transaction(|bus| bus.write(&[0xFF]))
            .map_err(Enc28j60Error::SpiDeviceError)?
            .map_err(Enc28j60Error::SpiBusError)?;
        Ok(())
    }
}

WIth the new API, the SpiDevice impl converts the SpiDevice::Bus::Error into a SpiDevice::Error, so drivers only have to deal with the SpiDevice::Error. The result looks like this, which is much cleaner IMO.


enum Enc28j60Error<SPI> {
    Spi(SPI),
}

struct Enc28j60<SPI> {
    spi: SPI,
}

impl<SPI> Enc28j60<SPI>
where
    SPI: SpiDevice,
    SPI::Bus: SpiBus,
{
    pub fn do_it(&mut self) -> Result<(), Enc28j60Error<SPI::Error>> {
        self.spi.transaction(|bus| bus.write(&[0xFF])).map_err(Enc28j60Error::Spi)?;
        Ok(())
    }
}

The downside is the new way requires nested Results if the user wants to return their own errors from within the transaction. However, that should be quite rare, and the previous way would always return a nested error from transaction in practice ALL the time, so I think it's a net win.

Dirbaio commented 2 years ago

New version

Add changelog entries to Unreleased instead of to alpha.7 (oops!)

Rahix commented 2 years ago

As mentioned in an inline comment above, I do not think the bus-sharing code for hardware chip-select support should be implemented in the HALs. I fear this will quickly lead to compatibility problems because bus-sharing always requires some sort of synchronization primitive and a HAL has no idea what synchronization methods an application would prefer to use.

I've been thinking about this whole story for quite some time and the most promising idea I came up with is this: HALs should provide concrete APIs on their SpiBus-implementing type for configuring the active hardware-CS line. The bus-sharing solution (for example, ugh, shared-bus) then provides a way to call a small function after locking but before operating on the bus. This function is user-provided and configures the bus for the impending operation. To let some code speak:

// this is what application code would look like:
// ---------------------------------------------
let spi_bus = BusManager::new(SomeSpiBus::new());

let dev_a: impl SpiDevice = spi_bus.acquire_with_hwcs(|b| {
    // set_active_cs() just marks which CS pin should be asserted when a transfer is started
    b.set_active_cs(1);
});
let dev_b: impl SpiDevice = spi_bus.acquire_with_hwcs(|b| {
    b.set_active_cs(2);
});

// acquire_with_hwcs() is defined like this:
// ----------------------------------------
impl<BUS> BusManager<BUS> {
    fn acquire_with_hwcs<'a>(&'a self, cfg: fn(BUS)) -> SharedSpiDevice<'a, BUS, NoCs> { ... }

    // ofc. there is a variant with GPIO chip-select
    fn acquire<'a, CS: OutputPin>(
        &'a self, cs: CS, cfg: fn(BUS)
    ) -> SharedSpiDevice<'a, BUS, CS> { ... }
}

// and the function is called like this:
// ------------------------------------
impl<'a, BUS> SpiDevice<BUS> for SharedSpiDevice<'a, BUS> {
    fn transaction<R>(&mut self, f: ...) -> Result<R, Self::Error> {
        self.bus.lock(|bus: &mut BUS| {
            // do note that CS can be a no-op implementation for acquire_with_hwcs()
            self.cs.set_low().map_err(ExclusiveDeviceError::Cs)?;

            //     |||
            //     VVV
            self.cfg(bus);

            let f_res = f(bus);
            let cs_res = self.cs.set_high();
            let f_res = f_res.map_err(ExclusiveDeviceError::Spi)?;
            cs_res.map_err(ExclusiveDeviceError::Cs)?;

            Ok(f_res)
        })
    }
}

This design pattern additionally provides a solution to a problem which was not discussed here at all so far: How do you talk to multiple devices which need different bus configurations (SPI mode, clock speed)? With this design it is simple:

// application code:
let spi_bus = BusManager::new(SomeSpiBus::new());

let cs_flash = gpio.P42.into_output();
let flash = SpiFlashDriver::new(spi_bus.acquire(cs_flash, |b| {
    b.set_speed(40.MHz());
    b.set_mode(spi::MODE_0);
}));

let cs_sensor = gpio.P23.into_output();
let sensor = SensorDriver::new(spi_bus.acquire(cs_sensor, |b| {
    b.set_speed(1.MHz());
    b.set_mode(spi::MODE_3);
}));

Dirbaio commented 2 years ago

HALs should provide concrete APIs on their SpiBus-implementing type for configuring the active hardware-CS line.

One challenge with HW CS is you can do multiple transfers in the same transaction:

  self.spi.transaction(|bus| {
        bus.write(&[addr])?;
        bus.read(&mut buf)?;
    })?;

So, we can't simply have HALs add a set_active_cs, and then have them assert/deassert CS for each read/write/transfer. That'll assert CS 2 times (for the write and for the read) in the above case, not 1 time.

The trait would have to look something like this. Implementing it at the HAL side would require fun tracking state so CS gets enabled/disabled at the right times...

trait SpiBusWithHardwareCs {
    fn set_active_cs(&mut self, cs: ???);
    fn start_transaction(&mut self);
    fn end_transaction(&mut self);
}

Is all this complexity really worth it? Are performance gains from hardware-CS really that big? And how common is it to care so much about performance that you REALLY want hardware-CS AND you still want to share the bus? Shared buses are bad for performance!

With the current design, HALs impl SpiBus for sharing with software-CS. They also have the option to impl SpiDevice without bus sharing with hardware CS (which I predict will be rare). If someone really REALLY wants hardware-cs AND sharing (which I predict will be ULTRA rare) then yes, they'll have to impl it all in the HAL, replicating the mutex logic for their particular environment.

I don't think HAL-independent hardware-CS bus sharing is worth the extra complexity to add support for that in embedded-hal.

Dirbaio commented 2 years ago

The "configure bus" closure is super neat! That's a valid SpiDevice impl (allowed by the trait contract) indeed.

I think it's best to leave the issue of configuring the bus to external crates, so we don't lock ourselves to a particular solution. One such solution is HALs providing set_freq, set_mode methods and shared-bus providing the "configure bus" closure as you say!

In the future we could add traits for set_freq, set_mode so SpiDevice impls can do that in a HAL-independent way, but I don't think it's needed now.

Dirbaio commented 2 years ago

New version

Added helper methods to SpiDevice to do transactions with single transfers

Dirbaio commented 2 years ago

New version

Make SpiDevice helper methods generic over word type. (Thanks @GrantM11235, very nice catch! :D)

bors[bot] commented 2 years ago

Build succeeded: