Open GrantM11235 opened 8 months ago
If I understand it correctly, the CS assertion on hardware somewhat resembles a lock in software on the SPI device, so if the CS deassert fails, the software lock should not be unlocked either or at least let this not happen:
- Thread 2 does a transaction to device B while the cs for device A is still active
If and how to recover the situation, I'm not sure about. Panicking should be avoided. Reporting the chip select error can be useful. The user is at least able to shut down the SPI to avoid any further damage to the hardware (if at risk).
i'm not super convinced making CS pins infallible / wrapping in panics is the right way to handle this... i can't think of any situations with fallible IOs in which you're likely to be able to -recover- from a CS failure, particularly because if a write fails you're not going to be able to tell what state the pin is in, so you'd need to basically tear down and re-init the bus.
the problem with wrapping in a panic is that it stops the error handling path that would let you at least tear things down neatly, it seems to me like a CS failure should poison the whole shared bus (within the critical section).
i'm not super convinced making CS pins infallible / wrapping in panics is the right way to handle this
It's not always the right way to handle it, but I think it is the best way to handle it for the vast majority of cases.
I think #574 strikes a good balance. It is useful to most people, has no unnecessary overhead, and is honest about it's limitations.
Anyone who does need to do more complex error handling is free to write their own SpiDevice
impl. After all, there is nothing special about the impls in embedded-hal-bus, they are just some utilities that we think will be useful to other people.
If there is enough interest, we could even add another set of SpiDevice
impls to embedded-hal-bus that do poisoning, but I don't think enough people would use it to justify the mild maintenance burden.
This affects all the spi sharing utilities in some way, but I would like to highlight the worst case scenario:
Option 1: Infallible only
CS: OutputPin<Error = Infallible>
This is the simplest option. In fact it is even simpler than the existing impls because it allows us to remove the
DeviceError
enum.You can still use fallible chipselect pins if you wrap them in some sort of adapter that panics on failure. We could even provide such an adapter.
Option 2: track "poisoned" chipselects
I have some idea how to implement this, it would involve adding a shared flag to each bus, as well as another flag for each
SpiDevice
. The impl would refuse to do any spi transactions until the offending chipselect is fixed.This would be a lot of extra overhead and complexity that is completely unnecessary for 99.9% of users. I don't think the compiler will be able to remove the unnecessary overhead.
Option 3: do nothing
Document the broken error handling, how to work around it, and where it is impossible to work around.
If we want to be thorough, the amount of warnings and disclaimers will probably outweigh the amount of actual code:
expand me!
- how to detect a failed chipselect - `DeviceError::Cs`, obviously, but also - `DeviceError::Spi`, because it can "hide" chipselect errors - basically any `SpiDevice` error - downstream errors such as `DisplayError::BusWriteError` from `display_interface` (but ironically not `DisplayError::CSError`) - how to properly recover from a chipselect error - **before** using any device on the bus: - get access to the `SpiDevice` by destroying whichever driver was using it - get access to the chipselect by destroying the `SpiDevice` - do whatever you need to do to fix and deassert the chipselect pin - in multitheaded/concurrent code, it is impossible to ensure that this is done before another thread uses a device on the bus. It is not possible to correctly handle chipselect failure in this case - what happens if you use a device on the bus before fixing the chipselect error - this is even a minor problem for `ExclusiveDevice` because the framing will be messed up - if two chipselects are active at once - the other device will receive garbage data - both devices will fight to drive MISO, causing data corruption and possibly even physical damageConclusions
If you only have infallible chipselect pins, or if you want to panic on chipselect failure, option 1 is the best.
If you actually want to try to gracefully recover from chipselect errors, option 2 is the only real option.
Option 3 isn't good for either group. It is slightly inconvenient for the first group, and it is almost impossible to use correctly for the second group
My recommendation is option 1 because it is the best option for most people. If someone really needs option 2, they can write it for themself, it doesn't need to be in our crate.