stm32-rs / stm32f4xx-hal

A Rust embedded-hal HAL for all MCUs in the STM32 F4 family
BSD Zero Clause License
534 stars 206 forks source link

Recent updates appear to cause SPI communication lockup #707

Open ryan-summers opened 7 months ago

ryan-summers commented 7 months ago

In booster, using the latest master causes our application to lock up and encounter watchdog resets. Backtraces consistently have pointed to some of the SPI functions as the root cause, and using older revisions of the f4xx-hal do not have these issues.

Below is the backtrace when I Ctrl+C during a lockup of our idle() task.

   0: stm32f4xx_hal::spi::Inner<SPI>::check_read
        at C:\Users\rsummers\.cargo\git\checkouts\stm32f4xx-hal-fe8350cc04cacf3f\4144b1d\src\spi.rs:764:16
   1: stm32f4xx_hal::spi::Spi<SPI,_,W>::read_nonblocking
        at C:\Users\rsummers\.cargo\git\checkouts\stm32f4xx-hal-fe8350cc04cacf3f\4144b1d\src\spi.rs:932:9
   2: stm32f4xx_hal::spi::Spi<SPI,_,W>::transfer_in_place
        at C:\Users\rsummers\.cargo\git\checkouts\stm32f4xx-hal-fe8350cc04cacf3f\4144b1d\src\spi.rs:945:32
   3: stm32f4xx_hal::spi::hal_02::blocking::<impl embedded_hal::blocking::spi::Transfer<u8> for stm32f4xx_hal::spi::Spi<SPI>>::transfer
        at C:\Users\rsummers\.cargo\git\checkouts\stm32f4xx-hal-fe8350cc04cacf3f\4144b1d\src\spi\hal_02.rs:61:13
   4: <w5500::bus::four_wire::FourWire<Spi,ChipSelect> as w5500::bus::Bus>::read_frame::{{closure}}
        at C:\Users\rsummers\.cargo\registry\src\index.crates.io-6f17d22bba15001f\w5500-0.4.1\src\bus\four_wire.rs:40:13
   5: <w5500::bus::four_wire::FourWire<Spi,ChipSelect> as w5500::bus::Bus>::read_frame
        at C:\Users\rsummers\.cargo\registry\src\index.crates.io-6f17d22bba15001f\w5500-0.4.1\src\bus\four_wire.rs:35:22
   6: w5500::socket::Socket::get_receive_size
        at C:\Users\rsummers\.cargo\registry\src\index.crates.io-6f17d22bba15001f\w5500-0.4.1\src\socket.rs:186:13
   7: w5500::raw_device::RawDevice<SpiBus>::read_frame
   8: booster::hardware::external_mac::<impl smoltcp::phy::Device for booster::hardware::Mac>::receive

If I revert to 2d8d882, this issue goes away. I'm not sure what's changed since then

burrbull commented 7 months ago

Possibly something in #673 Try https://github.com/stm32-rs/stm32f4xx-hal/commit/4c16f3d1fb415d3f3399e595d6f2936060f6a21b, please.

ryan-summers commented 7 months ago

Hmm. There's something else going on, as master now appears to work just fine. This is likely something related to hardware or register configurations, sorry for the false-positive here. If I keep seeing oddness, I'll reopen.

ryan-summers commented 7 months ago

Ah intriguing. The issue appears to be caused by a difference between Release and Debug configurations. Debug is working fine, but Release is not.

I have returned with some more testing results:

I retried 4c16f3d1fb415d3f3399e595d6f2936060f6a21b with our main branch and confirmed that this appears to be the first commit where this defect appears

burrbull commented 7 months ago

Try https://github.com/stm32-rs/stm32f4xx-hal/tree/spi_revert. It reverts check_read & check_send.

ryan-summers commented 7 months ago

Interestingly, that branch indeed seems to be working without issue

burrbull commented 7 months ago

Interestingly, that branch indeed seems to be working without issue

Yeah, strange. I've added asserts in that branch. Could you check does it fault on any assert?

ryan-summers commented 7 months ago

Even more strangely - the commit with the asserts does not work, but the commit immediately before does. Perhaps this is a race condition of some sort that arises because of extra processing time required by the new API, although I have not profiled things on my side extensively (I'm just looking at very high-level functionality of the app, i.e. it connects over MQTT or does not)

ryan-summers commented 7 months ago

I might have time to dig deeper later on this, but I need to head home for the day now

burrbull commented 7 months ago

@ryan-summers Try #709, please.

ryan-summers commented 7 months ago

Still not working on my end, it seems to keep hitting lockup in the following functions:

stack backtrace:
   0: stm32f4xx_hal::spi::Inner<SPI>::check_read
        at C:\Users\rsummers\.cargo\git\checkouts\stm32f4xx-hal-fe8350cc04cacf3f\26f646d\src\spi.rs:766:19
   1: stm32f4xx_hal::spi::Spi<SPI,_,W>::read_nonblocking
        at C:\Users\rsummers\.cargo\git\checkouts\stm32f4xx-hal-fe8350cc04cacf3f\26f646d\src\spi.rs:933:9
   2: stm32f4xx_hal::spi::Spi<SPI,_,W>::transfer_in_place
        at C:\Users\rsummers\.cargo\git\checkouts\stm32f4xx-hal-fe8350cc04cacf3f\26f646d\src\spi.rs:946:32
   3: stm32f4xx_hal::spi::hal_02::blocking::<impl embedded_hal::blocking::spi::Transfer<u8> for stm32f4xx_hal::spi::Spi<SPI>>::transfer
        at C:\Users\rsummers\.cargo\git\checkouts\stm32f4xx-hal-fe8350cc04cacf3f\26f646d\src\spi\hal_02.rs:61:13
   4: <w5500::bus::four_wire::FourWire<Spi,ChipSelect> as w5500::bus::Bus>::read_frame::{{closure}}
burrbull commented 7 months ago

It is strange. I was sure spi_revert and flags-test2 produce same code. 8 byte different

burrbull commented 7 months ago

I've merged revert commits.