rtic-rs / rtic

Real-Time Interrupt-driven Concurrency (RTIC) framework for ARM Cortex-M microcontrollers
https://rtic.rs
Apache License 2.0

rtic-monotonic panics #956

Open andresv opened 2 months ago

andresv commented 2 months ago

I managed to get a monotonic panic. TIM3 on the stm32g030k6 should be a 16-bit timer.

embassy-stm32 = { git = "https://github.com/embassy-rs/embassy.git", rev = "000b022ae2e52e9abaabbd10110b4c583fe4344c", features = [
    "defmt",
    "stm32g030k6",
    "unstable-pac",
    "exti",
] }
rtic-monotonics = { version = "2.0.1", features = [
    "stm32g030k6",
    "stm32_tim3",
] }
rtic-sync = "1.3"
rtic = { version = "2.1", features = ["thumbv6-backend"] }

15439 INFO  periodic
└─ ptubasehub::app::input_events::{async_fn#0} @ src/main.rs:176
33074 INFO  periodic
└─ ptubasehub::app::input_events::{async_fn#0} @ src/main.rs:176
32779 ERROR panicked at /Users/andres/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rtic-monotonics-2.0.1/src/stm32.rs:338:1:
Monotonic must have missed an interrupt!
└─ panic_probe::print_defmt::print @ /Users/andres/.cargo/registry/src/index.crates.io-6f17d22bba15001f/panic-probe-0.3.1/src/lib.rs:104

Finomnis commented 2 months ago

Is this reliable? How long does it take to happen, does it happen immediately or after a while? Does something trigger it?

This error happens if the interrupt of a monotonic got blocked for more than half a timer period, indicating that the timer lost track of time. Are you blocking interrupts, for example with cortex_m::interrupt::free?

andresv commented 2 months ago

It took minutes or more to happen. I used the ADC and BufferedUart from the embassy-stm32 HAL plus blocking onewire.

I needed to get it ready quickly. Because of the second issue (https://github.com/rtic-rs/rtic/issues/957), I decided to move it to the embassy executor.

Finomnis commented 2 months ago

@korken89

Still not convinced that this is related, though.

> It took minutes or more to happen.

Most STM32 timers are 16-bit, and when running at 1 MHz they generate one interrupt roughly every 32 ms. If the half-overflow compare channel were completely inactive, the first error would occur after roughly 64 ms, reproducibly. That does not seem to be the case, if I understand this correctly.
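The arithmetic behind those figures can be checked with a small host-side sketch. It assumes the timer raises one interrupt at the half-way point and one at rollover, so consecutive interrupts are half a period apart. The function names `interrupt_interval_us` and `full_period_us` are made up for this illustration and are not part of rtic-monotonics.

```rust
/// Microseconds between consecutive monotonic interrupts, assuming one
/// interrupt at the half-way point and one at rollover (hypothetical helper).
fn interrupt_interval_us(bits: u32, tick_hz: u64) -> u64 {
    let half_period_ticks: u64 = (1u64 << bits) / 2;
    half_period_ticks * 1_000_000 / tick_hz
}

/// Microseconds for the counter to wrap around once (hypothetical helper).
fn full_period_us(bits: u32, tick_hz: u64) -> u64 {
    (1u64 << bits) * 1_000_000 / tick_hz
}

fn main() {
    // Compare a 16-bit and a 32-bit counter, both ticking at 1 MHz.
    for bits in [16u32, 32u32] {
        println!(
            "{}-bit @ 1 MHz: interrupt every {} us, full period {} us",
            bits,
            interrupt_interval_us(bits, 1_000_000),
            full_period_us(bits, 1_000_000)
        );
    }
}
```

For a 16-bit timer at 1 MHz this gives 32,768 µs between interrupts and 65,536 µs per full period, matching the rough 32 ms and 64 ms figures above. A 32-bit counter at the same rate would have a half period of about 35.8 minutes.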

Finomnis commented 2 months ago

Either way, @andresv, if you continue to use rtic-monotonics, please monitor this and report back if it happens again. I suspect there is a different reason.

Edit: I was informed this timer might be 32-bit. In that case, this would make sense.

andresv commented 2 months ago

STM32G030K6T6 only has 16-bit timers: (screenshot, 2024-07-07)

andresv commented 2 months ago

It ran for hours with rtic-monotonics 2.0.2 and then:

10 ERROR panicked at /Users/andres/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rtic-monotonics-2.0.2/src/stm32.rs:338:1:
Monotonic must have missed an interrupt!
└─ panic_probe::print_defmt::print @ /Users/andres/.cargo/registry/src/index.crates.io-6f17d22bba15001f/panic-probe-0.3.1/src/lib.rs:104

I'll check if I can strip it down a little to make src available.

Finomnis commented 2 months ago

Yeah, I was afraid that would happen. @korken89 can you reopen?

My primary suspect right now is an incompatibility with another library. RTIC does not support critical sections that simply disable all interrupts, and that might be what is happening here.

Timer interrupts currently run at the highest possible priority (to my knowledge), so for something to block an interrupt for more than 64 ms, something must block that interrupt entirely.

Either way, there's a good chance that this is not a bug in the monotonic: the panic is correct to go off because a sanity check is actually failing, and the actual mistake is elsewhere.

Finomnis commented 2 months ago

What this panic indicates is that the monotonic realizes it is not in the state it expected to be in, so it is in an unknown state and is most certainly reporting an incorrect time.

The only way this can happen is if either the half-way or the rollover interrupt fired twice in a row without the other interrupt firing in between, which puts the monotonic in an undefined state.
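As a rough illustration of that kind of check, here is a simplified host-side model. It is a sketch under assumptions, not the rtic-monotonics source: the type `HalfPeriodCounter`, the `Event` enum, and the parity rule are hypothetical names and choices made for this example. The model counts half periods and expects half-way and rollover events to alternate strictly; two events of the same kind in a row are reported as an error instead of a panic so the behaviour can be tested.

```rust
/// The two interrupt sources the model distinguishes (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Event {
    HalfWay,
    Overflow,
}

/// Counts elapsed half periods. An even count means the next event must be
/// the half-way interrupt; an odd count means it must be the rollover.
struct HalfPeriodCounter {
    half_periods: u64,
}

impl HalfPeriodCounter {
    fn new() -> Self {
        Self { half_periods: 0 }
    }

    /// Record one interrupt. Returns the new half-period count, or an error
    /// if this event kind does not match the expected parity.
    fn on_event(&mut self, ev: Event) -> Result<u64, &'static str> {
        let expected_parity = match ev {
            Event::HalfWay => 0,
            Event::Overflow => 1,
        };
        if self.half_periods % 2 != expected_parity {
            return Err("Monotonic must have missed an interrupt!");
        }
        self.half_periods += 1;
        Ok(self.half_periods)
    }
}

fn main() {
    let mut counter = HalfPeriodCounter::new();
    // Normal operation: the two events alternate.
    println!("{:?}", counter.on_event(Event::HalfWay));
    println!("{:?}", counter.on_event(Event::Overflow));
    println!("{:?}", counter.on_event(Event::HalfWay));
    // The rollover was lost, so a second half-way event arrives in a row.
    println!("{:?}", counter.on_event(Event::HalfWay));
}
```

In this model the fourth call returns an error, because the rollover that should have come between the two half-way events never arrived. That corresponds to the situation described above, where one interrupt stays blocked for longer than half a timer period.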

Finomnis commented 2 months ago

Of course I don't rule out a programming error on our side, but so far I didn't find one.