Open jannic opened 3 months ago
Noting my comments from chat:
Flaky behavior like this is generally from one of three sources:
nostd_async
, or within rp-hal
: https://github.com/sammhicks/nostd_async/issues/4The unsoundnesses raised in https://github.com/sammhicks/nostd_async/issues/4 while true, are not exercised in these examples.
They all stem from the fact that if tasks are stack allocated, they may be moved which would break the internally kept pointer to them.
In this tests/examples, the tests run and block that main thread until the task has completed. By then, there’s no longer any pointer to the tasks kept anywhere (neither in nostd_async, nor in any Waker as there are no IRQ expected).
@jamesmunns I don’t quite understand your point 2, so for this answer I’ll assume you meant
in
async_utils
a race is possible if the interrupt fires after line 121 returnedPending
but before the waker is even registered.
But the interrupts are expected to be masked (within the peripheral, but enabled in the NVIC), therefore the IRQ isn’t fired until (setup_flags)(periph)
is called on line 124 after the waker has been registered.
There may be a need for more detailed documentation to explain the expected behaviour & responsibility, eg:
@ithinuel I was more pointing to this part from dirbaio's comments:
cargo miri test and cargo miri run --example ... fail. I haven't looked deep in the code to know whether this is due to the above issues, or to other issues.
It's possible there is additional latent UB. Also UB doesn't necessarily need to be exercised to cause miscompilations.
You were correct at my second point, I'd definitely need to audit in more detail to come up with a specific indication!
Lacking any better idea, I did some bisecting: nightly-2024-06-13 is the first version where this happens, with nightly-2024-06-12 the on-target-tests succeed.
Hi, I'm the developer of nostd_async
.
There was definitely UB in nostd_async
, which should now have been resolved.
Miri complains about version 0.6.1
, but not 0.7.0
to 0.7.2
.
However, I'm no longer developing it, so you may wish to use a different async runtime, such as embassy
.
Does diffing the objdump turn up anything interesting? I like to use arm-none-eabi-objdump --no-addresses --no-show-raw-insn -dCt ./file.elf
.
I tried this several times because I couldn't believe it, and it seems to be reproducible:
The command
CARGO_TARGET_THUMBV6M_NONE_EABI_RUNNER="probe-rs run" cargo +beta test -p on-target-tests --no-fail-fast --test i2c_loopback --test i2c_loopback_async -- --chip rp2040
just hangs on the second test (i2c_loopback_async).On stable, it works reliably:
CARGO_TARGET_THUMBV6M_NONE_EABI_RUNNER="probe-rs run" cargo +stable test -p on-target-tests --no-fail-fast --test i2c_loopback --test i2c_loopback_async -- --chip rp2040
.If I only run i2c_loopback_async without running i2c_loopback first, it works on both stable and beta.
The hang vanishes if I add 4
cortex_m::asm::nop()
calls innon_blocking::reset
after the twofree
calls. 3 are not enough. Acortex_m::asm::delay(1000)
in the same place neither so it's not purely a timing issue. More likely depends on the code size somehow?Changing the
i2c_loopback
test (ie. the one I have to call first to make the second one fail) to only contain theembedded_hal
test causes the call toi2c_loopback_async
to return with:Tested on a raspberry pico with the documented test connections, pico-probe, probe-rs 0.24.0 on linux. Rust versions are 1.80.0 and 1.81.0-beta.2.