jonas-schievink / rubble

(going to be a) BLE stack for embedded Rust
BSD Zero Clause License
396 stars 56 forks source link

Demo doesn't work with logging disabled #94

Closed thalesfragoso closed 4 years ago

thalesfragoso commented 4 years ago

Current Behavior

~When running the nrf52-demo compiled with debug-assertions = false the connection fails and after that the device goes to standby, the logs also don't work. That seems strange because I didn't experience any panics with (or without) debug assertions, I set up a blinky task and it keeps blinking away with no problems.~

Expected Behavior

~The demo (and rubble) should work without debug assertions (i.e. on release mode)~

The demo (and rubble) should work with logging disabled

Steps to reproduce

~Add debug-assertions = false to profile.dev or cargo build --release. For the later one can also change the "release_max_level_warn" on the demo's Cargo.toml to see that debug-assertions=false also breaks the logging.~

Environment

rustc 1.39.0 on Linux, code tested on a nRF52832 board.

thalesfragoso commented 4 years ago

I found out what was the problem with the logging. The demo-utils library also sets the features = ["release_max_level_warn"] which isn't recommend because it can only be set once. According to the log docs:

Libraries should avoid using the max level features because they're global and can't be changed once they're set.

See also: https://github.com/rust-lang/log/issues/309

If we remove those features on demo-utils and in nrf52-demo the logs work on release and the demo works as expected with a successful connection, but with logging disabled the connection fails both in debug and release.

Now I think the issue is: the demo should work with logging disabled.

thalesfragoso commented 4 years ago

Issue updated with more recent discoveries. It might be caused by some timing problems.

jonas-schievink commented 4 years ago

This seems to be caused by the two logging statements in the connection event code path (the 2 that always fire for every single packet exchange). Commenting them out reproduces the issue even with logging enabled:

https://github.com/jonas-schievink/rubble/blob/2479b2bd1ec3cd77c88e7630bd2a0c7706348642/rubble/src/link/connection.rs#L299-L307

https://github.com/jonas-schievink/rubble/blob/2479b2bd1ec3cd77c88e7630bd2a0c7706348642/rubble/src/link/connection.rs#L400

EDIT: The first one is enough on its own, so that one causes the problem.

thalesfragoso commented 4 years ago

I did a disassemble of the code with and without that trace!, but my asm-fu isn't good enough for this. https://gist.github.com/thalesfragoso/8610284f3084ed6b1f80f81e01e0d591

They look really similar, I only found one thing that I couldn't really understand, looks like that the code with the trace! has an extra branching here: https://gist.github.com/thalesfragoso/8610284f3084ed6b1f80f81e01e0d591#file-disas-trace-txt-L1306

But like I said, I'm not really good at this.

jonas-schievink commented 4 years ago

Hmm, is that from objdump? I've seen it mess up jump targets fairly often, so that might be a fluke (the branch doesn't really make sense if it really comes from inside log). Maybe try cargo-asm or llvm-objdump instead, and perhaps diff the result so it's easier to read?

(I doubt that the issue can be found like that though, it's more likely some issue in the nrf52 radio driver code)

thalesfragoso commented 4 years ago

It's from gdb, I tried cargo-asm but it was making very difficult to find the method. I diffed the files locally on my editor, but yeah, I should have uploaded as a diff.

(I doubt that the issue can be found like that though, it's more likely some issue in the nrf52 radio driver code)

What makes you think that ?

jonas-schievink commented 4 years ago

The radio driver has caused issues like this before (and it still has the DMA issue that you rightly pointed out)

thalesfragoso commented 4 years ago

You were right, the problem was in the radio driver. The configure_receiver method disables the radio without checking if the DMA finished transmitting the data started in transmit_data. I did some fixes and managed to get it working in all cases, and it looks like that the overall reliability also increased.

Tomorrow I will clean up the code and submit a PR.