rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
97.25k stars 12.57k forks source link

Rust runs 2fast2furious on the RPico? #118330

Closed JonasFocke01 closed 9 months ago

JonasFocke01 commented 10 months ago

Hello,

I encountered an error, where my code behaves differently in debug and release mode and i was not able to fix this with any black_box tricks. This can be, that im doing something wrong, but i banged my head at walls for hours not acheaving anyting.
My case might be specific, but its rust non the less. Im programming a rpi pico and want to drive a tm1637 chip with it. Im using the tm1637 to drive a (seven segment display. I have boiled down my code to just the part that is not working in release mode. Here is the main part (without all the import stuff..):

#[entry]
#[allow(clippy::too_many_lines)]
fn main() -> ! {
    #[allow(clippy::unwrap_used)]
    let mut pac = pac::Peripherals::take().unwrap();

    let mut watchdog = hal::Watchdog::new(pac.WATCHDOG);

    watchdog.start(ExtU32::secs(1));

    #[allow(clippy::unwrap_used)]
    let clocks = hal::clocks::init_clocks_and_plls(
        rp_pico::XOSC_CRYSTAL_FREQ,
        pac.XOSC,
        pac.CLOCKS,
        pac.PLL_SYS,
        pac.PLL_USB,
        &mut pac.RESETS,
        &mut watchdog,
    )
    .ok()
    .unwrap();

    let hal_sio = hal::Sio::new(pac.SIO);

    let pins = rp_pico::Pins::new(
        pac.IO_BANK0,
        pac.PADS_BANK0,
        hal_sio.gpio_bank0,
        &mut pac.RESETS,
    );
    let core = pac::CorePeripherals::take().unwrap();
    let mut delay = cortex_m::delay::Delay::new(core.SYST, clocks.system_clock.freq().to_Hz());

    let mut clock_pin = pins.gpio6.into_push_pull_output();
    let mut data_pin = pins.gpio7.into_push_pull_output();

    let tm = TM1637::new(&mut clock_pin, &mut data_pin, &mut delay);

    tm.init(); // <-- seems to work

    loop {
        tm.clear() // <-- works in dev, but not in release mode
        tm.display(&[5]) // <-- same behavior
    }

As you noticed, the TM1637 stuff is abstracted away, because it is a crate that i used. I tinkered around with it, but i also cant find the issue there. I will not include its whole code (it could be anything), to not flood this issue, but here is a snapshot of it with the functions that could be the problem here:

    pub fn init(&mut self) -> Res<E> {
        self.start()?;
        self.send(ADDRESS_AUTO_INCREMENT_1_MODE)?;
        self.stop()?;

        Ok(())
    }

    pub fn clear(&mut self) -> Res<E> {
        self.print_raw_iter(0, core::iter::repeat(0).take(4))
    }

    pub fn print_raw(&mut self, address: u8, bytes: &[u8]) -> Res<E> {
        self.print_raw_iter(address, bytes.iter().map(|b| *b))
    }

    pub fn print_chars(&mut self, address: u8, chars: &[char]) -> Res<E> {
        self.print_raw_iter(address, chars.iter().map(|b| char_to_raw(*b)))?;
        Ok(())
    }

    pub fn print_hex(&mut self, address: u8, digits: &[u8]) -> Res<E> {
        self.print_raw_iter(
            address,
            digits.iter().map(|digit| DIGITS[(digit & 0xf) as usize]),
        )
    }

    pub fn print_raw_iter<Iter: Iterator<Item = u8>>(
        &mut self,
        address: u8,
        bytes: Iter,
    ) -> Res<E> {
        self.start()?;
        self.send(ADDRESS_COMMAND_BITS | (address & ADDRESS_COMMAND_MASK))?;
        for byte in bytes {
            self.send(byte)?;
        }
        self.stop()?;
        Ok(())
    }

    pub fn set_brightness(&mut self, level: u8) -> Res<E> {
        self.start()?;
        self.send(DISPLAY_CONTROL_BRIGHTNESS_BITS | (level & DISPLAY_CONTROL_BRIGHTNESS_MASK))?;
        self.stop()?;

        Ok(())
    }

    fn send(&mut self, byte: u8) -> Res<E> {
        let mut rest = byte;
        for _ in 0..8 {
            let bit = if rest & 1 != 0 { Bit::ONE } else { Bit::ZERO };
            self.send_bit_and_delay(bit)?;
            rest = rest >> 1;
        }

        // Wait for the ACK
        self.send_bit_and_delay(Bit::ONE)?;
        for _ in 0..255 {
            if self.dio.is_low()? {
                return Ok(());
            }
            self.delay();
        }

        Err(Error::Ack)
    }

    fn start(&mut self) -> Res<E> {
        self.send_bit_and_delay(Bit::ONE)?;
        self.dio.set_low()?;

        Ok(())
    }

    fn stop(&mut self) -> Res<E> {
        self.send_bit_and_delay(Bit::ZERO)?;
        self.dio.set_high()?;
        self.delay();

        Ok(())
    }

I encountered many errors, but i saw Infallible in release mode some times so i thought it would be best, to open an issue here. (also got pointed by reddit users).
Can you spot anything that im doing wrong that could be the problem here? Or is this really a bug in the compiler or something like that?

workingjubilee commented 10 months ago

Please include the entire main.rs file. Do minimize it as much as possible, yes, but please, include a correct and compiling version. If you wish to hide something because it is very long, please familiarize yourself with the HTML <details> element. But there are multiple 3000-line files in the compiler, and many files host comments terminating in "...I think?"

In particular, this part you gave doesn't seem correct:

    let tm = TM1637::new(&mut clock_pin, &mut data_pin, &mut delay);

    tm.init(); // <-- seems to work

    loop {
        tm.clear() // <-- works in dev, but not in release mode
        tm.display(&[5]) // <-- same behavior
    }

This Rust code would emit syntax errors, as you need a ; between tm.clear() and tm.display. In addition, those two functions take &mut, but you have declared let tm, instead of let mut tm.

There have been bugs regarding infinite loops in the past, partly because these are UB in some cases in C++, and LLVM is an optimizer which historically has been used to optimize C++ code, so there have been incidents where Rust code has been "optimized" according to C++ semantics. The code, at-a-glance, and looking past the issues I mentioned, seems like it ought to be sound, and given that, it should do what you expect it to in Rust. Thus it would be of great concern if this is indeed not working, as we might have to fix this upstream in LLVM. Of course, it might be a problem in the unsafe code in the crates you are using... this is the other reason it's very important that we do see all the imports.

I am curious if you have tried anything from core::sync::atomic, like compiler_fence, to fix this?

JonasFocke01 commented 10 months ago

Thanks for your effort!
I have tinkered around and reduced my main file. This compiles and produces the error. I tested multiple times with it.
Please let me know if something is still missing!

#![no_std]
#![no_main]

use panic_halt as _;
use rp_pico::entry;
use rp_pico::hal;
use rp_pico::hal::pac;
use rp_pico::hal::prelude::*;
use tm1637::TM1637;

#[entry]
fn main() -> ! {
    let mut pac = pac::Peripherals::take().unwrap();

    let mut watchdog = hal::Watchdog::new(pac.WATCHDOG);

    let clocks = rp2040_hal::clocks::init_clocks_and_plls(
        rp_pico::XOSC_CRYSTAL_FREQ,
        pac.XOSC,
        pac.CLOCKS,
        pac.PLL_SYS,
        pac.PLL_USB,
        &mut pac.RESETS,
        &mut watchdog,
    )
    .ok()
    .unwrap();

    let hal_sio = hal::Sio::new(pac.SIO);

    let pins = rp_pico::Pins::new(
        pac.IO_BANK0,
        pac.PADS_BANK0,
        hal_sio.gpio_bank0,
        &mut pac.RESETS,
    );

    let core = pac::CorePeripherals::take().unwrap();
    let mut delay = cortex_m::delay::Delay::new(core.SYST, clocks.system_clock.freq().to_Hz());

    let mut clock_pin = pins.gpio6.into_push_pull_output();
    let mut data_pin = pins.gpio7.into_push_pull_output();

    let mut tm = TM1637::new(&mut clock_pin, &mut data_pin, &mut delay);

    tm.init();
    tm.set_brightness(255);
    tm.clear();
    loop {
        tm.print_raw(0, &[0x77]);
    }
}
workingjubilee commented 10 months ago

Excellent, thank you! I wonder if this is somehow related to the usage of the #![no_main] feature? My intuition (read: a wild-ass guess) suggests we may need to make sure that the #[entry] isn't being given an LLVMIR annotation that would "allow" eliminating the loop, but there's many other possible gotchas in miscompilations.

workingjubilee commented 10 months ago

Objectively this should be a low-priority issue if it does in fact require the nightly feature and is in fact limited to this platform. However, that labeling is somewhat "eager": it could be possible that the underlying bug actually doesn't require that feature, and it could be a problem that is actually multiplatform. So it's worth further examination to confirm whether it is actually a higher-priority problem or not, because we've seen the much worse case of problem surface before.

JonasFocke01 commented 10 months ago

Thank you for your time!
If this is a low prio issue for you, thats totaly okay. I will try my best to check everything you mentioned.
Two things stood out for me. First you say, that the code requires nightly. Where do you see that? I currently it says, that my active rustc version is rustc version 1.74.0 installed. I cant find nightly in my project. Am i looking correct?
Second, you speak about the loop getting optimized away. I dont know if that is it, because my bigger program does everything else in the loop as it should. Nothing breaks there.

workingjubilee commented 10 months ago

Oh, the answer is I confused the #[entry] annotation with #[start] (a nightly feature) and missed the use rp_pico::entry; because I'm daft.

apiraino commented 10 months ago

@JonasFocke01 I'm reading this comment thread and I am still not exactly sure what the issue is about. What do you exactly mean by "I encountered an error, where my code behaves differently in debug and release mode". Does your code not compile in release mode?

Can you post something to clarify the issue? Also: is it something that stopped working for you after upgrading the rust toolchain? Can you help debugging the issue and check if it's a regression? You can use cargo-rustc-bisect to speed the process of finding regressions

Thanks

@rustbot label +needs-triage +E-needs-bisection

JonasFocke01 commented 10 months ago

@apiraino It compiles in Release mode, but the output if you so will is different. Im trying to drive a seven-segment-display with the tm1637 chip. Without optimizations, it shows an A with the code above, but in release mode, it shows only garbage.
I enabled different optimizations steps, but even with

[profile.release]
opt-level = 1

it wont work as expected.
It worked never, as i only got around to do this in the past week, so i dont know how the regression tool will help.

workingjubilee commented 10 months ago

Second, you speak about the loop getting optimized away. I dont know if that is it, because my bigger program does everything else in the loop as it should. Nothing breaks there.

Hmm, that's interesting. So you're saying that in the "real case", there's huge blocks of code in a loop {}, but only the writes are getting optimized away? That points more strongly to a case of the library being UB in some subtle way, or the library's internal implementation being miscompiled.

JonasFocke01 commented 10 months ago

Thats correct. And that something in that library gets compiled away was also my belief. I fought with the code once again, but i was not able to get it to work in release mode.
The whole library thankfully is rather small. I strongly suspect, that it has something to do with the send function and the waiting for the acknowledge 'bit', but i was not able to get it to work by surrounding it with an endless loop, so my assumption could also be very wrong...
Can you spot anything else in there (whole library)?

#![deny(warnings)]
#![no_std]

extern crate embedded_hal as hal;

use hal::blocking::delay::DelayUs;
use hal::digital::v2::{InputPin, OutputPin};

#[derive(Debug)]
pub enum Error<E> {
    Ack,
    IO(E),
}

impl<E> From<E> for Error<E> {
    fn from(err: E) -> Error<E> {
        Error::IO(err)
    }
}

type Res<E> = Result<(), Error<E>>;

pub struct TM1637<'a, CLK, DIO, D> {
    clk: &'a mut CLK,
    dio: &'a mut DIO,
    delay: &'a mut D,
}

enum Bit {
    ZERO,
    ONE,
}

impl<'a, CLK, DIO, D, E> TM1637<'a, CLK, DIO, D>
where
    CLK: OutputPin<Error = E>,
    DIO: InputPin<Error = E> + OutputPin<Error = E>,
    D: DelayUs<u16>,
{
    pub fn new(clk: &'a mut CLK, dio: &'a mut DIO, delay: &'a mut D) -> Self {
        Self { clk, dio, delay }
    }

    pub fn init(&mut self) -> Res<E> {
        self.start()?;
        self.send(ADDRESS_AUTO_INCREMENT_1_MODE)?;
        self.stop()?;

        Ok(())
    }

    pub fn clear(&mut self) -> Res<E> {
        self.print_raw_iter(0, core::iter::repeat(0).take(4))
    }

    pub fn print_raw(&mut self, address: u8, bytes: &[u8]) -> Res<E> {
        self.print_raw_iter(address, bytes.iter().map(|b| *b))
    }

    pub fn print_hex(&mut self, address: u8, digits: &[u8]) -> Res<E> {
        self.print_raw_iter(
            address,
            digits.iter().map(|digit| DIGITS[(digit & 0xf) as usize]),
        )
    }

    pub fn print_raw_iter<Iter: Iterator<Item = u8>>(
        &mut self,
        address: u8,
        bytes: Iter,
    ) -> Res<E> {
        self.start()?;
        self.send(ADDRESS_COMMAND_BITS | (address & ADDRESS_COMMAND_MASK))?;
        for byte in bytes {
            self.send(byte)?;
        }
        self.stop()?;
        Ok(())
    }

    pub fn set_brightness(&mut self, level: u8) -> Res<E> {
        self.start()?;
        self.send(DISPLAY_CONTROL_BRIGHTNESS_BITS | (level & DISPLAY_CONTROL_BRIGHTNESS_MASK))?;
        self.stop()?;

        Ok(())
    }

    fn send(&mut self, byte: u8) -> Res<E> {
        let mut rest = byte;
        for _ in 0..8 {
            let bit = if rest & 1 != 0 { Bit::ONE } else { Bit::ZERO };
            self.send_bit_and_delay(bit)?;
            rest = rest >> 1;
        }

        // Wait for the ACK
        self.send_bit_and_delay(Bit::ONE)?;
        for _ in 0..255 {
            if self.dio.is_low()? {
                return Ok(());
            }
            self.delay();
        }

        Err(Error::Ack)
    }

    fn start(&mut self) -> Res<E> {
        self.send_bit_and_delay(Bit::ONE)?;
        self.dio.set_low()?;

        Ok(())
    }

    fn stop(&mut self) -> Res<E> {
        self.send_bit_and_delay(Bit::ZERO)?;
        self.dio.set_high()?;
        self.delay();

        Ok(())
    }

    fn send_bit_and_delay(&mut self, value: Bit) -> Res<E> {
        self.clk.set_low()?;
        if let Bit::ONE = value {
            self.dio.set_high()?;
        } else {
            self.dio.set_low()?;
        }
        self.clk.set_high()?;
        self.delay();

        Ok(())
    }

    fn delay(&mut self) {
        self.delay.delay_us(DELAY_USECS);
    }
}

const MAX_FREQ_KHZ: u16 = 500;
const USECS_IN_MSEC: u16 = 1_000;
const DELAY_USECS: u16 = USECS_IN_MSEC / MAX_FREQ_KHZ;

const ADDRESS_AUTO_INCREMENT_1_MODE: u8 = 0x40;

const ADDRESS_COMMAND_BITS: u8 = 0xc0;
const ADDRESS_COMMAND_MASK: u8 = 0x0f;

const DISPLAY_CONTROL_BRIGHTNESS_BITS: u8 = 0x88;
const DISPLAY_CONTROL_BRIGHTNESS_MASK: u8 = 0x07;

const DIGITS: [u8; 16] = [
    0x3f, 0x06, 0x5b, 0x4f, //
    0x66, 0x6d, 0x7d, 0x07, //
    0x7f, 0x6f, 0x77, 0x7c, //
    0x39, 0x5e, 0x79, 0x71, //
];

and its config:

[package]
name = "tm1637"
version = "0.1.0"
authors = [
    "Sergey Vakhurin <igelbox@gmail.com>",
    "Jesse Braham <jesse@beta7.io>",
]
description = "A platform agnostic driver to a LED-display powered by the TM1637 chip"
keywords = ["embedded", "embedded-hal-driver"]
categories = ["embedded", "hardware-support", "no-std"]
license = "MIT"
repository = "https://github.com/igelbox/tm1637-rs"
edition = "2018"
include = [
    "**/*.rs",
    "Cargo.toml",
]

[dependencies.embedded-hal]
features = ["unproven"]
version = "0.2.3"

[[example]]
name = "main"

[dev-dependencies]
embedded-hal = { version = "0.2.3" }
stm32f103xx-hal = { git = "https://github.com/japaric/stm32f103xx-hal" }
cortex-m = "0.6.2"
cortex-m-rt = "0.6.12"
panic-halt = "0.2.0"

[profile.release]
codegen-units = 1
debug = true
lto = true
opt-level = 'z'
JonasFocke01 commented 10 months ago

PS: This is really not live threatening, as i have now a good workaround in place! Please close this issue/ dont answer if this seems like something dumb from my side!

workingjubilee commented 10 months ago

@JonasFocke01 Thank you for your assistance in exploring this with us. It would probably be best if you could push an example repo to GitHub with the Cargo.toml and Cargo.lock as well. I can generate one but I won't know for sure if I'm getting the dependency versions correct, and that might matter.

I will confirm that I am noticing the TM1637::print_raw fn that you call does in fact seem to disappear in the release-optimized LLVMIR in my version, while print_raw_iter does in fact remain around because it is called in the implementation of the other fn.

I'm also curious about what the workaround is?

cc @ithinuel @jannic Hello. I am contacting you because you are maintainers of the rp2040-hal library and afaict especially the GPIO code? This is a somewhat esoteric issue that was filed against upstream Rust, but it ultimately concerns whether or not code in rp2040-hal is getting:

  1. misused
  2. miscompiled
  3. failed to uphold some soundness invariant to begin with
  4. some other exotic condition

The signs point to problems 2 through 4, as all the code in the example that breaks, and the seven-segment-display-driver crate in use, is safe code, the library in question propagates errors properly and the problem is optimization dependent (as documented in this issue and https://github.com/igelbox/tm1637-rs/issues/5). It's not clear there is a soundness bug in your library, or if it's some substrate below that (which could be the other crates you depend on orrr just LLVM and rustc), but I felt at this point of evaluating things, I should really reach out and let you know. Also any advice in debugging or minimizing this might be helpful, as root-causing esoteric issues that mostly materialize in embedded code that has real-world ~~ side effects ~~ is often a sticking point in solving Rust bugs.

Attaching the llvmir of the debug vs. release versions for later analysis re: codegen issues. I'm continuing to assume this is a codegen bug until discovered otherwise, but we haven't ruled out sound optimizations of unsound code.

rp_pico_example-debug.ll.txt rp_pico_example-release.ll.txt

apiraino commented 10 months ago

WG-prioritization assigning priority (Zulip discussion).

Please feel to raise the priority if the investigation uncovers a deeper issue in rustc/llvm

@rustbot label -I-prioritize +P-medium

ithinuel commented 10 months ago

Hello,

I don't have a TM1637 to effectively test the binary's behaviour. Because of heavy optimisation & inlining, it is often pretty hard to analyse the disassembly but as far as I can see the compilation's output seems correct to me.

(note that you'd need to either use nightly-2023-12-01 or before or use --features rp2040-hal/disable-intrinsics)

rp2040-hal implementation

The rp2040-hal does not emit any Infallible errors so I don't see you this could end up in the binary regardless of the optimisation level. It is only use to satisfy a trait's associated type.

tm1637 & rp2040-hal

As far as I understand the TM1637 datasheet, the communication bus seems similar to i2c (but not quite the same) and the tm1637 crate bitbangs the bus. If you use a Push-Pull output, you will have two device possibly pulling the DIO wire in opposite directions. Eg during the ack phase, the tm1637 will pull the wire to 0V but the rp2040 pin will pull it to Vcc. The effective level on the wire will depends on each components' io control architecture. This level may be in the indicisive range of the rp2040 input and yield unpredictable result.

So the DIO pin should have a pull-up and use the InOutPin wrapper instead of the plain Pin type that .into_push_pull_output() returns.

Debugging

Also any advice in debugging or minimizing this might be helpful, as root-causing esoteric issues that mostly materialize in embedded code that has real-world side effects is often a sticking point in solving Rust bugs.

Something really helpful when debugging such things is to use an SWD probe and use defmt + defmt-rtt and panic-probe with the print-defmt feature enabled. This allow to add logs and print a message and a backtrace in case of panic.

I made https://github.com/ithinuel/sandbox-rp2040/tree/rust-118330 to check the build. Would you mind giving it a try?

jannic commented 10 months ago

I wonder if the program just gets too fast in release mode because of a missing delay in the tm1637 library. From that lib:

    fn send_bit_and_delay(&mut self, value: Bit) -> Res<E> {
        self.clk.set_low()?;
        if let Bit::ONE = value {
            self.dio.set_high()?;
        } else {
            self.dio.set_low()?;
        }
        self.clk.set_high()?;
        self.delay();

        Ok(())
    }

There is no delay between self.clk.set_low() and self.clk.set_high(). As setting a GPIO pin is infallible on rp2040, all the branches caused by the ? operators get optimized away. So the resulting assembly becomes:

1000026c: 4a16          ldr     r2, [pc, #0x58]         @ 0x100002c8 <tm1637::TM1637<CLK,DIO,D>::send_bit_and_delay::hb6b8088b8d826102+0x60>
1000026e: 2340          movs    r3, #0x40
10000270: 6053          str     r3, [r2, #0x4]
10000272: 2900          cmp     r1, #0x0
10000274: d002          beq     0x1000027c <tm1637::TM1637<CLK,DIO,D>::send_bit_and_delay::hb6b8088b8d826102+0x14> @ imm = #0x4
10000276: 2180          movs    r1, #0x80
10000278: 6011          str     r1, [r2]
1000027a: e001          b       0x10000280 <tm1637::TM1637<CLK,DIO,D>::send_bit_and_delay::hb6b8088b8d826102+0x18> @ imm = #0x2
1000027c: 2180          movs    r1, #0x80
1000027e: 6051          str     r1, [r2, #0x4]
10000280: 6013          str     r3, [r2]

The str rX, [r2, #0x4] lines Instructions at 10000270 and 10000280 are the writes to self.clk. If I count cycles correctly, if r1 is 0, the time between clearing and setting the clock bit is only 6 cycles, or 48ns at 125MHz. That might be too fast for the hardware. And even if the TM1637 can handle such short clock pulses, it will not work reliably through a solderless breadboard or similar cabling.

You could try to reduce the CPU clock appropriately. That should make the clock pulse longer without changing anything else in your code. If that helps, it's a very strong hint that there's no miscompilation involved.

EDIT: Small correction because I misunderstood the assembly.

jannic commented 9 months ago

I found a datasheet for the TM1637 at https://raw.githubusercontent.com/avishorp/TM1637/master/docs/TM1637_V2.4_EN.pdf. According to that, the minimum clock pulse width should be 400ns. (Also, the maximum clock frequency should be 500kHz with 50% duty cycle, which would imply a minimum width of 1000ns.) So a 48ns pulse is definitely too short.

To fix this issue in the tm1637 crate, I opened https://github.com/igelbox/tm1637-rs/pull/7/files

TL;DR: I don't think this is caused by a compiler bug.

workingjubilee commented 9 months ago

@JonasFocke01 Would you test the sample bug program with the TM1637 driver change, please?

You may need to use this:

[patch.crates-io]
tm1637 = { git = "https://github.com/jannic-dev-forks/tm1637-rs", branch = "fix-clock-timing" }
JonasFocke01 commented 9 months ago

@workingjubilee I definetly will! Im quire busy rn, we need to wait for the weekend

Edit: nice title :D

jannic commented 9 months ago

I was able to reproduce it:

This confirms that this is not a compiler bug. It is a bug in the driver crate that only causes issues if the code runs fast enough.

workingjubilee commented 9 months ago

Excellent, thank you very much with the help with that!