Open xphoniex opened 1 year ago
We have some history with AVR miscompilations, so just glancing at this issue my priors are that it is more likely than not that this is a bug somewhere in rustc/LLVM.
You've included a fair bit of information in here, but there's no reproducer that I can run to observe the bug. Can you provide a program (snippet or link to a repo/example) that demonstrates the bug, and precise instructions on how to compile it and observe the bug? It would be a great help if someone tackling this doesn't need to have any particular hardware on hand in order to observe the bug.
To add to saethlin's request for a reproducer:
An entire repo is often fine as a reproducer for AVR issues, even though it is usually not for other targets. Often the problem is either so vague that further minimization is futile (so it can't be minimized) or so glaring that further minimization is pointless (making it apparent at a glance at the Cargo.toml or something like that). In particular, a full repo can be desired because some common ecosystem dependencies have used unsound code recently and you may have included one of them in your dependency graph, so please check in your Cargo.lock when pushing the repo so I can see what exact versions of crates you resolved.
Also, for that repo, please include the exact rustc/cargo build commands you are using to produce this issue, ideally. A file in the repo listing all of them would not be too much detail.
zipped file: avr-bug.zip
$ tree .
.
├── avr-atmega328p.json
├── .cargo
│ └── config.toml
├── Cargo.lock
├── Cargo.toml
├── rust-toolchain.toml
└── src
└── main.rs
avr-atmega328p.json
{
"arch": "avr",
"atomic-cas": false,
"cpu": "atmega328p",
"data-layout": "e-P1-p:16:8-i8:8-i16:8-i32:8-i64:8-f32:8-f64:8-n8-a:8",
"eh-frame-header": false,
"exe-suffix": ".elf",
"executables": true,
"late-link-args": {
"gcc": [
"-lgcc"
]
},
"linker": "avr-gcc",
"linker-is-gnu": true,
"llvm-target": "avr-unknown-unknown",
"max-atomic-width": 8,
"no-default-libraries": false,
"pre-link-args": {
"gcc": [
"-mmcu=atmega328p",
"-Wl,--as-needed,--print-memory-usage,--detailed-mem-usage",
"-fpack-struct",
"-fshort-enums",
"-Wstack-usage=20",
"-Wall","-Wextra","-fstack-usage"
]
},
"target-c-int-width": "16",
"target-pointer-width": "16"
}
.cargo/config.toml
rustflags = ["-Z emit-stack-sizes"]
[target.avr-atmega328p]
runner = "qemu-system-avr -M uno -nographic -serial tcp::5678,server=on -bios"
[build]
target = "avr-atmega328p.json"
[unstable]
build-std = ["core"]
Cargo.toml
[package]
name = "avr-bug"
version = "0.1.0"
edition = "2021"
[dependencies]
panic-halt = "0.2.0"
ufmt = "0.1.0"
noble-secp256k1 = { git = "https://github.com/xphoniex/noble-secp256k1-rs", default-features = false, features = ["8-bit"] }
hmac-sha256 = { version = "1.1.6", default-features = false, features = ["opt_size"] }
[dependencies.arduino-hal]
git = "https://github.com/rahix/avr-hal"
rev = "4c9c44c314eb061ee20556ef10d45dea36e75ee4"
features = ["arduino-uno"]
[dependencies.avr-device]
version = "0.5.0"
[profile.dev]
panic = "abort"
debug = true
lto = true
opt-level = "s"
[profile.release]
panic = "abort"
codegen-units = 1
debug = true
lto = true
#opt-level = "s"
#opt-level = "z"
opt-level = 2
strip = true
[profile.dev.package.compiler_builtins]
overflow-checks = false
rust-toolchain.toml
[toolchain]
channel = "nightly"
components = [ "rust-src" ]
profile = "minimal"
src/main.rs
#![no_std]
#![no_main]
use panic_halt as _;
use arduino_hal::prelude::*;
use core::fmt::Debug;
use ufmt::uWrite;
fn print_hex_arr<S>(tag: &str, serial: &mut S, arr: &[u8])
where
S: uWrite,
<S as uWrite>::Error: Debug,
{
ufmt::uwrite!(serial, "{} = ", tag).unwrap();
for e in arr.iter() {
ufmt::uwrite!(serial, "{:02x}", *e).unwrap();
}
ufmt::uwrite!(serial, "\r\n").unwrap();
}
fn print_hex_arr_rev<S>(tag: &str, serial: &mut S, arr: &[u8])
where
S: uWrite,
<S as uWrite>::Error: Debug,
{
ufmt::uwrite!(serial, "{} = ", tag).unwrap();
for e in arr.iter().rev() {
ufmt::uwrite!(serial, "{:02x}", *e).unwrap();
}
ufmt::uwrite!(serial, "\r\n").unwrap();
}
#[arduino_hal::entry]
fn main() -> ! {
let dp = arduino_hal::Peripherals::take().unwrap();
let pins = arduino_hal::pins!(dp);
let mut serial = arduino_hal::default_serial!(dp, pins, 57600);
/*
// Uncommenting this part changes the output of hmacs
use noble_secp256k1::awint::{cc, inlawi, inlawi_ty, Bits, InlAwi};
use noble_secp256k1::{BigNum, Curve};
let mut private_key = inlawi!(0x02_u512);
let curve = Curve::secp256k1();
let public_key = curve.multiply_simple(&mut private_key);
let mut buf = [0; 32];
public_key.x.to_u8_slice(&mut buf);
print_hex_arr_rev("x", &mut serial, &buf);
public_key.y.to_u8_slice(&mut buf);
print_hex_arr_rev("y", &mut serial, &buf);
*/
let h = hmac_sha256::HMAC::mac(b"hello", b"key");
print_hex_arr(" mac", &mut serial, &h);
let h = hmac_sha256::Hash::hash(b"hello");
print_hex_arr("hash", &mut serial, &h);
loop {}
}
$ rustc -vV
rustc 1.69.0-nightly (ef934d9b6 2023-02-08)
binary: rustc
commit-hash: ef934d9b632b8ac00276558824664c104b92b5f0
commit-date: 2023-02-08
host: x86_64-unknown-linux-gnu
release: 1.69.0-nightly
LLVM version: 15.0.7
In order to run this you need both avr toolchain such as avr-gcc
and qemu-system-avr
. In a terminal run cargo run -r
and in another:
$ telnet localhost 5678
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
mac = 9307b3b915efb5171ff14d8cb55fbcc798c6c0ef1456d66ded1a6aa723a58b7b
hash = 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
then try changing opt-level
, lto
, and uncomment the other crate in the main.rs
, and the move code around. you should be able to re-produce the bugs.
For future reference, creating and pushing a repo to GitHub or another host that I can clone with a single command would have been preferred over something that requires me to manually reconstruct the files inside a repository.
For future reference, creating and pushing a repo to GitHub or another host that I can clone with a single command would have been preferred over something that requires me to manually reconstruct the files inside a repository.
You're right, added the zip file.
Hmm. I don't know, in that case! Fair enough?!
Thank you for the clear instructions, I will try to reproduce this soon.
Also you will want to update to a version of rustc of about February 19 or later, as an ABI fix for AVR targets was merged about then. It doesn't affect your visible source but it may affect your dependencies.
polite bump. any updates on this?
The reproducer doesn't reproduce the bug on the latest nightly. At least it doesn't when I try it, I always get the correct output regardless of opt level.
If I turn off LTO though, I get
$ cargo +nightly build --release -j1
Compiling compiler_builtins v0.1.92
Compiling core v0.0.0 (/home/ben/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core)
LLVM ERROR: Expected a constant shift amount!
error: could not compile `core` (lib)
So perhaps we still have a bug but I think the one you meant to report here is fixed? Can you confirm?
Thanks for the reply. Only half is fixed:
After updating to nightly 1.72, this consistently produces the correct output regardless of compiler settings:
let h = hmac_sha256::HMAC::mac(b"hello", b"key"); // hmac for input "hello" with key "key"
print_hex_arr(" mac", &mut serial, &h);
let h = hmac_sha256::Hash::hash(b"hello"); // sha256 of "hello"
print_hex_arr("hash", &mut serial, &h);
(sidenote: how can I point my cargo to a locally compiled rustc? I can then git bisect and find out what fixed this from 1.62)
However if you go back to main.rs, you'll see a code block which is commented out:
/*
// Uncommenting this part changes the output of hmacs
use noble_secp256k1::awint::{cc, inlawi, inlawi_ty, Bits, InlAwi};
use noble_secp256k1::{BigNum, Curve};
let mut private_key = inlawi!(0x02_u512);
let curve = Curve::secp256k1();
let public_key = curve.multiply_simple(&mut private_key);
let mut buf = [0; 32];
public_key.x.to_u8_slice(&mut buf);
print_hex_arr_rev("x", &mut serial, &buf);
public_key.y.to_u8_slice(&mut buf);
print_hex_arr_rev("y", &mut serial, &buf);
*/
let h = hmac_sha256::HMAC::mac(b"hello", b"key");
print_hex_arr(" mac", &mut serial, &h);
let h = hmac_sha256::Hash::hash(b"hello");
print_hex_arr("hash", &mut serial, &h);
Uncomment it and let's run the code, cargo run -r
in one terminal and telnet localhost 5678
in another. Remember, we expect:
mac = 9307b3b915efb5171ff14d8cb55fbcc798c6c0ef1456d66ded1a6aa723a58b7b
hash = 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
But we get:
opt-level = 2
:
x = c6047f9441ed7d6d3045406e95c07cd85c778e4b8cef3ca7abac09b95c709ee5
y = b7c52588d95c3b9aa25b0403f1eef75702e84bb7597aabe663b82f6f04ef2777
**O = dcfd06ba19e34eb9402ff79e0ca8507a8cc0f5a39a8c6b51b5e533570b2799de // ??
�= df04b29ead3cfe390f33ebe4749c7f518d6c1ebc450eb6e2f41d5c9e61d04376 // ??
opt-level = "s"
:
x = c6047f9441ed7d6d3045406e95c07cd85c778e4b8cef3ca7abac09b95c709ee5
y = b7c52588d95c3b9aa25b0403f1eef75702e84bb7597aabe663b82f6f04ef2777
mac = 350cc5a62c6000f16a0ea550bb2aa4805fcb76530167a9d2f7ea75c5cea7b366 // ??
hash = 8cfa54f76e17be6575e0894ebbac4460cc02b6c8f818c96593dfa77ce158a6a4 // ??
opt-level = "z"
:
x = 0c8333020c4688a754bf3ad462f1e9f1fac80649a463ae4e4c1b04ead21e5da0 // ??
y = b7c52588d95c3b9aa25b0403f1eef75702e84bb7597aabe763b8371104fdb818 // ??
mac = cd3a72deda62ab9e5ce61720ced04b523d6227b9d36419d400f2088318413030 // ??
hash = 19217799055b90ffb23be75faefbd3df8bfeb0fc19778d82c2d3510c32978017 // ??
So, my initial guess what that some weird stack overflow is happening (from the code block we just uncommented) and decided to move the hmac part on top of the code so like:
let h = hmac_sha256::HMAC::mac(b"hello", b"key");
print_hex_arr(" mac", &mut serial, &h);
let h = hmac_sha256::Hash::hash(b"hello");
print_hex_arr("hash", &mut serial, &h);
// Uncommenting this part changes the output of hmacs
...
Here are the outputs I'm getting now:
opt-level = 2
:
mac = 9307b3b915efb5171ff14d8cb55fbcc798c6c0ef1456d66ded1a6aa723a58b7b // ✓
hash = 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824 // ✓
// GETS STUCK HERE
opt-level = "s"
:
mac = 9307b3b915efb5171ff14d8cb55fbcc798c6c0ef1456d66ded1a6aa723a58b7b // ✓
hash = 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824 // ✓
x = 2351285e80f3f9e23a1fbceecedc1804c922223a2292951d54872339f728099e // ??
y = 29c0e92da06bc8723a6045500f56c85a5a484f4c4f3e33405a3015be140a3a6f // ??
opt-level = "z"
:
mac = 9307b3b915efb5171ff14d8cb55fbcc798c6c0ef1456d66ded1a6aa723a58b7b // ✓
hash = 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824 // ✓
x = 79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798 // ??
y = 016f2077de62f7d85e99bad51fc94eb5f70363ae9ba724d3ff7b53573a192b48 // ??
as you can see, plenty of different variations, depending on just the compiler setting. this might have something to do with overflow, but shouldn't compiler catch it?
$ avr-size target/avr-atmega328p/release/avr-bug.elf
text data bss dec hex filename
16752 486 1 17239 4357 target/avr-atmega328p/release/avr-bug.elf
WG-prioritization assigning priority (Zulip discussion).
@xphoniex a note about support of compile target avr-unknown-gnu-atmega328
that I don't see mentioned in this issue. This is a tier 3 compile target (policy) so this in practice means that support is volunteer-based and contributions are welcome.
@rustbot label -I-prioritize +P-low
(sidenote: how can I point my cargo to a locally compiled rustc? I can then git bisect and find out what fixed this from 1.62)
You probably want to cargo +stage1
.
this might have something to do with overflow, but shouldn't compiler catch it?
This could be another case of compiler_builtins providing its own intrinsics in place of libgcc ones - could you try adding this to .cargo/config
:
build-std-features = ["compiler-builtins-mangled-names"]
... and trying again?
this might have something to do with overflow, but shouldn't compiler catch it?
This could be another case of compiler_builtins providing its own intrinsics in place of libgcc ones - could you try adding this to
.cargo/config
:build-std-features = ["compiler-builtins-mangled-names"]
... and trying again?
.cargo/config.toml
?:
rustflags = ["-Z emit-stack-sizes"]
build-std-features = ["compiler-builtins-mangled-names"]
also tried
[unstable]
build-std = ["core"]
build-std-features = ["compiler-builtins-mangled-names"]
same output as before.
(sidenote: how can I point my cargo to a locally compiled rustc? I can then git bisect and find out what fixed this from 1.62)
You probably want to
cargo +stage1
.
This was such a bad idea, started with
$ git checkout master
$ git bisect start
$ git bisect good
$ git bisect bad 1.66.0
but it won't compile with 1.66
because:
unknown print request `split-debuginfo`
which I suppose is asked for by cargo
?
decided to start from 1.67
and:
error: failed to load manifest for workspace member `/github.com/rust-lang/rust/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/sysroot`
Caused by:
failed to read `/github.com/rust-lang/rust/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/sysroot/Cargo.toml`
Caused by:
No such file or directory (os error 2)
tried stage2
, no luck.
and git bisect
is now stuck:
$ git bisect bad
The merge base f058493307813a5298851f79ad6187f4ad2c7e15 is bad.
This means the bug has been fixed between f058493307813a5298851f79ad6187f4ad2c7e15 and [33c3d101280c8eb3cd8af421bfb56a8afcc3881d].
$ git bisect bad
The merge base f058493307813a5298851f79ad6187f4ad2c7e15 is bad.
This means the bug has been fixed between f058493307813a5298851f79ad6187f4ad2c7e15 and [33c3d101280c8eb3cd8af421bfb56a8afcc3881d].
...
also tried 1.68 separately, and that doesn't work either, same error as 1.67.
Okie, I think I got it - your code prints invalid results¹ because you've got a stack overflow 😇
Since AVR doesn't provide memory hardware watches (and rustc/llvm don't insert stack canaries), stack overflow in here doesn't cause the program to halt but rather simply trashes some other stuff present in RAM - in this case, this stuff happens to be apparently some values used to calculate the hashes.
We can prove it with simavr - first, let's start the simulator:
simavr -m atmega328p ./target/avr-atmega328p/release/avr-bug.elf -g
... then, in another tab, let's run avr-gcc:
$ avr-gdb
(gdb) file ./target/target/avr-atmega328p/release/avr-bug.elf
(gdb) layout asm
(gdb) layout regs
(gdb) target remote :1234
Now, before we're off to detecting stack trashing, we must talk about memory layout - on this AVR, since there's only two kilobytes of RAM, it's pretty straightforward:
0 .. X - "constant" stuff (strings, static arrays, static variables etc.)
X .. 2 KB - stack
When an AVR program starts, it copies "static stuff" from flash memory into RAM² - that stuff ends up at the beginning of the memory; after that, we've got the stack which starts at the end of RAM and grows down (i.e. towards the lower addresses).
That is to say, if a program uses 32 bytes of static data, it leaves 2048 - 32 = 2016 bytes for stack frames; any stack frame larger than that will, sooner or later, unavoidably overwrite some random stuff present in RAM.
Okie, okie, let's get back to business - let's setup a memory watch to find out what messes up the RAM!
After running a single instruction:
(gdb) si
... the list of registers at the top gets filled with values - in particular, we're interested in the SP (stack pointer) register:
SP 0x8ff 0x8008ff
... and, in-particular in-particular, we're interested in the second value there - 0x8008ff
; since we haven't run any function yet, this register determines the end of our RAM (and, conversely, the beginning of our stack).
(this register has two values because 0x8ff
is how the AVR sees it, while the second value is how gdb sees it - tldr)
Before we're able to setup the watch, we have to find one more value: size of our "constant stuff" section; to do that we can simply move forward a few instructions, up to an intrinsic called __do_copy_data
:
Remote debugging using :1234
0x00000000 in __vectors ()
(gdb) si
0x00000068 in __trampolines_start ()
(gdb) si
0x0000006a in __trampolines_start ()
(gdb) si
0x0000006c in __trampolines_start ()
(gdb) si
0x0000006e in __trampolines_start ()
(gdb) si
0x00000070 in __trampolines_start ()
(gdb) si
0x00000072 in __trampolines_start ()
(gdb) si
__do_copy_data () at ../../../../gcc-12.2.0/libgcc/config/avr/lib1funcs.S:2405
(gdb)
__do_copy_data
is an internal function (linked automatically by avr-gcc and run before Rust's main()
) that copies stuff from flash into RAM - in my case I've gotten:
>0x74 <__do_copy_data> ldi r17, 0x02 ; 2 │
│ 0x76 <__do_copy_data+2> ldi r26, 0x00 ; 0 │
│ 0x78 <__do_copy_data+4> ldi r27, 0x01 ; 1 │
│ 0x7a <__do_copy_data+6> ldi r30, 0x94 ; 148 │
│ 0x7c <__do_copy_data+8> ldi r31, 0x3D ; 61 │
│ 0x7e <__do_copy_data+10> rjmp .+4 ; 0x84 <__do_copy_data+16> │
│ 0x80 <__do_copy_data+12> lpm r0, Z+ │
│ 0x82 <__do_copy_data+14> st X+, r0 │
│ 0x84 <__do_copy_data+16> cpi r26, 0xD4 ; 212 │
│ 0x86 <__do_copy_data+18> cpc r27, r17 │
│ 0x88 <__do_copy_data+20> brne .-10 ; 0x80 <__do_copy_data+12>
... where cpi + cpc
near the bottom determine the size of our "constant stuff" - and while it's not immediately obvious, if you step through the logic, it comes down to:
2 * 255 + 212 = 722
r17
This means that our stack effectively ends at:
0x8008ff - 2048 + 722 = 0x8003d1
... so let's create a watch precisely for that:
(gdb) watch *0x8003d1
... let the program continue:
(gdb) continue
... and voilà, our RAM gets corrupted:
Continuing.
Hardware watchpoint 2: *0x8003d1
Old value = 0
New value = 56
0x00002480 in awint_core::logic::mul::<impl awint_core::data::bits::Bits>::mul_assign ()
... anything after this instruction is effectively undefined behavior (from Rust's point of view, at least).
We can even print stacktrace, for fun:
(gdb) bt
#0 0x00002480 in awint_core::logic::mul::<impl awint_core::data::bits::Bits>::mul_assign ()
#1 0x0000395e in noble_secp256k1::point::Point::double_assign_mod ()
#2 0x0000105a in avr_bug::__avr_device_rt_main ()
#3 0x0000101c in main ()
Soooo.... tl;dr he data too big for he god damn RAM.
Not all hope is lost, though! -- if you really need to compute those hashes on AVR, I think you could relatively easily use the avr-progmem crate to store those big bois:
... into flash so that they are not copied into RAM (effectively trading-off a bit of performance here).
That's just a hunch, but doing that would be my first step - and you could even upstream it later by hiding this avr-progmem behind some feature-flag or #[cfg(arch = "avr")]
.
(due diligence: I've been testing the program with https://reviews.llvm.org/D152059 and https://reviews.llvm.org/D152063 patches applied; it's possible that the current nightly compiler miscompiles something, but everything seems to be alright with those two patches.)
¹ or sometimes it hangs because (depending on the compiler flags and Moon's alignment) the code panics which causes it to enter the panic_halt
's panic handler which is basically loop {}
; you can detect that by pressing Ctrl+C in gdb and looking at the stack trace
² that's kinda-sorta required because most AVR instructions don't support reading stuff directly from the flash memory - and so if the program did not do that, each read into the flash memory would require an extra register and an extra instruction which could become wasteful in the long term
Thanks @Patryk27 for the entire time you dedicated to present this solution.
@xphoniex problems of stack overflow
in AVR unfurtunately are nasty because it produces undefined behaviour but no notification about it is generated, until we get it in the run-time.
Maybe eventually in future, someone can produce a script to use SIMAVR and GDB to test this case condition automagically.
Are we able to close this issue?
Thanks @Patryk27 for the write up, needed some time to verify this. I can't re-produce this entirely on my side:
I don't have simavr
so I tried to see if qemu
works:
$ qemu-system-avr -S -s -M uno -nographic -bios target/avr-atmega328p/release/avr-bug.elf
in another terminal:
$ avr-gdb -ex 'target remote :1234' target/avr-atmega328p/release/avr-bug.elf
got me connected. however, I can't see $SP
being set after first si
, unlike in your case:
Edit: can see $SP
after one si
in simavr
.
0x00000000 in __vectors ()
(gdb) si
0x00000068 in __trampolines_start ()
(gdb) i r
r0 0x0 0
r1 0x0 0
r2 0x0 0
r3 0x0 0
r4 0x0 0
r5 0x0 0
r6 0x0 0
r7 0x0 0
r8 0x0 0
r9 0x0 0
r10 0x0 0
r11 0x0 0
r12 0x0 0
r13 0x0 0
r14 0x0 0
r15 0x0 0
r16 0x0 0
r17 0x0 0
r18 0x0 0
r19 0x0 0
r20 0x0 0
r21 0x0 0
r22 0x0 0
r23 0x0 0
r24 0x0 0
r25 0x0 0
r26 0x0 0
r27 0x0 0
r28 0x0 0
r29 0x0 0
r30 0x0 0
r31 0x0 0
SREG 0x80 128
SP 0x0 0x0 <__vectors>
PC2 0x68 104
pc 0x34 0x68 <__trampolines_start>
let's continue anyway:
(gdb) display $SP
1: $SP = (void *) 0x0 <__vectors>
(gdb) si
0x0000006a in __trampolines_start ()
1: $SP = (void *) 0x0 <__vectors>
(gdb) si
0x0000006c in __trampolines_start ()
1: $SP = (void *) 0x0 <__vectors>
(gdb) si
0x0000006e in __trampolines_start ()
1: $SP = (void *) 0x0 <__vectors>
(gdb) si
0x00000070 in __trampolines_start ()
1: $SP = (void *) 0x0 <__vectors>
(gdb) si
0x00000072 in __trampolines_start ()
1: $SP = (void *) 0x800800
(gdb) si
0x00000074 in __do_copy_data ()
1: $SP = (void *) 0x8008ff
so my $SP
is finally set to 0x8008ff
right on __do_copy_data
.
there seems to be an issue with the way you've calculated the height of static data:
(gdb) x/12i $pc
=> 0x74 <__do_copy_data>: ldi r17, 0x02 ; 2
0x76 <__do_copy_data+2>: ldi r26, 0x00 ; 0
0x78 <__do_copy_data+4>: ldi r27, 0x01 ; 1
0x7a <__do_copy_data+6>: ldi r30, 0x02 ; 2
0x7c <__do_copy_data+8>: ldi r31, 0x44 ; 68
0x7e <__do_copy_data+10>: rjmp .+4 ; 0x84 <__do_copy_data+16>
0x80 <__do_copy_data+12>: lpm r0, Z+
0x82 <__do_copy_data+14>: st X+, r0
0x84 <__do_copy_data+16>: cpi r26, 0xE4 ; 228
0x86 <__do_copy_data+18>: cpc r27, r17
0x88 <__do_copy_data+20>: brne .-10 ; 0x80 <__do_copy_data+12>
0x8a <__do_clear_bss>: ldi r18, 0x02 ; 2
IIUC, this loop keeps increasing X
(which is actually r26:r27
) at line 0x82
until it becomes $r17
:0xE4
which is 0x02e4
.
So the height is 0x0100 - 0x02e4 = 0x01e4 // 484
, you can verify this using a break:
(gdb) break *0x8a
Breakpoint 1 at 0x8a
(gdb) continue
Continuing.
Breakpoint 1, 0x0000008a in __do_clear_bss ()
(gdb) i r $r26
r26 0xe4 228
(gdb) i r $r27
r27 0x2 2
So, the end of our stack is either
0x8008ff - 2048 + 484 = 0x8002e3
// or
0x8008ff - 2048 + 484 + 255 = 0x8003e3
but when I create a watch for that:
(gdb) watch *0x8002e3
Hardware watchpoint 2: *0x8002e3
(gdb) watch *0x8003e3
Hardware watchpoint 3: *0x8003e3
(gdb) continue
Continuing.
program outputs:
mac = 9307b3b915efb5171ff14d8cb55fbcc798c6c0ef1456d66ded1a6aa723a58b7b
hash = 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
x = 29f76222af71953c82f73506a881cc859bc20e5eb0edf6755466ce740c5b18a3
y = 01ce9b739b46c5a1353f47830a39c591552f4c874a29307f551f13010f013562
but I'm not getting this:
Continuing.
Hardware watchpoint 2: *0x8003d1
Old value = 0
New value = 56
0x00002480 in awint_core::logic::mul::<impl awint_core::data::bits::Bits>::mul_assign ()
manually examining using x/x
shows the the value to have changed:
0x00000000 in __vectors ()
(gdb) x/x 0x8002e3
0x8002e3 <anon.4077e3db088e300c71796ada4487f52c.8+42>: 0x00000000
(gdb) watch *0x8002e3
Hardware watchpoint 1: *0x8002e3
(gdb) continue
Continuing.
^C
Program received signal SIGINT, Interrupt.
avr_bug::__avr_device_rt_main () at src/main.rs:62
62 loop {}
(gdb) x/x 0x8002e3
0x8002e3 <anon.4077e3db088e300c71796ada4487f52c.8+42>: 0x5ffe6f4d
--
Edit: watches are hit when using simavr
, but unexpected fn:
Old value = 101
New value = 357
avr_device::interrupt::free<avr_device::devices::atmega328p::{impl#68}::take::{closure_env#0}, core::option::Option<avr_device::devices::atmega328p::Peripherals>> (f=...)
at /.cargo/registry/src/index.crates.io-6f17d22bba15001f/avr-device-0.5.0/src/interrupt.rs:228
228 unsafe { restore(irq_flag); }
Oh, neat - thanks for answering, I'll re-check my logic today / tomorrow 🙂
Oh, neat - thanks for answering, I'll re-check my logic today / tomorrow 🙂
Thank you for the help. Updated my post.
simavr
seems to be working slightly differently than qemu-system-avr
, latter keeps looping and pushes a single core utilization to 100%, and never seems to be yielding once it enters the loop.
Former is lighter on CPU, and releases, hence why gdb
can show the Old value
/ New value
, I guess?
So this is no longer rust's fault, but still three questions remaining:
qemu-system-avr
, if anyone knows.2048 - (0xff + 0xe4)
or 2048 - (0xff + 0xff + 0xe4)
avr-size
seems to be useless.
$ avr-size target/avr-atmega328p/release/avr-bug.elf
text data bss dec hex filename
17410 484 1 17895 45e7 target/avr-atmega328p/release/avr-bug.elf
Edit: just realized the 484 shown here matches what I calculated in the above post, 0x0100 - 0x02e4 = 0x01e4
. I was expecting it to show me the max stack usage, but it's actually static stack.
I can manually set watch points and run the program, but that has to be scripted and run multiple times to pinpoint it. And backtrace doesn't seem to have this info. Any ideas re this? This is the most important thing, as I can estimate how much stack I have left, etc.
Another idea would be to re-compile qemu
or simavr
with variable stack size.
There's japaric/cargo-call-stack, however it doesn't support avr.
Did some further digging, are you sure about this:
When an AVR program starts, it copies "static stuff" from flash memory into RAM² - that stuff ends up at the beginning of the memory; after that, we've got the stack which starts at the end of RAM and grows down (i.e. towards the lower addresses).
If I understood correctly, static part of memory gets filled on start up (in __do_copy_data
) from 0x800100
and grows up. Once all the static stuff is copied, the program starts and stack grows downward from 0x8008ff
.
For example, if I slightly change the code by moving this part inside its own block:
{
let h = hmac_sha256::HMAC::mac(b"hello", b"key");
print_hex_arr(" mac", &mut serial, &h);
let h = hmac_sha256::Hash::hash(b"hello");
print_hex_arr("hash", &mut serial, &h);
}
and break on the first print_hex_arr
, I can get:
# program output
mac = 9307b3b915efb5171ff14d8cb55fbcc798c6c0ef1456d66ded1a6aa723a58b7b..
# inside gdb
(gdb) x/24x 0x8008ff - 96
0x80089f: 0x00000000 0x00000000 0xb9b30793 0x17b5ef15
0x8008af: 0x8c4df11f 0xc7bc5fb5 0xefc0c698 0x6dd65614
0x8008bf: 0xa76a1aed 0x7b8ba523 0x00000000 0x00000000
0x8008cf: 0x00000000 0x00000000 0x00000000 0x00000000
0x8008df: 0x00000000 0x00000000 0x08000000 0x000002ff
0x8008ef: 0x00000000 0x00000000 0x00000000 0x00780200
which is showing h
on stack, at the end of memory, but growing upwards, we were expecting downwards. (this also doesn't work if code is not inside its own block, and if break is on the second print)
however, when I place a watch right after the end of the static area:
0x00000000 in __vectors ()
(gdb) watch *(0x800100 + 484)
Hardware watchpoint 1: *(0x800100 + 484)
(gdb) cont
Continuing.
Hardware watchpoint 1: *(0x800100 + 484)
Old value = 0
New value = 1
avr_device::interrupt::free<avr_device::devices::atmega328p::{impl#68}::take::{closure_env#0}, core::option::Option<avr_device::devices::atmega328p::Peripherals>> (f=...)
at /.cargo/registry/src/index.crates.io-6f17d22bba15001f/avr-device-0.5.0/src/interrupt.rs:228
228 unsafe { restore(irq_flag); }
(gdb) c
Continuing.
Hardware watchpoint 1: *(0x800100 + 484)
Old value = 1
New value = 1281
0x000029dc in hmac_sha256::State::blocks (self=0x800509, input=...) at src/lib.rs:196
196 while inlen >= 64 {
(gdb) x/180x (0x800100 + 1448)
0x8006a8: 0x5c5c5c5c 0x5c5c5c5c 0x5c5c5c5c 0x5c5c5c5c
0x8006b8: 0x5c5c5c5c 0x5c5c5c5c 0x5c5c5c5c 0x5c5c5c5c
0x8006c8: 0x5c5c5c5c 0x5c5c5c5c 0x5c5c5c5c 0x3e5c5c5c
0x8006d8: 0xec3dec8b 0x4d7eabbb 0xc97fe6dd 0xbe1727b9
0x8006e8: 0x5b16cd5c 0xdcce6cd0 0x62590d78 0x00a9c445
0x8006f8: 0x00004000 0x00000000 0x00000000 0x00000000
0x800708: 0x00000000 0x00000000 0x00000000 0x00000000
0x800718: 0x00000000 0x00000000 0x00000000 0x00000000
0x800728: 0x00000000 0x00000000 0x00000000 0x00000000
0x800738: 0x00000000 0x00000000 0x00000000 0x00000000
0x800748: 0x00000000 0x00000000 0x00000000 0x00000000
0x800758: 0x00000000 0x00000000 0x00000000 0x00000000
0x800768: 0x00000000 0x00000000 0x00000000 0x00000000
0x800778: 0x00000000 0x00000000 0x00000000 0x00000000
0x800788: 0x00000000 0x00000000 0x00000000 0x00000000
0x800798: 0x00000000 0x00000000 0x00000000 0x00000000
0x8007a8: 0x00000000 0x00000000 0x00000000 0x00000000
0x8007b8: 0x00000000 0x00000000 0x00000000 0x00000000
0x8007c8: 0x00000000 0x00000000 0x00000000 0x00000000
0x8007d8: 0x00000000 0x00000000 0x00000000 0x00000000
0x8007e8: 0x00000000 0x00000000 0x00000000 0x00000000
0x8007f8: 0x00000000 0x00000000 0x00000000 0x00000000
0x800808: 0x00000000 0x00000000 0x00000000 0x00000000
0x800818: 0x00000000 0x00000000 0x00000000 0x00000000
0x800828: 0x00000000 0x00000000 0x00000000 0x00000000
0x800838: 0x00000000 0x00000000 0x00000000 0x00000000
0x800848: 0x00000000 0x00000000 0x00000000 0x00000000
0x800858: 0x00000000 0x00000000 0x00000000 0x00000000
0x800868: 0x00000000 0x00000000 0x00000000 0x00000000
0x800878: 0x00000000 0x00000000 0x00000000 0x00000000
0x800888: 0x00000000 0x00000000 0x00000000 0x00000000
0x800898: 0x00000000 0x00000000 0x00000000 0x00000000
0x8008a8: 0x00000000 0x00000000 0x00000000 0x00000000
0x8008b8: 0x00000000 0x00000000 0x00000000 0x00000000
0x8008c8: 0x00000000 0x00000000 0x00000000 0x00000000
0x8008d8: 0x00000000 0x00000000 0x00000000 0x00000000
0x8008e8: 0xff080000 0x00000002 0x00000000 0x00000000
0x8008f8: 0x00000000 0x4f007802 Cannot access memory at address 0x800900
this has grown the static area upwards, whereas I was expecting the static area to remain the same size, but the program stack to grow downwards and overwrite it.
how is it possible to still create static stuff after __do_copy_data
? I thought static part is taken care of by compiler, and copied on start, but not in runtime?
avr-size
also shows static size as 484:
$ avr-size target/avr-atmega328p/release/avr-bug.elf
text data bss dec hex filename
17374 484 1 17859 45c3 target/avr-atmega328p/release/avr-bug.elf
so either:
avr-size
not counting them then?)I created an issue reproducer
repository at https://github.com/fvilante/avr_bug
According to previous requested by @workingjubilee here, I created a git repo with the reproducer based on @xphoniex (issue author) comment.
The motivation is because I was not able to compile the code as originally given.
@xphoniex I made some small adjustments to the reproducer you have sent which are visible in the repository log history. Please clone this repository and check if you can validate it as an official reproducer for the issue.
It's very important that we have a deterministic reproducer that works the same way independent of time or machine it is run.
IMPORTANT If you consider the case, update the original issue with a link to this repository.
TLDR: It has been proved that the problem is not in the rust compiler side.
I followed an different approach than @Patryk27 and decided to experiment to take the core effect of the reproducer code
(which is to be able to generate two hash values, in different optimizations levels, on an AVR platform) and run it in real hardware (avr328p based) using a different rust project that I've been configured by my own.
I assumed that the core effect desired by the reproducer boils down to:
let h1 = hmac_sha256::HMAC::mac(b"hello", b"key");
let h2 = hmac_sha256::Hash::hash(b"hello");
Then I inserted this two lines into another project that I've being using since a year, that targets the same hardware of the reproducer. It runs on a board with an atmega328p and an lcd display. I defined that the values of variables h1
and h2
should be printed on the display, with an interval of 1 sec between each other, so I can check visually its actual values.
TLDR: The result was as expected; The same (and correct) output value no matter what kind of optimization level I use.
This video shows the result in real-time, which is as expected as below and independent of the optimization level tested.
mac = 9307b3b915efb5171ff14d8cb55fbcc798c6c0ef1456d66ded1a6aa723a58b7b
hash = 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
I tested for following optimizations levels:
opt-level = "s", lto = true
opt-level = "z", lto = true
NOTE: My project enviroment does not allow to compile
lto = false
oropt-level
other than "s" or "z". This is intentional and not related with this @xphoniex issue.. There is no point in using this compiler parameters in AVR targets given its flash size constraints.
Although this present test does not answer the question what's wrong with the reproducer code?
it shows that it is possible, with rust language, to obtain the desired reproducer effect (to produce the hashes) independent of the compiler's optimization level set.
In my code I did not use ufmt dependency. And I do not send the output to the serial, instead I send it direct to the lcd display. But I have settled up in my project a working serial port driver (which is tested in field). I did not use it because I had not a cable in the test moment.
But we can conclude that if the issue is about a possible mis-behaviour of the rust compiler, then the test probes that it is not the case.
The rust compiler is able to compile a program for AVR target that correct calculates hash functions values and displays its values into a lcd display (and hopefully into a serial port).
If there is no problem with the Rust compiler then the reproducer problem should be with some dependency it is using (for example ufmt or serial) and/or with the compiler options it's using (check this compiler correction I've made on reproducer code).
It's also important to note that I've found that stackoverflow
error (same as primarily suggested by @Patryk27 in his comment) seems to be a sneaky kind of error sometime present in AVR development. And as far as I know, there is no known method to prevent it at compile time.
But if this is the case, then the best forum to discuss how to detect a SO errors in AVR?
should not be in the rust language issue list but in the particular community specific for this target (for example: gitter avr lobby), once AVR is a tier 3 policy target which means that it's supported mainly by its community.
I consider that in the community we have a lack of documentation on how to detect stack overflow errors. Even I do not know a consistent method to safely detect this kind of error yet. I was reading @Patryk27 and your (@xphoniex) comment's exchange about that subject, and found that you also had problem to detect the exact point where stack reach its limits.
Eventually you may help the community to write some blog post about it.
I think this present hardware test make this issue eligible to be closed, once it has been proved the rust compiler capability to produce the desired effect. Showing therefore that the issue has other cause.
But @xphoniex please take your time to check by your own the data presented here and let us know your considerations.
Thank you Flavio, for putting in the effort.
The motivation is because I was not able to compile the code as originally given.
Can you give me the compiler error you get with the zip file? Remember you have to cargo build | run -r
.
It's very important that we have a deterministic reproducer that works the same way independent of time or machine it is run.
So, I went back and tried different toolchains, and to my surprise, I was getting the correct hashes. After downloading the zip file again, and comparing all files one by one, I realized this line had fixed the issue:
build-std-features = ["compiler-builtins-mangled-names"]
which had been suggested by Patryk here. Funnily enough, when I tried this remedy before, here, it didn't solve the issue for me back then. Right now I can't re-produce it.
Anyway, you should be able to reproduce with these settings:
# Cargo.toml
opt-level = "s | z"
# rust-toolchain.toml
channel = "nightly-2022-07-10 | nightly-2022-08-11 | nightly-2022-07-10"
mac = d238e536e20f0b1b210644248134891454c20ad29c10f75756218b6ab8f5c17d
hash = 5f23619d4ed28dbf06d25969fa262384869011f4a44469c1eb38c50b15b01c4b
mac = 03942d...
specified in the first post.compiler-builtins-mangled-names
didn't help before, but setting it now fixes all the issues, except for opt-level of 1, which might be due to SOSo, the original issue with mac/hmac is resolved somewhere in 1.70 - 1.72 but to get it working with prior versions you can use compiler-builtins-mangled-names
feature.
This is in fact a compiler issue, and if someone can point down the commit which has resolved it, I'd be grateful, as I wasn't able to using git bisect
.
Most of what Patryk wrote is correct expect the way he cacluated the copy size of the static data in __do_copy_data
. That routine copies the same bytes as mentioned in data section output of avr-size
.
Problem now is, I stepped through the code line by line inside gdb, and I can see the stack is growing from both sides:
I don't know why the 2nd is happening, and I've tried talking to a few people with knowledge of AVR but no one's engaged me so far.
avr-size
not detecting the area which should be reserved for hmac? avr-size
bug?I'm not 100% sure this is not a Rust/LLVM issue given the aforementioned points.
Let me know what you think.
- If this is how const/static area grows, why is avr-size not detecting the area which should be reserved for hmac?
- Possibly avr-size bug?
- Is this because of crates and somehow a rust issue, where hmac - being an external crate - won't end up showing?
First, before anything: Rust will naturally prefer to statically link all crates and do considerable inlining, or recommend such to LLVM, which can make it hard to track things down at the best of times. And the crate you named uses inline(always)
on many functions.
However, it is also very common for tools like avr-size
(or indeed gdb!) to expect certain behaviors from tools that emit or manipulate binaries that may be, for instance, overfitted on the specific behaviors of GCC. It is often hard for them to do otherwise, sometimes, as they may be trying to extract data that is slightly more subtle than mandated by the object format's annotations. While avr-gcc may be the linker here, obviously Rust and LLVM have had more than a little input. Even worse, there are multiple different ways of evaluating the "size" of a binary's sections that can give different results. And it's not that one is right and one is wrong, necessarily.
So, armed with that knowledge, finding ourselves at the bottom of a very deep and dark rabbit hole, that might be filled with monsters, actually, I must ask: Won't end up showing... where?
First, before anything: Rust will naturally prefer to statically link all crates and do considerable inlining, or recommend such to LLVM, which can make it hard to track things down at the best of times. And the crate you named uses
inline(always)
on many functions.
My understanding was that inline
isn't respected across crate, incorrect?
So, armed with that knowledge, finding ourselves at the bottom of a very deep and dark rabbit hole, that might be filled with monsters, actually, I must ask: Won't end up showing... where?
The data
required for hmac crate, is not included in data
section of the elf file. We now how two steps, one to copy (static) data for the lib, and then (static) data for hmac crate. See next post too please.
preface: tried posting this to simavr google group but couldn't, and the gitter room suggested is not very active, it's our own Flavio, and Patryk.
How does stack memory grow on AVR?
I've coded a small program for atmega328p, in rust/llvm, and it's not giving me the expected output. I don't know if the issue is on the compiler side or memory, so I'm trying to rule out the latter first (stack overflow).
__do_copy_data
, static stuff are copied to 0x800100, growing upwards__do_copy_data
, stack grows downward from 0x8008ffIs everything I said above correct? cause in practice, something else is happening. Initially, __do_copy_data
copies the same bytes as shown in avr-size
data section to the beginning of SRAM. Then after s
ing enough steps, stack data show up, at the end of SRAM, but eventually at some point, some stuff gets copied to the the area right after static area as well. Here's proof:
In one terminal:
$ simavr -m atmega328p avr-bug.elf -g
In another:
$ avr-gdb -ex 'target remote :1234' avr-bug.elf
0x00000000 in __vectors ()
(gdb) define nxt
Type commands for definition of "nxt".
End with a line saying just "end".
>s
>x/60x (0x800100 + 182)
>x/60x (0x8008ff - 240)
>end
then keep nxt
'ing, first stack stuff should show up after 6 nxt:
avr_device::interrupt::disable_save () at /home/usr/.cargo/registry/src/index.crates.io-6f17d22bba15001f/avr-device-0.5.0/src/interrupt.rs:107
107 asm!(
0x8001b6 <DEVICE_PERIPHERALS>: 0x00000000 0x00000000 0x00000000 0x00000000
0x8001c6: 0x00000000 0x00000000 0x00000000 0x00000000
0x8001d6: 0x00000000 0x00000000 0x00000000 0x00000000
0x8001e6: 0x00000000 0x00000000 0x00000000 0x00000000
0x8001f6: 0x00000000 0x00000000 0x00000000 0x00000000
0x800206: 0x00000000 0x00000000 0x00000000 0x00000000
0x800216: 0x00000000 0x00000000 0x00000000 0x00000000
0x800226: 0x00000000 0x00000000 0x00000000 0x00000000
0x800236: 0x00000000 0x00000000 0x00000000 0x00000000
0x800246: 0x00000000 0x00000000 0x00000000 0x00000000
0x800256: 0x00000000 0x00000000 0x00000000 0x00000000
0x800266: 0x00000000 0x00000000 0x00000000 0x00000000
0x800276: 0x00000000 0x00000000 0x00000000 0x00000000
0x800286: 0x00000000 0x00000000 0x00000000 0x00000000
0x800296: 0x00000000 0x00000000 0x00000000 0x00000000
0x80080f: 0x00000000 0x00000000 0x00000000 0x00000000
0x80081f: 0x00000000 0x00000000 0x00000000 0x00000000
0x80082f: 0x00000000 0x00000000 0x00000000 0x00000000
0x80083f: 0x00000000 0x00000000 0x00000000 0x00000000
0x80084f: 0x00000000 0x00000000 0x00000000 0x00000000
0x80085f: 0x00000000 0x00000000 0x00000000 0x00000000
0x80086f: 0x00000000 0x00000000 0x00000000 0x00000000
0x80087f: 0x00000000 0x00000000 0x00000000 0x00000000
0x80088f: 0x00000000 0x00000000 0x00000000 0x00000000
0x80089f: 0x00000000 0x00000000 0x00000000 0x00000000
0x8008af: 0x00000000 0x00000000 0x00000000 0x00000000
0x8008bf: 0x00000000 0x00000000 0x00000000 0x00000000
0x8008cf: 0x00000000 0x00000000 0x00000000 0x00000000
0x8008df: 0x00000000 0x00000000 0x08000000 0x000001ff
0x8008ef: 0x00000000 0x00000000 0x00000000 0x004a0100
and after another 7 nxt
, you'll see something at the beginning of free SRAM area
avr_device::interrupt::free<avr_device::devices::atmega328p::{impl#68}::take::{closure_env#0}, core::option::Option<avr_device::devices::atmega328p::Peripherals>> (f=...)
at /home/usr/.cargo/registry/src/index.crates.io-6f17d22bba15001f/avr-device-0.5.0/src/interrupt.rs:228
228 unsafe { restore(irq_flag); }
0x8001b6 <DEVICE_PERIPHERALS>: 0x00000001 0x00000000 0x00000000 0x00000000
0x8001c6: 0x00000000 0x00000000 0x00000000 0x00000000
0x8001d6: 0x00000000 0x00000000 0x00000000 0x00000000
0x8001e6: 0x00000000 0x00000000 0x00000000 0x00000000
0x8001f6: 0x00000000 0x00000000 0x00000000 0x00000000
0x800206: 0x00000000 0x00000000 0x00000000 0x00000000
0x800216: 0x00000000 0x00000000 0x00000000 0x00000000
0x800226: 0x00000000 0x00000000 0x00000000 0x00000000
0x800236: 0x00000000 0x00000000 0x00000000 0x00000000
0x800246: 0x00000000 0x00000000 0x00000000 0x00000000
0x800256: 0x00000000 0x00000000 0x00000000 0x00000000
0x800266: 0x00000000 0x00000000 0x00000000 0x00000000
0x800276: 0x00000000 0x00000000 0x00000000 0x00000000
0x800286: 0x00000000 0x00000000 0x00000000 0x00000000
0x800296: 0x00000000 0x00000000 0x00000000 0x00000000
0x80080f: 0x00000000 0x00000000 0x00000000 0x00000000
0x80081f: 0x00000000 0x00000000 0x00000000 0x00000000
0x80082f: 0x00000000 0x00000000 0x00000000 0x00000000
0x80083f: 0x00000000 0x00000000 0x00000000 0x00000000
0x80084f: 0x00000000 0x00000000 0x00000000 0x00000000
0x80085f: 0x00000000 0x00000000 0x00000000 0x00000000
0x80086f: 0x00000000 0x00000000 0x00000000 0x00000000
0x80087f: 0x00000000 0x00000000 0x00000000 0x00000000
0x80088f: 0x00000000 0x00000000 0x00000000 0x00000000
0x80089f: 0x00000000 0x00000000 0x00000000 0x00000000
0x8008af: 0x00000000 0x00000000 0x00000000 0x00000000
0x8008bf: 0x00000000 0x00000000 0x00000000 0x00000000
0x8008cf: 0x00000000 0x00000000 0x00000000 0x00000000
0x8008df: 0x00000000 0x00000000 0x08000000 0x000001ff
0x8008ef: 0x00000000 0x00000000 0x00000000 0x004a0100
I don't know what that irq_flag
is, but we'll ignore it for now, this area remains zeros until we nxt
448 more times:
awint_core::data::inlawi::InlAwi<512, 33>::unstable_from_u8_slice<512, 33> (buf=...)
at /home/usr/.cargo/registry/src/index.crates.io-6f17d22bba15001f/awint_core-0.7.0/src/data/inlawi.rs:164
164 let mut raw = [0; LEN];
0x8001b6 <DEVICE_PERIPHERALS>: 0x056608c4 0x0666084e 0x00d005d8 0x00661000
0x8001c6: 0x00000000 0x00000000 0x00000000 0x00000000
0x8001d6: 0x00000000 0x00000000 0x00000000 0x00000000
0x8001e6: 0x00000000 0x00000000 0x00000000 0x00000000
0x8001f6: 0x00000000 0x00000000 0x00000000 0x00000000
0x800206: 0x00000000 0x00000000 0x00000000 0x00000000
0x800216: 0x00000000 0x00000000 0x00000000 0x00000000
0x800226: 0x00000000 0x00000000 0x00000000 0x00000000
0x800236: 0x00000000 0x00000000 0x00000000 0x00000000
0x800246: 0x00000000 0x00000000 0x00000000 0x00000000
0x800256: 0x00000000 0x00000000 0x00000000 0x00000000
0x800266: 0x00000000 0x00000000 0x00000000 0x00000000
0x800276: 0x00000000 0x00000000 0x00000000 0x00000000
0x800286: 0x00000000 0x00000000 0x00000000 0x00000000
0x800296: 0x00000000 0x00000000 0x00000000 0x00000000
0x80080f: 0x00000000 0x00000000 0x00000000 0x00000000
0x80081f: 0x00000000 0x00000000 0x00000000 0x00000000
0x80082f: 0x00000000 0x00000000 0x00000000 0x00000000
0x80083f: 0x00000000 0x00000000 0x00000000 0x00000000
0x80084f: 0x00000000 0x00000000 0x00000000 0x00000000
0x80085f: 0x00000000 0x98000000 0x5b16f817 0xd959f281
0x80086f: 0xdb2dce28 0x07029bfc 0x95ce870b 0xac55a062
0x80087f: 0x7ef9dcbb 0x0079be66 0x00000000 0x00000000
0x80088f: 0x00000000 0x00000000 0x00000000 0x00000000
0x80089f: 0x00000000 0x00000000 0x10d4b802 0x47d08ffb
0x8008af: 0x8554199c 0x17b448a6 0x1108a8fd 0xa4fbfc0e
0x8008bf: 0xa3c4655d 0x3ada7726 0x00000048 0x00000000
0x8008cf: 0x00000000 0x00000000 0x00000000 0x00000000
0x8008df: 0x00000000 0x00000000 0x08020000 0x000001ff
0x8008ef: 0x00000000 0x00000000 0x00000000 0x004a0100
the code being executed (or about to execute?) is let mut raw = [0; LEN];
which should place raw
on stack, thus at the end of SRAM.
previous line was also let mut pad = inlawi!(0u512);
which is not doing anything special either, again should be placed on stack, end of SRAM.
what are those 0x056608c4 0x0666084e 0x00d005d8 0x00661000
values doing there?
did I get the growing direction wrong or is it a rust/llvm bug?
@xphoniex based on your suggestion here I was able to reproduce the error using an older version of rust compiler (nightly-2022-07-10
).
Which means that the issue is related to particular versions of Rust compiler, and that the same problem does not occurs in recent versions.
This has been previously anticipated by @saethlin here when he says:
We have some history with AVR miscompilations
Certainly one of the causes of this Rust AVR's compiling quality variation, is the fact that there is not any kind of regression tests being performed by the compiler team's development workflow for this target. Which means that in the proccess of development new broken avr compilers may be produced in future, with no alert (and by different reasons other than related to #109000).
As @hahix, the author of some dependencies you use in your reference code stated: even if regression tests on AVR would have no effect in terms to interrupt the compiler's development workflow. It would be a great start to have a flag indicating that something has been broken.
In this sense, why not close this issue in favor of #107612 ?
See also: This @Patryk27's comment here
Note: Altough I would suggest @workingjubilee to reframe the title of issue #107612 to also consider 8-bits targets since AVR microcontrollers are typically 8-bits. Something like this would be fine
Consider simulation testing for math on 16 and 8-bit targets
.
In this sense, why not close this issue in favor of #107612 ?
There are two problems in this issue, one is a math problem as you put it, and the other is likely a stack overflow.
First one is solved and no longer an issue, but second one is still persisting.
In the re-producer zip, hmac-sha256
crate is used which has some const
values, and I thought that was the reason why the const area of SRAM grows post __do_copy_data
even thought I still consider it a bug that avr-size
does not take that size into account.
But in my previous post, I have demonstrated that even if hmac-sha256
crate is removed, one of these lines:
let mut raw = [0; LEN];
let mut pad = inlawi!(0u512);
is writing values 0x056608c4 0x0666084e 0x00d005d8 0x00661000
to the static area of SRAM, whereas both of these are non-static and should be on regular stack.
If you have an explanation for this we can close this issue?
TLDR It seems your new question departs from the topic raised in the original post. I recomend close this and open a new issue.
If you have an explanation for this we can close this issue?
It seems to me that you probably unintentionally made a topic drift.
In other words, for all ways I look at your original post at this moment, there is not a single one where I can realize that your initial questions were not answered.
If you agree with above statement and you still want to open a new issue, remember to include complete information:
NOTE: github repo enables everyone working on the reproducer to easily reference, link and justify their delta variation until the solution is reached.
My understanding was that inline isn't respected across crate, incorrect?
I mean, I didn't deeply analyze the code structure of that crate, but in the general case it is often very respected cross-crate! As I understand it, it causes the Rust compiler to delay code generation in many cases until the final crate, thus making it much more like "true inlining", by forcing codegen to happen "on-the-spot". This also can cause compilation performance regressions, by forcing repeated needless codegen.
I agree that we seem to be on to a new topic and the current primary issue seems to be resolved, so I will close this. Even if we are not, it is usually better to initially prefer to split up a nebulous issue rather than assuming two problems are intrinsically connected, so if there are any remaining subissues of this that seem unresolved, please open a new issue for them.
Here's video evidence to prove it:
I was able to produce this after a mere 10 second of playing with the code. This is a telltale that there are deeper problems when it comes to memory, caused by the compiler.
I politely ask that you @workingjubilee re-open this issue, and not close it again until it's been fully resolved. All the relevant context, info, and re-producer exists on this issue, so there's no need to re-open another issue and copy/paste the same info there. There is however, a need to not to clutter this space, so that someone with knowledge of llvm/rust can follow along, and maybe eventually help, and resolve this matter.
@saethlin I read your post on Zulip, and I understand inactiveness of this issue is your concern. I've provided evidence in prior posts in gdb, and you can see the gif I've attached as evidence that this issue still exists in compiler. Please let me know if you need anything else.
It's not inactive because it's solved, it's inactive because no one has responded to the issues I raised. I kindly ask you to make your instance clear and ask @workingjubilee to re-open this issue.
This is not a new question, it doesn't depart from original post, and I have clearly stated it in my original post - which is unedited - I'll highlight the relevant parts for you:
not only that, even if I run fns from another crate, even with the working
opt-level = 2
, I get incorrect output:
"not only that" means I have't even told you the most important part yet. This was the centerpiece of my issue, as calling other fns, even small ones like a debug / print, was corrupting the output.
I have posted an example of what I meant by other_crate::do_something()
in my re-producer:
and I have posted the compiler version needed to re-produce this issue, but you seem to have glossed over this info.
if my issue was that I was getting incorrect hmac, I would just set opt
to "2"
and call it a day, use some common sense.
All the relevant context, info, and re-producer exists on this issue, so there's no need to re-open another issue and copy/paste the same info there. There is however, a need to not to clutter this space, so that someone with knowledge of llvm/rust can follow along, and maybe eventually help, and resolve this matter.
I read over this issue from front to back in one sitting and I cannot tell whether the issue you're having is still a stack overflow, a miscompilation, an unsound dependency, or a combination of these. You can always link to this issue from a new more targetted one.
you seem to have glossed over this info.
I have not found this information, it may have been there, but it wasn't obvious to me either. A new less cluttered issue may be easier to follow.
use some common sense.
that's uncalled for and not making people more likely to want to participate in this issue. Consider avoiding such comments in the future.
@xphoniex There are very few people who understand Rust and LLVM well enough to even try to tackle this issue. There are even fewer who have have made commits to rust-lang/rust specifically to fix issues for AVR and MSP430. I am one of them who has, and I do it because I care a lot about Rust's multi-platform viability. If I am asking you to reorganize the data in a way I prefer, I am not bothered if you say you don't understand why I have a given preference, but it is not some plot to dismiss the issue at hand. It is, rather, part of how I sort and think about issues and how to resolve them. Nonetheless, I have nothing against reopening this issue I suppose, but please trust that my request was not some idle whim.
Using rust to compile for Arduino target, I'm see a lot of weird and random behaviors from compiler when making changes to profile and compiler setting. (I'm using avr-hal for uno, here's a sample)
As an example, if I use
hmac-sha256
crate:I get different outputs depending on
opt-level
andlto
, correct one is:which I'm getting from
opt-level=2
with bothlto=true
and unspecified, however:not only that, even if I run fns from another crate, even with the working
opt-level = 2
, I get incorrect output:has a wrong output and instead this works:
before I compile a bug report at gcc, I want to make sure the fault is not at rust side. here's the final command that rustc is running to link the final elf:
as you can see it's using
-O1
which I haven't specified anywhere. (See avr-atmega328p.json)I have tried overriding in my avr-atmega328p.json file inside
pre-link-args
but that doesn't affect anything. Can someone shed some light on what is going wrong here? If it's the-O1
flag, how do I override that?