tactcomplabs / rev

RISC-V SST CPU Component
Other
17 stars 21 forks source link

Stack buffer initialization corrupts memory. #285

Closed ldalessa closed 3 months ago

ldalessa commented 4 months ago

Describe the bug

When I allocate and initialize a small char buffer on the stack it seems to be corrupting memory, in particular the argv state. My understanding is that _STACK_SIZE_ is 1MB so I don't think that's the problem.

To Reproduce

SST stuff

$ sst --version
SST-Core Version (-dev, git branch : devel, SHA: cb2e3206a8af52702b23cd15c81a6ea0a6b3cfa4)
$ sst-config --CXX
mpic++ -std=c++17
$ sst-config --ELEMENT_CXXFLAGS 
 -std=c++17 -fPIC -DHAVE_CONFIG_H -I/home/ldalessa/local/include

REV stuff (configured with -DCMAKE_CXX_COMPILER=mpic++ -DCMAKE_BUILD_TYPE=Release).

$ mpic++ --version
g++ (Debian 13.2.0-25) 13.2.0
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ git status
On branch devel
Your branch is up to date with 'origin/devel'.

nothing to commit, working tree clean
$ git show HEAD
commit 8292c73900119412eb63b1d1a8f059e7c0efa2d0 (HEAD -> devel, origin/devel)
Merge: 27249d0 5ffaac7
Author: Ryan Kabrick <31963805+rkabrick@users.noreply.github.com>
Date:   Thu May 30 16:31:03 2024 -0400

    Merge pull request #266 from tactcomplabs/mem-dump

    Adding memory dumping

Test program. volatile to prevent the loop from being removed, noipa to prevent the call to foo from being removed, I split it into a function foo to make it easier for me to look at the asm, but that doesn't impact the bug as long as the loop isn't removed by the dead-code passes.

extern "C"
[[gnu::noipa]]                                                                                                                                                                                                                                  
void foo() {                                                                                                                                                                                                                                  
    volatile char data[256];                                                                                                                                                                                                                  
    for (int i = 0; i < 256; ++i) {                                                                                                                                                                                                           
        data[i] = '\0';                                                                                                                                                                                                                       
    }                                                                                                                                                                                                                                         
}                                                                                                                                                                                                                                             

auto main(int argc, char const* argv[]) -> int                                                                                                                                                                                                
{                                                                                                                                                                                                                                             
    foo();                                                                                                                                                                                                                                    
    return argv[0][0];                                                                                                                                                                                                                        
}    

C++ compiler stuff, aggressive flags to minimize produced asm (flags don't matter).

$ $RVCXX --version
riscv64-unknown-linux-musl-g++ (crosstool-NG 1.26.0.85_06fad54) 14.1.0
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ riscv64-unknown-linux-musl-g++ -O3  -flto=auto -march=rv64imafdc -static -o test test.cpp

Output asm

$ riscv64-unknown-linux-musl-objdump -C --disassemble=main test 

wmd:     file format elf64-littleriscv

Disassembly of section .text:

0000000000010172 <main>:
   10172:   1141                    addi    sp,sp,-16
   10174:   e022                    sd  s0,0(sp)
   10176:   e406                    sd  ra,8(sp)
   10178:   842e                    mv  s0,a1
   1017a:   0da000ef            jal 10254 <foo>
   1017e:   601c                    ld  a5,0(s0)
   10180:   60a2                    ld  ra,8(sp)
   10182:   6402                    ld  s0,0(sp)
   10184:   0007c503            lbu a0,0(a5)
   10188:   0141                    addi    sp,sp,16
   1018a:   8082                    ret
$ riscv64-unknown-linux-musl-objdump -C --disassemble=foo test

wmd:     file format elf64-littleriscv

Disassembly of section .text:

0000000000010254 <foo>:
   10254:   7111                    addi    sp,sp,-256
   10256:   4781                    li  a5,0
   10258:   10000693            li  a3,256
   1025c:   00278733            add a4,a5,sp
   10260:   00070023            sb  zero,0(a4)
   10264:   2785                    addiw   a5,a5,1
   10266:   fed79be3            bne a5,a3,1025c <foo+0x8>
   1026a:   6111                    addi    sp,sp,256
   1026c:   8082                    ret

I'm using the rev-model-options-config.py provided in rev/test.

$ sst --add-lib-path=/home/ldalessa/rev/build/release/src /home/ldalessa/rev/test/rev-model-options-config.py -- --machine "[CORES:RV64IMAFDC]" --program test --enableMemH=0

Additional context

I have tried rev's Debug, RelWithDebInfo, Release, and default builds and this doesn't matter. I have tried building Rev with just g++ instead of through the mpic++ wrapper and that doesn't matter. The size of the buffer does matter, if I drop down into the 190s it won't fail, depending on the build that I'm using, but 256 bytes consistently failed.

leekillough commented 4 months ago

I have reduced the test case by making data as small as necessary, and eliminated the loop:

[[gnu::noipa]] void foo() {
  volatile char data[81];
  data[2] = '\0';
}

int main( int argc, char** argv ) {
  foo();
  return argv[0][0];
}

The assignment to data[2] apparently corrupts argv[0], making the return argv[0][0] dereference a nullptr.

I looked at the assembly code, and it looks mostly correct, although it only allocates 80 bytes for data instead of 81, although that shouldn't affect the data[2] assignment, and might be making sure sp is aligned to 16 bytes:

00000000000101b6 <foo()>:
   101b6:       715d                    add     sp,sp,-80
   101b8:       00010123                sb      zero,2(sp)
   101bc:       6161                    add     sp,sp,80
   101be:       8082                    ret

0000000000010106 <main>:
   10106:       1141                    add     sp,sp,-16
   10108:       e022                    sd      s0,0(sp)
   1010a:       e406                    sd      ra,8(sp)
   1010c:       842e                    mv      s0,a1
   1010e:       0a8000ef                jal     101b6 <foo()>
   10112:       601c                    ld      a5,0(s0)
   10114:       60a2                    ld      ra,8(sp)
   10116:       6402                    ld      s0,0(sp)
   10118:       0007c503                lbu     a0,0(a5)
   1011c:       0141                    add     sp,sp,16
   1011e:       8082                    ret

When I change the declaration to data[82], the compiler completely changes the stack allocation of data to the next 16 byte stack allocation, and makes the assignment of data[2] look strange:

[[gnu::noipa]] void foo() {
  volatile char data[82];
  data[2] = '\0';
}

int main( int argc, char** argv ) {
  foo();
  return argv[0][0];
}
00000000000101b6 <foo()>:
   101b6:       711d                    add     sp,sp,-96
   101b8:       00010523                sb      zero,10(sp)
   101bc:       6125                    add     sp,sp,96
   101be:       8082                    ret
0000000000010106 <main>:
   10106:       1141                    add     sp,sp,-16
   10108:       e022                    sd      s0,0(sp)
   1010a:       e406                    sd      ra,8(sp)
   1010c:       842e                    mv      s0,a1
   1010e:       0a8000ef                jal     101b6 <foo()>
   10112:       601c                    ld      a5,0(s0)
   10114:       60a2                    ld      ra,8(sp)
   10116:       6402                    ld      s0,0(sp)
   10118:       0007c503                lbu     a0,0(a5)
   1011c:       0141                    add     sp,sp,16
   1011e:       8082                    ret

I will try to find the physical address of argv[0] in the simulator, and set up watchpoints to catch its corruption.

leekillough commented 4 months ago

It looks like an off-by-1 allocation bug, since char data[81] is allocated only 80 bytes, while char data[82] jumps to 96 bytes. But in our test case, we don't use data[80] or data[81], so the compiler is free to generate code which produces the same results (the "as if" rule).

ldalessa commented 4 months ago

It looks like an off-by-1 allocation bug, since char data[81] is allocated only 80 bytes, while char data[82] jumps to 96 bytes. But in our test case, we don't use data[80] or data[81], so the compiler is free to generate code which produces the same results (the "as if" rule).

This does seem weird. I don't know why gcc rounds to 80 instead of 16, like clang does. My original code seems to allocate the 256 fine, and the crosstool-ng musl toolchain I'm using doesn't produce the off-by-one for 81. I can't reproduce that at godbolt either (clang is smart enough to know it doesn't need the full 96 bytes).

Either way, that sb shouldn't corrupt anything right, as it's well within the stack allocation (even if the stack allocation has been compacted because we're not accessing parts of it)?

leekillough commented 4 months ago

Either way, that sb shouldn't corrupt anything right, as it's well within the stack allocation (even if the stack allocation has been compacted because we're not accessing parts of it)?

Yes. It is corrupting argv[0] in main(). I'm going through Rev trying to trace how and why it occurs.

leekillough commented 4 months ago

Please verify this has been fixed (see #288 ) and close it.

ldalessa commented 4 months ago

Looks like it's fixed my issues. Thank you.

ldalessa commented 4 months ago

I'm still having some problems with stack corruption. I've tried to reduce the test case and come up with the following.

#include <cstdio> 

int main(int, char** argv)
{
    char a[512]{};
    int n = snprintf(a, sizeof(a), "Processing records from %s\n", argv[1]);
    if (argv[1] == nullptr) {
        asm volatile( ".dword 0x00000000" ::: "memory" );
    }
    return 0;
}

Compiled this time as:

$ riscv64-unknown-linux-musl-gcc -march=rv64imafdc -o test test.cpp

Oddly, if I do not store the unused n on the stack, there is no issue, if I leave the n there I fail the assert.

This one uses snprintf from musl, but I do not believe that this is causing the issue directly. If you can't reproduce this let me know and I can try and install a different toolchain.

$ riscv64-unknown-linux-musl-gcc --version
riscv64-unknown-linux-musl-gcc (crosstool-NG 1.26.0.85_06fad54) 14.1.0
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
ldalessa commented 4 months ago

Wait wait. I wasn't using -static. I still am having problems but this test case doesn't exhibit it with -static. I'll close for now and reopen once I reduce something.

ldalessa commented 4 months ago

Ugh, no... sorry for the churn. -static does not resolve it, the test is still failing.

leekillough commented 4 months ago

There might be two distinct issues at work. My reduced case and another unrelated argv corruption bug were fixed by correcting the allocation of argv. That wasn't caused by corruption, but by the misallocation of argv to use the same space as other variables.

This may be running into an unrelated bug which might be corruption.

ldalessa commented 4 months ago

Okay. It wasn’t manifesting when I was using a static char buffer and it was manifesting with the nulled argv[] pointer so I assumed it was related. I guess if those are allocated near the stack then any memory corruption could look like this.

leekillough commented 4 months ago

https://github.com/tactcomplabs/rev/pull/290

leekillough commented 3 months ago

Does it still fail after https://github.com/tactcomplabs/rev/pull/290 has been merged into devel?

ldalessa commented 3 months ago

Does it still fail after #290 has been merged into devel?

Hi Lee. This failure is still present. That #290 is referencing #289 and seems to be targeting it.

leekillough commented 3 months ago

Does it still fail after #290 has been merged into devel?

Hi Lee. This failure is still present. That #290 is referencing #289 and seems to be targeting it.

Yes, but it fixed a lot of corruption bugs.

leekillough commented 3 months ago

I have reduced it some more, switching to C (smaller executable):

#include <stdio.h>
int main(int argc, char** argv)
{
  char a[365];
  int n = snprintf(a, sizeof(a), "");
  if (!argv[1])
    asm(".word 0");
}

The 365 size of a is the size which triggers it. 364 does not. a does not need to be initialized. The snprintf is apparently necessary, but it can be an empty print. I tried other functions like puts, atoi and they did not trigger it. The variable n is necessary.

(I am providing a single argument with --args="[1]".)

leekillough commented 3 months ago

The stack pointer was not always aligned on 16 bytes. Try it now.

ldalessa commented 3 months ago

The stack pointer was not always aligned on 16 bytes. Try it now.

This seems to have resolved the test you posted, but if I bump that buffer size back up to 512 (using your C code example) it's still corrupting argv and failing the assert.

I'm using musl, I'm not sure if that's making any difference.

leekillough commented 3 months ago

Increasing it 16 bytes to 381 makes it fail. I am stepping through the simulator, trying to figure out where corruption is occurring.

leekillough commented 3 months ago

The corruption occurs:

   104ac:       f482                    sd      zero,104(sp)

when sp + 104 == 0x3ffffb33, and &argv[1] == 0x3ffffb38.

It is in _svfprintf_r, which snprintf calls.

sp has the address 0x3ffffacb, which is not aligned. When main() is entered, sp == 0x3ffffffb, which is not aligned.

leekillough commented 3 months ago

It should be fixed now.

ldalessa commented 3 months ago

Works both for my reduced test case and RL code now. Thanks Lee.