Closed xukl closed 1 year ago
For your information: I got a strange bug when running a program on spike/pk.
// test.c
#include <stdlib.h>
int X[32000][12];
int main(int argc, char *argv[]){
    if (argc < 2)
        return 0;
    int digits = atoi(argv[1]);
    return 0;
}
$ riscv64-linux-gnu-gcc -static test.c -o exe
$ spike pk ./exe 0
bbl loader
z 0000000000000000 ra 00000000000141ca sp 0000003ffffff8f0 gp 0000000000075bc0
tp 00000000001f2020 t0 0000000000000001 t1 2f2f2f2f2f2f2f2f t2 0000000000000000
s0 0000003ffffff980 s1 0000000000000003 a0 0000003ffffffbd0 a1 0000000000000000
a2 000000000000000a a3 0000000000000000 a4 0000000000000000 a5 00000000001f2028
a6 00000000001ecdc8 a7 2f4a574a016a024c s2 0000003ffffffb28 s3 0000000000000001
s4 0000003ffffffb48 s5 0000000000000001 s6 0000000000010636 s7 0000000000010420
s8 0000000000000000 s9 0000000000000000 sA 0000000000000000 sB 0000000000000000
t3 ffffffffffffffff t4 0000000000073f80 t5 000000000006ee1c t6 0000000000010000
pc 0000000000014dae va/inst 0000000000000008 sr 8000000200006020
User load segfault @ 0x0000000000000008
Test environment: riscv64-linux-gnu-glibc 2.36, riscv64-linux-gnu-gcc 12.2.0, pk master (3ed18cf).
In short, the bug is caused by running glibc 2.36+ on pk. One glibc commit makes TLS (thread-local storage) block allocation use a routine different from plain sbrk: if __curbrk == NULL, it uses the raw brk syscall ("If brk has not been invoked, there is no need to update __curbrk. The first call to brk will take care of that."). This can be confirmed by running strace on a program built against glibc 2.36+ on real Linux, which shows brk(NULL) called more than once. On real Linux, calling syscall(SYS_brk, NULL) after syscall(SYS_brk, syscall(SYS_brk, NULL) + some_positive_value) is harmless. However, on pk, the previously allocated memory gets munmapped, which results in the loss of TLS information. So the full picture is:
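To check the "harmless on real Linux" claim directly, something like the following can be used. It is only a sketch: the function name and GROW_BYTES are made up for illustration, it uses the raw syscall so glibc's cached __curbrk is not involved, and it restores the original break before returning so that later malloc calls are not confused.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

enum { GROW_BYTES = 4096 };  /* arbitrary size chosen for this sketch */

/* Returns 1 if a brk(NULL) query leaves both the break and the heap
 * contents untouched, 0 otherwise. Linux-only. */
static int brk_query_preserves_heap(void)
{
    /* Query the current break, then grow it. */
    uintptr_t start = (uintptr_t)syscall(SYS_brk, 0L);
    uintptr_t grown = (uintptr_t)syscall(SYS_brk, start + GROW_BYTES);
    assert(grown == start + GROW_BYTES);  /* growing by one page should work */

    /* Fill the new region with a recognizable pattern. */
    unsigned char *heap = (unsigned char *)start;
    memset(heap, 0xAB, GROW_BYTES);

    /* The call under test: a plain query after the break has grown. */
    uintptr_t queried = (uintptr_t)syscall(SYS_brk, 0L);

    int ok = (queried == grown);
    for (size_t i = 0; ok && i < GROW_BYTES; i++) {
        if (heap[i] != 0xAB)
            ok = 0;  /* the region was discarded or zeroed */
    }

    /* Put the break back where it was found. */
    syscall(SYS_brk, start);
    return ok;
}
```

Calling brk_query_preserves_heap() from main and printing the result should be enough to compare real Linux against pk; per the description above, pk is expected to fail the pattern check.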
1. _dl_early_allocate allocates the TLS storage, but does not write __curbrk.
2. malloc is called, which eventually goes into sbrk.
3. sbrk finds __curbrk == NULL, yielding a syscall(SYS_brk, NULL).
4. pk munmaps the TLS block, so all TLS information is lost.
5. sbrk calls brk(oldbrk + increment). The VMA of the TLS block is back, but all of its data is filled with 0.
6. atoi calls the internal strtol_l_internal with argument locale_t loc. It reads loc from the 0-filled TLS block, so loc is NULL.
7. strtol_l_internal reads loc->__locales[LC_NUMERIC], which is *((void*)loc + 0x8), i.e. *0x8, resulting in the "User load segfault @ 0x0000000000000008".

So I believe this fix to __do_brk is necessary.
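The 0x8 in the last step is just pointer arithmetic on a NULL locale. A small sketch of that calculation, assuming the locale object starts with an array of per-category pointers (struct locale_model and its 13-entry size are stand-ins I made up for illustration, not glibc's real definition):

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical stand-in for the locale object: an array of per-category
 * pointers at offset 0. The size 13 is an assumption for this sketch. */
struct locale_model {
    void *locales[13];
};

static_assert(LC_NUMERIC >= 0 && LC_NUMERIC < 13,
              "LC_NUMERIC must index the modelled array");

/* Address that gets loaded when the locale pointer itself is NULL:
 * NULL + offset of the LC_NUMERIC slot. */
static size_t null_locale_fault_address(void)
{
    return offsetof(struct locale_model, locales[LC_NUMERIC]);
}
```

On a 64-bit glibc target, where LC_NUMERIC is 1 and pointers are 8 bytes, this evaluates to 8, which matches the faulting address in the register dump.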
The Linux kernel simply returns the current brk when the requested brk address is not feasible. pk should probably do the same.
Routines like sbrk in glibc expect syscall(SYS_brk, NULL) to return the current brk value. Mysterious bugs occur if pk munmaps all allocated brk memory and returns brk_min.
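The difference between the two behaviours can be written down as a tiny model. This is not pk's or the kernel's code: struct brk_model and both function names are hypothetical, and only the below-minimum case (which covers brk(NULL)) is modelled differently, since that is the case described here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a program break with a lower and upper bound. */
struct brk_model {
    uintptr_t min;  /* lowest allowed break (brk_min) */
    uintptr_t cur;  /* current break */
    uintptr_t max;  /* highest allowed break */
};

/* Linux-like behaviour: an infeasible request (including 0) changes
 * nothing and reports the current break. */
static uintptr_t linux_like_brk(struct brk_model *m, uintptr_t addr)
{
    assert(m->min <= m->cur && m->cur <= m->max);
    if (addr < m->min || addr > m->max)
        return m->cur;
    m->cur = addr;
    return m->cur;
}

/* Behaviour described for pk before the fix: a request below the
 * minimum (such as 0) drops the break back to brk_min, which models
 * the munmap of everything allocated so far. */
static uintptr_t discarding_brk(struct brk_model *m, uintptr_t addr)
{
    if (addr < m->min) {
        m->cur = m->min;
        return m->min;
    }
    return linux_like_brk(m, addr);
}
```

With the Linux-like version, a query after growing returns the grown break. With the discarding version, the same query silently resets the break, which is the situation sbrk runs into when __curbrk is still NULL.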