riscv-software-src / riscv-pk

RISC-V Proxy Kernel

pk: fix __do_brk when new addr is not feasible #295

Closed: xukl closed this 1 year ago

xukl commented 1 year ago

The Linux kernel simply returns the current brk when the requested brk address is not feasible. pk should probably do the same.

Code such as glibc's sbrk expects syscall(SYS_brk, NULL) to return the current brk value. Mysterious bugs occur if pk instead munmaps all allocated brk memory and returns brk_min.
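The difference between the two behaviours can be sketched with a minimal model, assuming the break may legally sit anywhere in [brk_min, brk_max]. The struct and both function names are hypothetical, and brk_clamp paraphrases the pk behaviour described in this comment rather than copying pk's actual __do_brk.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a process break (not pk's real data structures). */
struct brk_model {
    uintptr_t brk_min; /* lowest legal break, e.g. the end of the loaded image */
    uintptr_t brk_max; /* highest legal break */
    uintptr_t brk;     /* current break */
};

/* Sanity check: the current break always lies inside the legal range. */
static void check_model(const struct brk_model *m)
{
    assert(m->brk_min <= m->brk && m->brk <= m->brk_max);
}

/* Clamping behaviour as described for pk: the request is forced into
   [brk_min, brk_max] and the break moves there, so brk(NULL) shrinks the
   heap back to brk_min and releases everything allocated so far. */
uintptr_t brk_clamp(struct brk_model *m, uintptr_t addr)
{
    check_model(m);
    if (addr < m->brk_min)
        addr = m->brk_min;
    else if (addr > m->brk_max)
        addr = m->brk_max;
    m->brk = addr;
    return addr;
}

/* Linux-like behaviour: an infeasible request leaves the break untouched
   and reports the current value, so brk(NULL) is a pure query. */
uintptr_t brk_linux(struct brk_model *m, uintptr_t addr)
{
    check_model(m);
    if (addr < m->brk_min || addr > m->brk_max)
        return m->brk;
    m->brk = addr;
    return addr;
}
```

For example, with brk_min = 0x10000 and the break starting there, growing to 0x12000 and then calling brk(0) returns 0x12000 and keeps the heap under brk_linux, whereas brk_clamp returns 0x10000 and moves the break back down.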

xukl commented 1 year ago

For your information, I ran into a strange bug when running the following program on spike/pk:

```c
// test.c
#include <stdlib.h>

int X[32000][12];

int main(int argc, char *argv[]){
 if (argc < 2)
  return 0;
 int digits = atoi(argv[1]);

 return 0;
}
```
```console
$ riscv64-linux-gnu-gcc -static test.c -o exe
$ spike pk ./exe 0
bbl loader
z  0000000000000000 ra 00000000000141ca sp 0000003ffffff8f0 gp 0000000000075bc0
tp 00000000001f2020 t0 0000000000000001 t1 2f2f2f2f2f2f2f2f t2 0000000000000000
s0 0000003ffffff980 s1 0000000000000003 a0 0000003ffffffbd0 a1 0000000000000000
a2 000000000000000a a3 0000000000000000 a4 0000000000000000 a5 00000000001f2028
a6 00000000001ecdc8 a7 2f4a574a016a024c s2 0000003ffffffb28 s3 0000000000000001
s4 0000003ffffffb48 s5 0000000000000001 s6 0000000000010636 s7 0000000000010420
s8 0000000000000000 s9 0000000000000000 sA 0000000000000000 sB 0000000000000000
t3 ffffffffffffffff t4 0000000000073f80 t5 000000000006ee1c t6 0000000000010000
pc 0000000000014dae va/inst 0000000000000008 sr 8000000200006020
User load segfault @ 0x0000000000000008
```

Test environment: riscv64-linux-gnu-glibc 2.36, riscv64-linux-gnu-gcc 12.2.0, pk master (3ed18cf).

In short, the bug is triggered by running glibc 2.36+ on pk. A glibc commit changed TLS (thread-local storage) block allocation to use a routine different from plain sbrk: if __curbrk == NULL, it uses the raw brk syscall directly (as the glibc source comment puts it: "If brk has not been invoked, there is no need to update __curbrk. The first call to brk will take care of that."). This can be confirmed by running strace on a program built against glibc 2.36+ on real Linux, which shows brk(NULL) being called more than once. On real Linux, calling syscall(SYS_brk, NULL) after syscall(SYS_brk, syscall(SYS_brk, NULL) + some_positive_value) is harmless: it simply returns the current, already grown break. On pk, however, it munmaps the previously allocated memory, which results in the loss of the TLS data. So the full picture is:

  1. glibc's _dl_early_allocate allocates the TLS storage with raw brk calls but does not write __curbrk
  2. The TLS block is filled with data from the ELF TLS section
  3. Some other glibc function calls malloc, which eventually goes through sbrk
  4. sbrk finds __curbrk == NULL and issues syscall(SYS_brk, NULL)
  5. pk munmaps the TLS block, and all TLS data is lost
  6. sbrk calls brk(oldbrk + increment). The VMA covering the TLS block is back, but its contents are now all zeros
  7. When atoi calls the internal strtol_l_internal with its locale_t loc argument, it reads loc from the zero-filled TLS block, so loc is NULL
  8. strtol_l_internal reads loc->__locales[LC_NUMERIC], which is *(void **)((char *)loc + 0x8), i.e. *0x8, resulting in "User load segfault @ 0x0000000000000008"
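Steps 1 to 6 can be replayed in a toy simulation, assuming a tiny arena in place of real pages and modelling munmap as zeroing (a later regrow then yields zero-filled memory, as a fresh anonymous mapping would). All names here (struct proc, sys_brk, early_allocate, toy_sbrk, tls_survives) are hypothetical; early_allocate and toy_sbrk only loosely follow the glibc paths described in the list, and the clamping branch paraphrases the pk behaviour reported above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy address space for illustration; BRK_MIN and BRK_MAX are arbitrary. */
#define ARENA_SIZE 4096u
#define BRK_MIN 0x10000u
#define BRK_MAX (BRK_MIN + ARENA_SIZE)

struct proc {
    unsigned char mem[ARENA_SIZE]; /* backing bytes for [BRK_MIN, BRK_MAX) */
    uintptr_t brk;                 /* kernel-side break */
    uintptr_t curbrk;              /* stands in for glibc's __curbrk (0 == NULL) */
    bool linux_semantics;          /* how infeasible brk requests are handled */
};

/* Raw brk "syscall" with either Linux-like or clamping behaviour. */
static uintptr_t sys_brk(struct proc *p, uintptr_t addr)
{
    if (addr < BRK_MIN || addr > BRK_MAX) {
        if (p->linux_semantics)
            return p->brk;                         /* pure query */
        addr = addr < BRK_MIN ? BRK_MIN : BRK_MAX; /* clamp */
    }
    if (addr < p->brk) /* shrinking: model munmap by zeroing the released range */
        memset(p->mem + (addr - BRK_MIN), 0, p->brk - addr);
    p->brk = addr;
    return addr;
}

/* Step 1: early allocation with raw brk calls, leaving curbrk unset. */
static uintptr_t early_allocate(struct proc *p, size_t size)
{
    uintptr_t previous = sys_brk(p, 0);
    uintptr_t result = sys_brk(p, previous + size);
    return result == previous ? 0 : previous;
}

/* Steps 3-6: refresh the cached break with brk(NULL) if it is unset,
   then request oldbrk + increment. */
static uintptr_t toy_sbrk(struct proc *p, size_t increment)
{
    if (p->curbrk == 0)
        p->curbrk = sys_brk(p, 0);
    uintptr_t old = p->curbrk;
    p->curbrk = sys_brk(p, old + increment);
    return p->curbrk == old + increment ? old : 0;
}

/* Returns true if the bytes placed in the early (TLS-like) block
   survive a later sbrk() issued by the allocator. */
bool tls_survives(bool linux_semantics)
{
    struct proc p;
    memset(&p, 0, sizeof p);
    p.brk = BRK_MIN;
    p.linux_semantics = linux_semantics;

    uintptr_t tls = early_allocate(&p, 64); /* step 1 */
    assert(tls != 0);                       /* the toy arena always has room */
    unsigned char *block = p.mem + (tls - BRK_MIN);
    memset(block, 0xAB, 64);                /* step 2 */

    (void)toy_sbrk(&p, 256);                /* steps 3-6 */
    return block[0] == 0xAB && block[63] == 0xAB;
}
```

In this model tls_survives(true) holds and tls_survives(false) does not. Under the clamping branch, toy_sbrk also hands back the very address of the TLS block, so the allocator and TLS would overlap even apart from the zeroing.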

So I believe this fix to __do_brk is necessary.
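The claim that brk(NULL) is harmless on real Linux can be probed directly. The sketch below assumes a Linux/glibc target where syscall(2) and SYS_brk are available (it should also build with riscv64-linux-gnu-gcc -static for running under pk); brk_query_preserves_heap is a hypothetical name. On real Linux it is expected to return true. Under the pk behaviour reported here, the brk(NULL) query should come back as brk_min rather than the grown break, so it returns false without reading the possibly unmapped bytes; on such a pk even its first query may already discard existing heap contents, so it is best called early.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical probe: grow the heap with raw brk calls (like the early TLS
   allocation), write a pattern, issue brk(NULL) (like sbrk's refresh), and
   check that both the break and the data are unchanged. */
bool brk_query_preserves_heap(void)
{
    const size_t len = 4096;
    uintptr_t start = (uintptr_t)syscall(SYS_brk, 0);
    uintptr_t end = (uintptr_t)syscall(SYS_brk, start + len);
    if (end != start + len)
        return false; /* the heap could not be grown at all */

    unsigned char *heap = (unsigned char *)start;
    memset(heap, 0x5A, len);

    /* The query that discards the heap under the clamping behaviour. */
    uintptr_t now = (uintptr_t)syscall(SYS_brk, 0);

    /* Only read the bytes if the break still covers them; otherwise they
       may have been unmapped and reading them would fault. */
    bool ok = (now == end);
    for (size_t i = 0; ok && i < len; i++)
        ok = (heap[i] == 0x5A);

    /* Put the break back so that a cached glibc __curbrk stays valid. */
    uintptr_t restored = (uintptr_t)syscall(SYS_brk, start);
    assert(restored == start);
    return ok;
}
```

The probe touches the heap only through raw syscalls and restores the original break before returning, so glibc's malloc, if it has already cached the break, keeps a consistent view.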