s117 closed this issue 4 years ago
The core trace shows that, before this error happens, glibc served a malloc request with mmap().
For an mmap() request without the MAP_FIXED flag, PK uses the first unmapped contiguous VM region after the current "brk" to serve it.
After that, if the userspace program attempts to expand its heap using brk(), and the new "brk" would extend into the region previously mapped by the mmap() call, PK will not serve it [1] [2]. That is exactly why this error happened.
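To make the interaction concrete, here is a toy model of the behavior described above. It is only a sketch, not PK's actual code: the `toy_vm` struct, the 64-page address space, and the `toy_*` function names are all made up for illustration. It assumes mmap() without MAP_FIXED takes the first free run of pages at or after the current "brk", and that brk() refuses to grow into any page that is already mapped.

```c
#include <assert.h>
#include <stdint.h>

#define NPAGES 64u /* hypothetical toy address space of 64 pages */

typedef struct {
    uint8_t  mapped[NPAGES]; /* 1 if the page is already in use */
    uint32_t brk_page;       /* first page above the heap */
} toy_vm;

/* Pages below the initial brk are treated as already in use (text, data, heap). */
static void toy_init(toy_vm *vm, uint32_t initial_brk_page) {
    for (uint32_t p = 0; p < NPAGES; p++)
        vm->mapped[p] = (uint8_t)(p < initial_brk_page);
    vm->brk_page = initial_brk_page;
}

/* mmap() without MAP_FIXED: take the first run of npages free pages
 * at or after the current brk. Returns the start page, or -1. */
static int toy_mmap(toy_vm *vm, uint32_t npages) {
    for (uint32_t start = vm->brk_page; start + npages <= NPAGES; start++) {
        uint32_t n = 0;
        while (n < npages && !vm->mapped[start + n])
            n++;
        if (n == npages) {
            for (uint32_t p = start; p < start + npages; p++)
                vm->mapped[p] = 1;
            return (int)start;
        }
    }
    return -1;
}

/* brk(): only heap growth is modeled. The request is refused if any page
 * in [brk_page, new_brk_page) is already mapped. Returns 0 or -1. */
static int toy_brk(toy_vm *vm, uint32_t new_brk_page) {
    if (new_brk_page < vm->brk_page || new_brk_page > NPAGES)
        return -1;
    for (uint32_t p = vm->brk_page; p < new_brk_page; p++)
        if (vm->mapped[p])
            return -1;
    for (uint32_t p = vm->brk_page; p < new_brk_page; p++)
        vm->mapped[p] = 1;
    vm->brk_page = new_brk_page;
    return 0;
}
```

In this model, once `toy_mmap()` has placed a mapping directly above the heap, every later `toy_brk()` growth request fails, no matter how many free pages remain above the mapping.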
Here are three pictures to help demonstrate the concept:
One solution (https://github.com/s117/riscv-glibc/commit/c484da610d8d97ea8b8e14a9e35838bd68b7b5b8) to this problem is to disable the use of mmap() in glibc's memory allocator, as documented at https://www.gnu.org/software/libc/manual/html_node/Malloc-Tunable-Parameters.html
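For reference, the same knob can also be turned at run time from inside a program, without patching glibc. The sketch below is not what the linked commit does; it only shows the documented `mallopt()` interface with `M_MMAP_MAX` set to 0, which asks glibc's malloc not to use mmap() for allocations. The wrapper name `disable_malloc_mmap` is made up for illustration.

```c
#include <assert.h>
#include <malloc.h>
#include <stdlib.h>

/* Ask glibc's malloc to serve no allocation requests with mmap().
 * Follows mallopt()'s convention: returns 1 on success, 0 on error. */
static int disable_malloc_mmap(void) {
    return mallopt(M_MMAP_MAX, 0);
}
```

It would need to be called early (for example, at the top of main()) so that it takes effect before the first large allocation.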
There are reasons for serving large memory allocations with mmap(). AFAIK, disabling it in a real OS results in at least:
1. Less efficient resource sharing (unused memory cannot be returned to the OS immediately).
2. Increased memory fragmentation (the available VM space can be scattered across the brk()-maintained heap).
But here, since PK is a single-program execution environment, we don't really care about the first drawback. And we can alleviate the second by increasing the total available physical memory with spike -m<larger mem> (yeah, I know, virtual address space is less of a problem on a 64-bit machine, but don't forget that PK's VM logic can only map a virtual page directly to the same physical page, i.e. PPN=VPN, so with PK you don't have the entire virtual address space available).
Interestingly, disabling the use of mmap() in glibc's memory allocator also solved the errors in 456.hmmer_ref (Misaligned store @ 000000000001b8f0) and 450.soplex (User load segfault @ 0x0000000000736010 PC=000000000005e25c).
A note on the parameter M_MMAP_MAX:
https://github.com/s117/riscv-glibc/blob/06983fe52cfe8e4779035c27e8cc5d2caab31531/malloc/malloc.c
Definition:
Assignment:
Use:
Commit d0401da incorporated this patch.
SPEC2006 403.gcc_ref failed with the error message "Cannot allocate 131072 bytes".
I gave the simulator 16GB of physical memory, so it should be a memory allocator issue rather than a real out-of-memory condition.