Open Gelbpunkt opened 1 year ago
An "out of memory" error means malloc failed. On Linux with overcommit enabled, malloc can't really fail unless a single allocation asks for more memory than the system has. So this is probably a miscompile or something similar, and has nothing to do with the amount of memory your system has.
Alternatively, because musl is different, it could be that its malloc does fail when memory is generally scarce. :)
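Whether overcommit is actually enabled can be checked rather than assumed. The following is a minimal sketch, assuming a Linux system with procfs mounted; `describe_overcommit` and the wording of the descriptions are illustrative names and not part of any tool mentioned in this issue.

```python
from pathlib import Path

# Meanings of the values in /proc/sys/vm/overcommit_memory on Linux.
OVERCOMMIT_MODES = {
    0: "heuristic: obviously excessive allocations are refused",
    1: "always: allocations are never refused on overcommit grounds",
    2: "strict: total commitments are capped by CommitLimit",
}


def describe_overcommit(raw: str) -> str:
    """Translate the raw file content into a short description."""
    mode = int(raw.strip())
    return OVERCOMMIT_MODES.get(mode, f"unknown mode {mode}")


def current_overcommit(path: str = "/proc/sys/vm/overcommit_memory"):
    """Return a description of the active mode, or None without procfs."""
    p = Path(path)
    return describe_overcommit(p.read_text()) if p.exists() else None
```

Calling `current_overcommit()` inside the build container shows which policy applies there. In mode 2, malloc can fail long before physical memory is exhausted.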
This is a very weird issue that I can consistently reproduce in an Alpine Linux environment, for example with a container.
On Alpine Linux, stage 3 builds will always error after reporting "Out of memory" like so:
The issue is that the system is not out of memory at all. I'm building on a system with an AMD Epyc 7402P (24c/48t) and 256GB of RAM. I wrote a very simple Python script to ensure that this is NOT an OOM and that my eyes looking at `top` did not fool me. All it does is collect memory usage information every 0.1s and keep track of the extreme values.
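A monitor of this kind can be sketched as follows. This is not the original script, only an illustration of the approach: it assumes a Linux kernel that reports `MemAvailable` in `/proc/meminfo`, and the function names are made up for this example.

```python
import time


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo content into {field name: integer value}."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])  # most fields are in KiB
    return info


def used_kib(info: dict) -> int:
    """Memory in use, counted as total minus what the kernel calls available."""
    return info["MemTotal"] - info["MemAvailable"]


def update_extremes(lowest, highest, value):
    """Fold one new reading into the running (lowest, highest) pair."""
    lowest = value if lowest is None else min(lowest, value)
    highest = value if highest is None else max(highest, value)
    return lowest, highest


def monitor(duration_s: float, interval_s: float = 0.1,
            path: str = "/proc/meminfo"):
    """Sample used memory every interval_s seconds for duration_s seconds."""
    lowest = highest = None
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        with open(path) as f:
            used = used_kib(parse_meminfo(f.read()))
        lowest, highest = update_extremes(lowest, highest, used)
        time.sleep(interval_s)
    return lowest, highest
```

Running `monitor(...)` in a second shell for the duration of the build returns the lowest and highest used-memory readings in KiB.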
At the end of the build, it shows:
So there is enough memory for LLVM!
Here's my container setup:
The flags ensure there are zero limitations to CPU and memory usage in the container. I can build AOSP just fine with the same flags, so the allocation failure is definitely not due to the container setup.
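The absence of limits can also be verified from inside the container. The sketch below assumes cgroup v2 mounted at `/sys/fs/cgroup` (the path is an assumption and differs under cgroup v1); the helper names are illustrative.

```python
import resource
from pathlib import Path


def format_limit(value: int) -> str:
    """Render an rlimit value, spelling out the 'no limit' sentinel."""
    return "unlimited" if value == resource.RLIM_INFINITY else f"{value} bytes"


def read_cgroup_memory_max(path: str = "/sys/fs/cgroup/memory.max"):
    """Return the cgroup v2 memory limit ('max' means none), or None if absent."""
    p = Path(path)
    return p.read_text().strip() if p.exists() else None


def report() -> list:
    """Collect the per-process and cgroup limits that can make malloc fail."""
    lines = []
    for name in ("RLIMIT_AS", "RLIMIT_DATA", "RLIMIT_STACK"):
        soft, hard = resource.getrlimit(getattr(resource, name))
        lines.append(f"{name}: soft={format_limit(soft)} hard={format_limit(hard)}")
    lines.append(f"cgroup memory.max: {read_cgroup_memory_max()}")
    return lines
```

Printing the lines returned by `report()` in the shell that starts the build shows whether an address-space or data-segment limit is in effect despite the container flags.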
These commands reproduce the error in this container:
I've been using 14 link jobs to try to circumvent this, but it hasn't changed anything compared to 48.
I am guessing that this is somehow related to musl. I vaguely remember that it has a lower thread stack size than glibc. Would it be possible that this is the reason for the error? Adding `-DCMAKE_EXE_LINKER_FLAGS="-Wl,-z,stack-size=2097152"` (2 MiB) doesn't help, but maybe I misunderstand how I would raise the stack size.
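One way to check whether the linker flag took effect is to inspect the resulting binary. To my understanding, `-z stack-size=N` is recorded as the `p_memsz` field of the `PT_GNU_STACK` program header, and recent musl versions consult that header in the main executable when choosing the default thread stack size; treat both points as assumptions to verify. The sketch below handles only little-endian ELF64 files, and `gnu_stack_size` is a name invented for this example.

```python
import struct

PT_GNU_STACK = 0x6474E551  # program header type used for stack attributes


def gnu_stack_size(elf: bytes):
    """Return p_memsz of PT_GNU_STACK in a little-endian ELF64 image, or None."""
    if elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("expected a little-endian ELF64 file")
    # ELF64 header: e_phoff at 0x20, e_phentsize at 0x36, e_phnum at 0x38.
    (e_phoff,) = struct.unpack_from("<Q", elf, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 0x36)
    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        (p_type,) = struct.unpack_from("<I", elf, off)
        if p_type == PT_GNU_STACK:
            # ELF64 program header: p_memsz sits at offset 0x28.
            (p_memsz,) = struct.unpack_from("<Q", elf, off + 0x28)
            return p_memsz
    return None
```

Reading a freshly linked executable from the build tree as bytes and passing it to `gnu_stack_size` should return 2097152 if the flag reached that link step. A result of 0 or `None` suggests the flag was not applied to that binary. Note that `CMAKE_EXE_LINKER_FLAGS` only affects executables, not shared libraries.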