sabotage-linux / sabotage

a radical and experimental distribution based on musl libc and busybox
http://sabo.xyz

gcc473 fails to build, "error allocating 16 bytes..." only on natively booted sabotage #167

Closed Nemykal closed 10 years ago

Nemykal commented 10 years ago

I've been trying to track down the cause of the following elusive bug for the last few days without any luck.

GCC fails to build with "out of memory allocating 16 bytes after a total of xyz bytes", but only when sabotage is booted natively, not from a chroot into the same sabotage install.

Background:

gcc -static -D_GNU_SOURCE   -O0 -DIN_GCC   -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wstrict-prototypes -Wmissing-prototypes -Wmissing-format-a build/genattrtab \
    build/genattrtab.o build/rtl.o build/read-rtl.o build/ggc-none.o build/vec.o build/min-insn-modes.o build/gensupport.o build/print-rtl.o build/read-md.o bui
build/genattrtab ../.././gcc/config/i386/i386.md \
  insn-conditions.md > tmp-attrtab.c

out of memory allocating 16 bytes after a total of 304340336 bytes
make[2]: *** [s-attrtab] Error 1
make[2]: Leaving directory `/src/build/gcc473/gcc-4.7.3/host-x86_64-unknown-linux-gnu/gcc'
make[1]: *** [all-gcc] Error 2
make[1]: Leaving directory `

This is 100% reproducible: it happens every time. The two byte counts vary between runs but always stay around the same respective sizes. The first one is nearly always 16 bytes, though I have seen it say 1752 bytes once. I have tried fiddling with the CFLAGS set in /src/pkg/gcc473, but this has no impact, so I left it at -O0.

The extremely curious thing about this is that: if I boot a xubuntu 13.10 amd64 live cd, and mount and chroot into that sabotage install, IT WORKS - I can compile gcc473 without issues.

I thought that perhaps this was a resource limits problem, so I raised all the ulimit settings I could. I am not sure if I did this right, but ulimit -a shows the following in all new shells and also after a reboot. I added several ulimit commands to my shell profile and also to rc.local to raise these limits, but this didn't make any difference.

core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) unlimited
file size               (blocks, -f) unlimited
pending signals                 (-i) 125938
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 102400
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) unlimited
real-time priority              (-r) unlimited
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

I went hunting in the above-mentioned directory, /src/build/gcc473/gcc-4.7.3/host-x86_64-unknown-linux-gnu/gcc, to see if I could replicate the problem manually (so I didn't have to wait for butch to rebuild gcc473 each time; the error only occurs after about 30 seconds of compiling).

cd /src/build/gcc473/gcc-4.7.3/host-x86_64-unknown-linux-gnu/gcc

# build/genattrtab ../.././gcc/config/i386/i386.md insn-conditions.md > tmp-attrtab.c

out of memory allocating 16 bytes after a total of 284921200 bytes

I straced this and it didn't enlighten me much about what it was doing. The only odd thing I noticed was that it was making a ton of brk() syscalls:

...
brk(0x112b6000)                         = 0x112b6000
brk(0x112b7000)                         = 0x112b7000
brk(0x112b8000)                         = 0x112b8000
brk(0x112b9000)                         = 0x112b9000
brk(0x112ba000)                         = 0x112ba000
brk(0x112bb000)                         = 0x112bb000
brk(0x112bc000)                         = 0x112bc000
brk(0x112bd000)                         = 0x112bd000
brk(0x112be000)                         = 0x112be000
brk(0x112bf000)                         = 0x112bf000
brk(0x112c0000)                         = 0x112c0000
brk(0x112c1000)                         = 0x112c0000
brk(0)                                  = 0x112c0000
writev(2, [{"\nout of memory allocating 16 byt"..., 68}, {NULL, 0}], 2) = 68
exit_group(1)                           = ?
+++ exited with 1 +++
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 98.09    0.047152           1     65527           brk
  1.59    0.000766           1      1439           readv
  0.32    0.000152           1       211           writev
  0.00    0.000000           0        15           close
  0.00    0.000000           0        15           open
  0.00    0.000000           0         3           mmap
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         1           ioctl
  0.00    0.000000           0         1           execve
------ ----------- ----------- --------- --------- ----------------
100.00    0.048070                 67213           total
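
A note on reading the trace above: the raw brk syscall on Linux returns the new program break on success and the unchanged current break on failure, so `brk(0x112c1000) = 0x112c0000` is the call the kernel refused. The following is a minimal sketch for picking such calls out of a saved strace log. The function name `find_failed_brk` is hypothetical, and the sample text is copied from the trace in this thread.

```python
import re

# Matches strace lines such as: brk(0x112c1000) = 0x112c0000
# Newer strace versions print brk(NULL) instead of brk(0).
BRK_LINE = re.compile(r"brk\((0x[0-9a-fA-F]+|0|NULL)\)\s*=\s*(0x[0-9a-fA-F]+)")


def find_failed_brk(strace_text):
    """Return (requested, returned) for every brk call that was not honoured."""
    failures = []
    for match in BRK_LINE.finditer(strace_text):
        arg, ret = match.groups()
        if arg in ("0", "NULL"):
            continue  # brk(0) only queries the current break
        requested = int(arg, 16)
        returned = int(ret, 16)
        if returned != requested:
            failures.append((requested, returned))
    return failures


# Last lines of the trace quoted in this thread.
sample = """\
brk(0x112bf000)                         = 0x112bf000
brk(0x112c0000)                         = 0x112c0000
brk(0x112c1000)                         = 0x112c0000
brk(0)                                  = 0x112c0000
"""

for requested, returned in find_failed_brk(sample):
    print(f"brk({requested:#x}) refused, break stayed at {returned:#x}")
```

Every earlier call in the trace grows the heap by exactly one 4 KiB page and succeeds. Only the final request is refused, which is what triggers the xmalloc failure.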

So I then added '-ggdb' to CFLAGS for /src/pkg/gcc473 and ran this 'build/genattrtab' through gdb 7.6.2:

# gdb build/genattrtab
(gdb) run ../.././gcc/config/i386/i386.md   insn-conditions.md

...
out of memory allocating 16 bytes after a total of 273575280 bytes
[Inferior 1 (process 17361) exited with code 01]

Breakpoint 3, 0x0000000000419685 in exit ()
(gdb) backtrace
#0  0x0000000000419685 in exit ()
#1  0x000000000041955c in xexit (code=1) at ../.././libiberty/xexit.c:51
#2  0x0000000000419371 in xmalloc_failed (size=16) at ../.././libiberty/xmalloc.c:137
#3  0x00000000004193af in xmalloc (size=16) at ../.././libiberty/xmalloc.c:149
#4  0x000000000040f2a9 in ggc_internal_alloc_stat (size=16) at ../.././gcc/ggc-none.c:54
#5  0x000000000040bd67 in ggc_internal_zone_alloc_stat (z=0x638150 <rtl_zone>, s=16) at ../.././gcc/ggc.h:311
#6  0x000000000040bd8c in ggc_alloc_zone_rtx_def_stat (z=0x638150 <rtl_zone>, s=16) at ../.././gcc/ggc.h:335
#7  0x000000000040bea5 in rtx_alloc_stat (code=ATTR) at ../.././gcc/rtl.c:197
#8  0x000000000040172c in attr_copy_rtx (orig=0x63d440) at ../.././gcc/genattrtab.c:671
#9  0x0000000000401847 in attr_copy_rtx (orig=0x63d400) at ../.././gcc/genattrtab.c:686
#10 0x0000000000401847 in attr_copy_rtx (orig=0x63d3a0) at ../.././gcc/genattrtab.c:686
#11 0x0000000000406cd7 in optimize_attrs () at ../.././gcc/genattrtab.c:2880
#12 0x000000000040bc5e in main (argc=3, argv=0x7fffffffeb98) at ../.././gcc/genattrtab.c:5025

Very weird!

I even tried this with the same kernel config that xubuntu used - same issue. I had thought maybe it was something to do with transparent hugepages, so I tried it set to always and also set to never - same issue.

For me this bug is 100% reproducible. I can try to include any information you require... I'm out of ideas trying to debug this.

env on sabotage native, where it doesn't work:

MANPATH=/share/man
TERM=screen
SHELL=/sbin/bash
SSH_CLIENT=10.108.62.74 28675 22
MENUCONFIG_COLOR=blackbg
SSH_TTY=/dev/pts/0
USER=root
TMUX=/tmp/tmux-0/default,381,0
PATH=/local/bin:/bin
MAIL=/var/mail/root
PWD=/src/build/gcc473/gcc-4.7.3/host-x86_64-unknown-linux-gnu/gcc
TZ=UTC
TMUX_PANE=%0
SHLVL=2
HOME=/root
LOGNAME=root
SSH_CONNECTION=10.108.62.74 28675 10.108.62.215 22
_=/bin/env
OLDPWD=/root

env on xubuntu boot, chroot into sabotage install, where it does work:

XDG_SESSION_ID=1
SHELL=/bin/bash
TERM=rxvt
SSH_CLIENT=10.108.62.74 28679 22
MENUCONFIG_COLOR=blackbg
SSH_TTY=/dev/pts/4
USER=root
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.axa=00;36:*.oga=00;36:*.spx=00;36:*.xspf=00;36:
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
MAIL=/var/mail/root
PWD=/
LANG=en_US.UTF-8
HOME=/root
SHLVL=2
LOGNAME=root
LC_CTYPE=en_US.UTF-8
SSH_CONNECTION=10.108.62.74 28679 10.108.62.215 22
LESSOPEN=| /usr/bin/lesspipe %s
XDG_RUNTIME_DIR=/run/user/0
LESSCLOSE=/usr/bin/lesspipe %s %s
_=/usr/sbin/env

I just booted xubuntu and went to /src/build/gcc473/gcc-4.7.3/host-x86_64-unknown-linux-gnu/gcc without cleaning it, and ran that very same binary (build/genattrtab) - and it worked as it should, with no memory allocation error. The error only manifests when trying to run that same binary under sabotage. Bizarre!

Misc: I also noted that 'ulimit -a' on xubuntu showed the same default limits that sabotage originally had too... so adjusting those didn't have any impact at all.

I did some further testing and chrooted into it from a gentoo amd64 minimal install iso too... Same thing - it works from a chroot, but not if I boot sabotage normally!

Any ideas?

Nemykal commented 10 years ago

OK, thanks to the help in IRC, I tried this again with the sabotage kernel (3.12.6-grsec), and it worked.

I'm going to try to figure out what the offending config option in my kernel was (or combination thereof)... it seems really really weird that only gcc fails to build.

rofl0r commented 10 years ago

the issue we face here is that for some reason, the kernel gives us a very narrow heap mapping of just about 200MB. musl's allocator uses only the heap for small allocations, and apparently gcc's genattrtab program has a usage pattern that does a lot of such small allocations, so in the end we run out of heap space. it's clearly a bug of recent kernels, especially when ASLR is enabled and PIE binaries are used[0]. for some reason your custom kernel is affected by this, while the sabotage kernel isn't, or at least not that much. it would be interesting to see a dump of /proc/pid/maps for the genattrtab process on that box (maybe by manually invoking it and then sending it a SIGTSTP from the console, using CTRL-Z), or at least from some other process that makes use of malloc() and free().
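
For anyone collecting the requested dump, here is a minimal sketch that summarizes a /proc/pid/maps listing: how many mappings exist, how many are labelled [heap], and how many bytes those cover. The function names and the sample listing are made up for illustration. It assumes the usual maps line layout (address range first, optional name last).

```python
def summarize_maps(maps_text):
    """Return (total mappings, [heap] mappings, bytes covered by [heap])."""
    total = 0
    heap_vmas = 0
    heap_bytes = 0
    for line in maps_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        total += 1
        if fields[-1] == "[heap]":
            # The first field looks like "00638000-00639000" (hex addresses).
            start, end = (int(addr, 16) for addr in fields[0].split("-"))
            heap_vmas += 1
            heap_bytes += end - start
    return total, heap_vmas, heap_bytes


def summarize_pid(pid):
    """Summarize a live (for example stopped) process by pid."""
    with open(f"/proc/{pid}/maps") as f:
        return summarize_maps(f.read())


# Hypothetical listing, NOT taken from the affected machine.
sample = """\
00400000-00436000 r-xp 00000000 08:01 1234 /tmp/genattrtab
00635000-00638000 rw-p 00035000 08:01 1234 /tmp/genattrtab
00638000-00639000 rw-p 00000000 00:00 0 [heap]
00639000-0063a000 rw-p 00000000 00:00 0 [heap]
7fff0000-7fff1000 rw-p 00000000 00:00 0 [stack]
"""

print(summarize_maps(sample))
```

A heap that shows up as one large [heap] line is a single mapping. Many small adjacent [heap] lines would mean the kernel is not merging them.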

[0] https://lkml.org/lkml/2013/10/2/43

Nemykal commented 10 years ago

I bisected the config changes between the known-good kernel (sabotage's, without the grsecurity patch) and the known-bad kernel (my original config).

My findings are as follows: CONFIG_MEM_SOFT_DIRTY is to blame.

If I set CONFIG_MEM_SOFT_DIRTY=y, then gcc454, gcc464, gcc473 (all the gccs I tested) do not compile, with the above error. If I recompile the kernel with CONFIG_MEM_SOFT_DIRTY=n, everything builds fine.

I've gotten this bug to be 100% reproducible now, so please let me know if there's any additional information you need.

I will now try CONFIG_MEM_SOFT_DIRTY=y with grsec enabled and see if it still breaks.

Nemykal commented 10 years ago

Here is the output of gdb "info proc mappings" after the process dies on the faulty kernel.

https://dpaste.de/EcVs/raw

rofl0r commented 10 years ago

thanks for your research. my first assessment was incorrect: apparently the problem is that CONFIG_MEM_SOFT_DIRTY uses a new vma for each new heap page rather than extending the existing one, and thus hits the kernel's vma limit. this is probably a bug, so reporting it to the kernel ML makes sense.
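
The per-process vma limit mentioned above is exposed as /proc/sys/vm/max_map_count. Here is a minimal sketch for checking how close a process is to it. The function names are hypothetical and the demo listing is made up. The thread does not state the limit on the affected machine. The common default is 65530, and the 65527 brk calls in the strace summary sit just under that, which would be consistent with this explanation.

```python
def count_mappings(maps_text):
    """Each non-empty line of /proc/<pid>/maps describes one vma."""
    return sum(1 for line in maps_text.splitlines() if line.strip())


def vma_headroom(maps_text, max_map_count):
    """How many more mappings the process could create before hitting the limit."""
    return max_map_count - count_mappings(maps_text)


def check_pid(pid):
    """Read the live limit and the live mapping list for a given pid."""
    with open("/proc/sys/vm/max_map_count") as f:
        limit = int(f.read())
    with open(f"/proc/{pid}/maps") as f:
        return vma_headroom(f.read(), limit)


# Illustrative only: three made-up mappings against a made-up limit of 4.
demo = """\
00400000-00436000 r-xp 00000000 08:01 1234 /tmp/demo
00638000-00639000 rw-p 00000000 00:00 0 [heap]
7fff0000-7fff1000 rw-p 00000000 00:00 0 [stack]
"""

print(vma_headroom(demo, 4))
```

Running `check_pid` against the stopped genattrtab process on the faulty kernel should show the headroom shrinking toward zero as the heap grows, if the one-vma-per-page explanation holds.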