jart / blink

tiniest x86-64-linux emulator
ISC License

Segfaults in some busybox applets #33

Closed Vogtinator closed 1 year ago

Vogtinator commented 1 year ago

There are easy to reproduce segmentation faults when using a statically linked busybox with glibc: busybox.zip

bc:

> o/blink/blink /usr/bin/busybox-static bc
I2023-01-16T20:47:01.741606:blink/syscall.c:3308:27114 missing syscall 0x14e
I2023-01-16T20:47:01.742877:blink/blink.c:86:27114 SEGMENTATION FAULT (Segmentation fault) AT ADDRESS 500000000570
         PC 4047c9 movups 0x2000(%rsi),%xmm8
         AX 000000000069c990  CX 0000000000000040  DX 00000000000000a8  BX 0000000000000020
         SP 00004fffffffe4c8  BP 000000000069c698  SI 00004fffffffe570  DI 000000000069c9c0
         R8 ffffffffffffffd0  R9 0000000000000000 R10 0000000000000000 R11 0000000000000000
        R12 0000000000000001 R13 0000000000000001 R14 0000000000000000 R15 00004fffffffe540
         FS 000000000069b3c0  GS 0000000000000000 OPS 5276             JIT 0               
        /usr/bin/busybox-static
        00000069c698 0000004047c9 UNKNOWN [STRAY]
        00000069c990 000000000000 UNKNOWN 760 bytes
        00000069c3e0 000000000000 UNKNOWN [STRAY]
        00000000069c 000000000000 UNKNOWN [STRAY] [MISALIGN] [CORRUPT FRAME POINTER]
I2023-01-16T20:47:01.742887:blink/blink.c:67:27114 terminating due to signal SIGSEGV

stat:

> o/blink/blink /usr/bin/busybox-static stat .
I2023-01-16T20:47:05.409497:blink/syscall.c:3308:27139 missing syscall 0x14e
I2023-01-16T20:47:05.411091:blink/blink.c:86:27139 SEGMENTATION FAULT (Segmentation fault) AT ADDRESS 6bd540
         PC 404820 movntdq %xmm4,0x1000(%rdi)
         AX 000000000069c500  CX 0000000000000040  DX 000000000000007a  BX 0000000000636a68
         SP 00004fffffffe608  BP 00004fffffffedfa  SI 0000000000656ae8  DI 00000000006bc540
         R8 ffffffffffffffc0  R9 0000000000000000 R10 fffffffffffffff8 R11 0000000000000000
        R12 00004fffffffe670 R13 00004fffffffe9e8 R14 0000000000000001 R15 0000000000000001
         FS 000000000069b3c0  GS 0000000000000000 OPS 5395             JIT 0               
        /usr/bin/busybox-static
        4fffffffedfa 000000404820 UNKNOWN 2034 bytes
        3d4c4c454853002e 7361622f6e69622f UNKNOWN [MISALIGN] [CORRUPT FRAME POINTER]
I2023-01-16T20:47:05.411101:blink/blink.c:67:27139 terminating due to signal SIGSEGV

sh (tab completion, press tab twice on the prompt):

> o/blink/blink /usr/bin/busybox-static sh
I2023-01-16T20:47:27.800305:blink/syscall.c:3308:27199 missing syscall 0x14e
$ I2023-01-16T20:47:28.355568:blink/blink.c:86:27199 SEGMENTATION FAULT (Segmentation fault) AT ADDRESS 500000000e5f
         PC 4047ad movups 0x1000(%rsi),%xmm4
         AX 00000000006a15c0  CX 0000000000000040  DX 000000000000007a  BX 0000000000000009
         SP 00004fffffffe098  BP 000000000069f330  SI 00004ffffffffe5f  DI 00000000006a1600
         R8 ffffffffffffffc0  R9 0000000000000000 R10 0000000000000000 R11 0000000000000000
        R12 00000000006a0d60 R13 00004ffffffffe1f R14 000000000069f330 R15 0000000000000000
         FS 000000000069b3c0  GS 0000000000000000 OPS 12212            JIT 0               
        /usr/bin/busybox-static
        00000069f330 0000004047ad UNKNOWN [STRAY]
        00000069e790 0002000000f2 UNKNOWN [STRAY]
        ffffffff00000033 4ffffffffe1f UNKNOWN [STRAY] [MISALIGN] [CORRUPT FRAME POINTER]

They happen both on x86_64 Linux and in the WASM/emscripten version.

jart commented 1 year ago

Consider using ToyBox compiled with musl-cross-make.

master jart@nightmare:~/blink$ o//blink/blink ./toybox bc
>>> 2 + 2
4
>>> master jart@nightmare:~/blink$
master jart@nightmare:~/blink$ o//blink/blink ./toybox stat .
  File: .
  Size: 4096     Blocks: 8       IO Blocks: 512  directory
Device: 803h/2051d       Inode: 57571047         Links: 10       Device type: 0,0
Access: (0755/drwxr-xr-x)       Uid: ( 1000/    jart)   Gid: ( 1000/    jart)
Access: 2023-01-17 10:22:59.000000000 -0800
Modify: 2023-01-17 10:22:41.000000000 -0800
Change: 2023-01-17 10:22:41.000000000 -0800
master jart@nightmare:~/blink$ PS1='>: ' o//blink/blink ./toybox sh
>: ls
HTAGS  LICENSE  Makefile  README.md  TAGS  blink  build  o  perf.data  test  third_party  tool  toybox
>: master jart@nightmare:~/blink$
master jart@nightmare:~/blink$

ToyBox was created by the guy who made BusyBox, as a noble public service, because he felt guilty about the GPL. So you could think of it as a second generation solution that learns from BusyBox's mistakes. It's real nice and I think Android uses it too. Like Blink and Cosmopolitan, Android doesn't support a lot of the weird kernel features that Glibc requires apps to depend on.

Prebuilt binary here: toybox.zip

jart commented 1 year ago

For the record, I intend to have Blink support the features that Glibc needs, so I'm going to leave this open until that happens. It's just going to take more time for BusyBox, whereas ToyBox works great today.

Vogtinator commented 1 year ago

Yeah, this is mostly a bug report for incomplete or inaccurate emulation in blink.

Which features are missing for this in particular? I wouldn't expect missing features to cause a SEGV in memcpy.

jart commented 1 year ago

I'm currently investigating it to learn more. The instructions that are faulting are supported and used by Cosmopolitan. Thanks for posting the binary. I'll report back when I learn more.

Vogtinator commented 1 year ago

I had a look at the glibc code and played around in blinkenlights a bit. It looks like variables such as __x86_shared_non_temporal_threshold, which are derived from CPU cache sizes, are set to 0, and memcpy can't handle that because it loops until the size is less than the threshold: https://github.com/bminor/glibc/blob/93967a2a7bbdcedb73e0b246713580c7c84d001e/sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S#L745
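
To make the failure mode concrete, here is a simplified model of a threshold-based copy dispatch. It is a sketch, not glibc's actual code: the function names, the two-pages-per-iteration layout, and the sample threshold are assumptions chosen to mirror the 0x1000 strides visible in the faulting instructions.

```python
PAGE_SIZE = 0x1000  # 4 KiB, matching the 0x1000 displacements in the crash dumps


def picks_large_path(size, non_temporal_threshold):
    """Hypothetical dispatch: copies at or above the threshold take the large path."""
    return size >= non_temporal_threshold


def highest_offset_touched(size, non_temporal_threshold, pages_per_iter=2):
    """Return the highest source offset the modeled copy reads.

    The large path is modeled as reading from `pages_per_iter` pages in each
    iteration, so it reaches (pages_per_iter - 1) * PAGE_SIZE bytes past the
    start of the buffer even when `size` is tiny.
    """
    if not picks_large_path(size, non_temporal_threshold):
        return size - 1  # the ordinary path stays inside the buffer
    return max(size - 1, (pages_per_iter - 1) * PAGE_SIZE)
```

In this model, a 122-byte copy with a sane threshold never reads beyond offset 121, but with a threshold of 0 it is routed to the large path and reads a full page past the start of the buffer. That is the kind of access that would run off the end of the stack or heap mapping.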

jart commented 1 year ago

That makes a lot of sense. I'm so happy that Blinkenlights' TUI helped make that easy for you to find. It looks like our CPUID implementation advertises a max leaf of 7, but leaf 4 (which reveals cache information) isn't implemented. I'll start implementing that now.
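
For reference, this is roughly how a consumer turns CPUID leaf 4 register values into a cache size. The bit layout follows the documented format for that leaf; the helper names and the sample geometry are illustrative assumptions, and nothing here is taken from Blink's or glibc's source.

```python
def is_null_cache(eax):
    """EAX[4:0] is the cache type; 0 means there are no more cache levels."""
    return (eax & 0x1F) == 0


def cache_size_bytes(ebx, ecx):
    """Decode the cache size from CPUID leaf 4 EBX and ECX values."""
    line_size = (ebx & 0xFFF) + 1            # EBX[11:0]  = line size - 1
    partitions = ((ebx >> 12) & 0x3FF) + 1   # EBX[21:12] = partitions - 1
    ways = ((ebx >> 22) & 0x3FF) + 1         # EBX[31:22] = ways - 1
    sets = ecx + 1                           # ECX        = sets - 1
    return ways * partitions * line_size * sets


def encode_leaf4(ways, partitions, line_size, sets):
    """Inverse helper: build (EBX, ECX) for an illustrative cache geometry."""
    ebx = ((ways - 1) << 22) | ((partitions - 1) << 12) | (line_size - 1)
    return ebx, sets - 1
```

For example, an 8-way cache with 64-byte lines and 64 sets decodes to 32 KiB. A leaf that returns all zeros reports a null cache type, so a caller walking the subleaves would find no cache levels at all.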

jart commented 1 year ago

I added the CPUID leafs but it's still having issues. I'm having fun rewinding from the point at which it goes past the end of allocated memory.

$ o//blink/blink -m ./busybox-static stat .
I2023-01-17T12:21:38.009117:blink/syscall.c:3309:127090 missing syscall 0x14e
I2023-01-17T12:21:38.011297:blink/throw.c:93:127090 SEGMENTATION FAULT AT ADDRESS 6bd440
         PC 404820 movntdq %xmm4,0x1000(%rdi)
         AX 000000000069c430  CX 0000000000000040  DX 00000000000000aa  BX 0000000000636a68
         SP 00004fffffffedc8  BP 00004ffffffff373  SI 0000000000656ab8  DI 00000000006bc440
         R8 fffffffffffffff0  R9 0000000000000000 R10 fffffffffffffff8 R11 0000000000000000
        R12 00004fffffffee30 R13 00004ffffffff1a8 R14 0000000000000001 R15 0000000000000001
         FS 000000000069b3c0  GS 0000000000000000 OPS 4603             JIT 0
        ./busybox-static
        4ffffffff373 000000404820 UNKNOWN 1451 bytes
        3d4c4c454853002e 7361622f6e69622f UNKNOWN [MISALIGN] [CORRUPT FRAME POINTER]
000000400000-000000400fff  4096 100% r
000000401000-0000005f5fff 2004k 100% rx
0000005f6000-000000684fff  572k 100% r
000000685000-00000069afff   88k  50% rw
00000069b000-0000006bcfff  136k 100% rwx
4fffff800000-4fffffffffff 8192k   1% rw
I2023-01-17T12:21:38.011335:blink/blink.c:67:127090 terminating due to signal SIGSEGV
Segmentation fault
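
The fault address can be checked against the register dump and memory map above. This is only arithmetic on values copied from that output; the variable names are made up for the example.

```python
# Values copied from the dump above.
rdi = 0x6BC440            # DI register at the time of the fault
displacement = 0x1000     # from `movntdq %xmm4,0x1000(%rdi)`
last_mapped = 0x6BCFFF    # end of the 00000069b000-0000006bcfff rwx region

fault_address = rdi + displacement          # where the store lands
bytes_past_end = fault_address - last_mapped  # distance beyond the mapping
```

The sum reproduces the reported address 6bd440, which lies 0x441 bytes beyond the last mapped byte of the rwx region, so the store lands just past the end of allocated memory.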

Vogtinator commented 1 year ago

> I added the CPUID leafs but it's still having issues. I'm having fun rewinding from the point at which it goes past the end of allocated memory.

Apparently the cache info is vendor specific and glibc doesn't know what to do with GenuineBlink. By using GenuineIntel it works :-/
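
A small sketch of why the vendor string matters. CPUID leaf 0 returns the 12-byte vendor string in EBX, EDX, ECX (in that order), and a consumer that selects its cache-probing code by vendor will gather nothing for a name it doesn't recognize. The `pick_cache_probe` table below is a hypothetical stand-in for that selection, not glibc's actual logic or its full vendor list.

```python
import struct


def vendor_to_regs(vendor):
    """Pack a 12-character vendor string into (EBX, EDX, ECX)."""
    raw = vendor.encode("ascii")
    if len(raw) != 12:
        raise ValueError("vendor string must be exactly 12 bytes")
    return struct.unpack("<3I", raw)


def regs_to_vendor(ebx, edx, ecx):
    """Rebuild the vendor string from the three leaf 0 registers."""
    return struct.pack("<3I", ebx, edx, ecx).decode("ascii")


def pick_cache_probe(vendor):
    """Hypothetical vendor-keyed dispatch with no fallback for unknown vendors."""
    known = {"GenuineIntel": "intel", "AuthenticAMD": "amd"}
    return known.get(vendor)  # None models "no cache info gathered"
```

Under this model, "GenuineBlink" round-trips through the registers fine but matches no probe, which leaves any cache-derived thresholds at their default of 0, while "GenuineIntel" selects the Intel path.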

jart commented 1 year ago

Wow. That's almost as bad as the uname("Blink 4.0") thing I needed to do. I'm going to need to rebuild a lot of Cosmo programs but I'm glad we spotted this sooner rather than later. Thanks for troubleshooting this.

Vogtinator commented 1 year ago

Looks like this issue was reported two weeks ago and is already fixed in glibc git: https://sourceware.org/bugzilla/show_bug.cgi?id=29953