metalbear-co / mirrord

Connect your local process and your cloud environment, and run local code in cloud conditions.
https://mirrord.dev
MIT License
3.76k stars 103 forks source link

go1.22.0 crash when executing a command (Linux) #2653

Open Razz4780 opened 1 month ago

Razz4780 commented 1 month ago

Bug Description

Go binary fails panics on executing command

Steps to Reproduce

Build the snippet below using go 1.22.0, run with mirrord exec

package main

import (
    "fmt"
    "io"
    _ "net" // forces dynamic linking
    "os/exec"
)

func main() {
    cmd := exec.Command("echo", "whatever")

    stdin, err := cmd.StdinPipe()
    if err != nil {
        panic(err)
    }
    defer stdin.Close()

    stdout, err := cmd.StdoutPipe()
    defer stdout.Close()
    if err != nil {
        panic(err)
    }
    // Closed later

    // Start the command
    if err := cmd.Start(); err != nil {
        panic(err)
    }

    buf, err := io.ReadAll(stdout)
    if err != nil {
        panic(err)
    }

    fmt.Println("command done")
    fmt.Print(string(buf))
}

Backtrace

⠒ mirrord exec
    ✓ running on latest (3.113.0)!
    ✓ ready to launch process
      ✓ layer extracted
      ✓ using operator
        ✓ operator version compatible
        ✓ operator license valid
        ✓ user credentials prepared
        ✓ session started
          ✓ target fetched
          ✓ connected to the target
    ✓ config summary                                                                                                                                                                                        runtime: morestack on g0, stack [0x7ffe4f5fa000 0x7ffe4fdf7710], sp=0xc0000a5ab0, called from
syscall.releaseForkLock()
        /home/razz4780/sdk/go1.22.0/src/syscall/forkpipe2.go:84 +0xd6 fp=0xc0000a5ab8 sp=0xc0000a5ab0 pc=0x481a36
syscall.forkExec({0xc000014150?, 0x45021e?}, {0xc0000b0020, 0x2, 0x2}, 0xc0000a5c40?)
        /home/razz4780/sdk/go1.22.0/src/syscall/exec_unix.go:215 +0x366 fp=0xc0000a5bd0 sp=0xc0000a5ab8 pc=0x481466
syscall.StartProcess(...)
        /home/razz4780/sdk/go1.22.0/src/syscall/exec_unix.go:254
os.startProcess({0xc000014150, 0xd}, {0xc0000b0020, 0x2, 0x2}, 0xc0000a5dd0)
        /home/razz4780/sdk/go1.22.0/src/os/exec_posix.go:54 +0x312 fp=0xc0000a5cb0 sp=0xc0000a5bd0 pc=0x492dd2
os.StartProcess({0xc000014150, 0xd}, {0xc0000b0020, 0x2, 0x2}, 0xc0000a5dd0)
        /home/razz4780/sdk/go1.22.0/src/os/exec.go:111 +0x54 fp=0xc0000a5cf0 sp=0xc0000a5cb0 pc=0x4929f4
os/exec.(*Cmd).Start(0xc0000d8000)
        /home/razz4780/sdk/go1.22.0/src/os/exec/exec.go:700 +0x5f1 fp=0xc0000a5e78 sp=0xc0000a5cf0 pc=0x4a7251
main.main()
        /home/razz4780/MetalBear/random-shit/hello/hello.go:27 +0x14b fp=0xc0000a5f50 sp=0xc0000a5e78 pc=0x4a95eb
runtime.main()
        /home/razz4780/sdk/go1.22.0/src/runtime/proc.go:271 +0x29d fp=0xc0000a5fe0 sp=0xc0000a5f50 pc=0x43991d
runtime.goexit({})
        /home/razz4780/sdk/go1.22.0/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000a5fe8 sp=0xc0000a5fe0 pc=0x469881

fatal error: morestack on g0

runtime stack:
runtime.throw({0x4d0b26?, 0x0?})
        /home/razz4780/sdk/go1.22.0/src/runtime/panic.go:1023 +0x5c fp=0xc000035f88 sp=0xc000035f58 pc=0x436e5c
runtime.badmorestackg0.func1()
        /home/razz4780/sdk/go1.22.0/src/runtime/proc.go:533 +0xe5 fp=0xc000035fe0 sp=0xc000035f88 pc=0x43a5a5
runtime.switchToCrashStack0()
        /home/razz4780/sdk/go1.22.0/src/runtime/asm_amd64.s:559 +0x32 fp=0xc000035ff0 sp=0xc000035fe0 pc=0x467952

goroutine 1 gp=0xc0000061c0 m=0 mp=0x5866a0 [running]:
runtime: g 1 gp=0xc0000061c0: unknown pc 0x71ee79c4d340
stack: frame={sp:0xc0000a56a8, fp:0x0} stack=[0xc0000a5000,0xc0000a6000)
0x000000c0000a55a8:  0x0000000000000000  0x0000000000000000 
0x000000c0000a55b8:  0x0000000000000000  0x0000000000000000 
0x000000c0000a55c8:  0x0000000000000000  0x0000000000000000 
0x000000c0000a55d8:  0x0000000000000000  0x0000000000000000 
0x000000c0000a55e8:  0x0000000000000000  0x000000000043498a <runtime.sysSigaction+0x000000000000002a> 
0x000000c0000a55f8:  0x0000000000000040  0x000000c0000a56b0 
0x000000c0000a5608:  0x0000000000000000  0x0000000000000008 
0x000000c0000a5618:  0x0000000000000000  0x000000c0000a5688 
0x000000c0000a5628:  0x0000000000406165 <runtime.sigaction+0x0000000000000145>  0x0000000000000040 
0x000000c0000a5638:  0x0000000000000000  0x0000000000000000 
0x000000c0000a5648:  0x0000000000000000  0x0000000000000000 
0x000000c0000a5658:  0x0000000000000000  0x0000000000000000 
0x000000c0000a5668:  0x0000000000000000  0x0000000000000000 
0x000000c0000a5678:  0x0000000000000000  0x000000c0000a5708 
0x000000c0000a5688:  0x000000c0000a56d0  0x00000000004348ee <runtime.setsig+0x000000000000006e> 
0x000000c0000a5698:  0x0000000000000040  0x000071ee79c4d6ba 
0x000000c0000a56a8: <0x000000000040420d <syscall.RawSyscall6+0x000000000000000d>  0x0000000000000000 
0x000000c0000a56b8:  0x000000001c000004  0x000000000046b4a0 <runtime.sigreturn__sigaction+0x0000000000000000> 
0x000000c0000a56c8:  0xffffffffffffffff  0x000000c0000a56f8 
0x000000c0000a56d8:  0x000000000044d72f <runtime.clearSignalHandlers+0x000000000000002f>  0x0000000000579478 
0x000000c0000a56e8:  0x000000c0000a5730  0x0000000000483bf5 <syscall.RawSyscall+0x0000000000000015> 
0x000000c0000a56f8:  0x000000c0000a5730  0x0000000000465d0c <syscall.runtime_AfterForkInChild+0x000000000000004c> 
0x000000c0000a5708:  0x000000c000000002  0x000000c0000a5728 
0x000000c0000a5718:  0x0000000000000000  0x0000000000000008 
0x000000c0000a5728:  0x0000000000000000  0x000000c0000a59e0 
0x000000c0000a5738:  0x000000000047f7c5 <syscall.forkAndExecInChild1+0x0000000000000585>  0x0000000000000038 
0x000000c0000a5748:  0x0000000000004111  0x0000000000000000 
0x000000c0000a5758:  0x000000c0000a57c0  0x00000000000188e0 
0x000000c0000a5768:  0x0000000000000000  0x0000000000000000 
0x000000c0000a5778:  0x0000000000000000  0x0000000000000000 
0x000000c0000a5788:  0x0000000000000000  0x0000000000000000 
0x000000c0000a5798:  0x0000000000000000  0x0000000000000000 
runtime: g 1 gp=0xc0000061c0: unknown pc 0x71ee79c4d340
stack: frame={sp:0xc0000a56a8, fp:0x0} stack=[0xc0000a5000,0xc0000a6000)
0x000000c0000a55a8:  0x0000000000000000  0x0000000000000000 
0x000000c0000a55b8:  0x0000000000000000  0x0000000000000000 
0x000000c0000a55c8:  0x0000000000000000  0x0000000000000000 
0x000000c0000a55d8:  0x0000000000000000  0x0000000000000000 
0x000000c0000a55e8:  0x0000000000000000  0x000000000043498a <runtime.sysSigaction+0x000000000000002a> 
0x000000c0000a55f8:  0x0000000000000040  0x000000c0000a56b0 
0x000000c0000a5608:  0x0000000000000000  0x0000000000000008 
0x000000c0000a5618:  0x0000000000000000  0x000000c0000a5688 
0x000000c0000a5628:  0x0000000000406165 <runtime.sigaction+0x0000000000000145>  0x0000000000000040 
0x000000c0000a5638:  0x0000000000000000  0x0000000000000000 
0x000000c0000a5648:  0x0000000000000000  0x0000000000000000 
0x000000c0000a5658:  0x0000000000000000  0x0000000000000000 
0x000000c0000a5668:  0x0000000000000000  0x0000000000000000 
0x000000c0000a5678:  0x0000000000000000  0x000000c0000a5708 
0x000000c0000a5688:  0x000000c0000a56d0  0x00000000004348ee <runtime.setsig+0x000000000000006e> 
0x000000c0000a5698:  0x0000000000000040  0x000071ee79c4d6ba 
0x000000c0000a56a8: <0x000000000040420d <syscall.RawSyscall6+0x000000000000000d>  0x0000000000000000 
0x000000c0000a56b8:  0x000000001c000004  0x000000000046b4a0 <runtime.sigreturn__sigaction+0x0000000000000000> 
0x000000c0000a56c8:  0xffffffffffffffff  0x000000c0000a56f8 
0x000000c0000a56d8:  0x000000000044d72f <runtime.clearSignalHandlers+0x000000000000002f>  0x0000000000579478 
0x000000c0000a56e8:  0x000000c0000a5730  0x0000000000483bf5 <syscall.RawSyscall+0x0000000000000015> 
0x000000c0000a56f8:  0x000000c0000a5730  0x0000000000465d0c <syscall.runtime_AfterForkInChild+0x000000000000004c> 
0x000000c0000a5708:  0x000000c000000002  0x000000c0000a5728 
0x000000c0000a5718:  0x0000000000000000  0x0000000000000008 
0x000000c0000a5728:  0x0000000000000000  0x000000c0000a59e0 
0x000000c0000a5738:  0x000000000047f7c5 <syscall.forkAndExecInChild1+0x0000000000000585>  0x0000000000000038 
0x000000c0000a5748:  0x0000000000004111  0x0000000000000000 
0x000000c0000a5758:  0x000000c0000a57c0  0x00000000000188e0 
0x000000c0000a5768:  0x0000000000000000  0x0000000000000000 
0x000000c0000a5778:  0x0000000000000000  0x0000000000000000 
0x000000c0000a5788:  0x0000000000000000  0x0000000000000000 
0x000000c0000a5798:  0x0000000000000000  0x0000000000000000 

goroutine 2 gp=0xc000006c40 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        /home/razz4780/sdk/go1.22.0/src/runtime/proc.go:402 +0xce fp=0xc000090fa8 sp=0xc000090f88 pc=0x439d4e
runtime.goparkunlock(...)
        /home/razz4780/sdk/go1.22.0/src/runtime/proc.go:408
runtime.forcegchelper()
        /home/razz4780/sdk/go1.22.0/src/runtime/proc.go:326 +0xb3 fp=0xc000090fe0 sp=0xc000090fa8 pc=0x439bd3
runtime.goexit({})
        /home/razz4780/sdk/go1.22.0/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000090fe8 sp=0xc000090fe0 pc=0x469881
created by runtime.init.6 in goroutine 1
        /home/razz4780/sdk/go1.22.0/src/runtime/proc.go:314 +0x1a

goroutine 3 gp=0xc000007180 m=nil [GC sweep wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        /home/razz4780/sdk/go1.22.0/src/runtime/proc.go:402 +0xce fp=0xc000091780 sp=0xc000091760 pc=0x439d4e
runtime.goparkunlock(...)
        /home/razz4780/sdk/go1.22.0/src/runtime/proc.go:408
runtime.bgsweep(0xc00001c070)
        /home/razz4780/sdk/go1.22.0/src/runtime/mgcsweep.go:278 +0x94 fp=0xc0000917c8 sp=0xc000091780 pc=0x425d94
runtime.gcenable.gowrap1()
        /home/razz4780/sdk/go1.22.0/src/runtime/mgc.go:203 +0x25 fp=0xc0000917e0 sp=0xc0000917c8 pc=0x41a8e5
runtime.goexit({})
        /home/razz4780/sdk/go1.22.0/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000917e8 sp=0xc0000917e0 pc=0x469881
created by runtime.gcenable in goroutine 1
        /home/razz4780/sdk/go1.22.0/src/runtime/mgc.go:203 +0x66

goroutine 4 gp=0xc000007340 m=nil [GC scavenge wait]:
runtime.gopark(0xc00001c070?, 0x4f4cc8?, 0x1?, 0x0?, 0xc000007340?)
        /home/razz4780/sdk/go1.22.0/src/runtime/proc.go:402 +0xce fp=0xc000091f78 sp=0xc000091f58 pc=0x439d4e
runtime.goparkunlock(...)
        /home/razz4780/sdk/go1.22.0/src/runtime/proc.go:408
runtime.(*scavengerState).park(0x585f80)
        /home/razz4780/sdk/go1.22.0/src/runtime/mgcscavenge.go:425 +0x49 fp=0xc000091fa8 sp=0xc000091f78 pc=0x423789
runtime.bgscavenge(0xc00001c070)
        /home/razz4780/sdk/go1.22.0/src/runtime/mgcscavenge.go:653 +0x3c fp=0xc000091fc8 sp=0xc000091fa8 pc=0x423d1c
runtime.gcenable.gowrap2()
        /home/razz4780/sdk/go1.22.0/src/runtime/mgc.go:204 +0x25 fp=0xc000091fe0 sp=0xc000091fc8 pc=0x41a885
runtime.goexit({})
        /home/razz4780/sdk/go1.22.0/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000091fe8 sp=0xc000091fe0 pc=0x469881
created by runtime.gcenable in goroutine 1
        /home/razz4780/sdk/go1.22.0/src/runtime/mgc.go:204 +0xa5

goroutine 5 gp=0xc000007c00 m=nil [finalizer wait]:
runtime.gopark(0xc000090648?, 0x40f145?, 0xa8?, 0x1?, 0xc0000061c0?)
        /home/razz4780/sdk/go1.22.0/src/runtime/proc.go:402 +0xce fp=0xc000090620 sp=0xc000090600 pc=0x439d4e
runtime.runfinq()
        /home/razz4780/sdk/go1.22.0/src/runtime/mfinal.go:194 +0x107 fp=0xc0000907e0 sp=0xc000090620 pc=0x419927
runtime.goexit({})
        /home/razz4780/sdk/go1.22.0/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000907e8 sp=0xc0000907e0 pc=0x469881
created by runtime.createfing in goroutine 1
        /home/razz4780/sdk/go1.22.0/src/runtime/mfinal.go:164 +0x3d

Relevant Logs

No response

Your operating system and version


Local process

hello: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, Go BuildID=fivcHJLRmnzGWcG5-3dM/82M8sLYGI_Jh1IAqMFHE/zrv4NnR3nyiWFytLQ1kn/X02rZMTVxVLHc23KfJsv, with debug_info, not stripped

Local process version

No response

Additional Info

No response

aviramha commented 1 month ago

my guess is we run post-fork and new stack size is too small for us so we overflow (just a guess tho)

aviramha commented 1 month ago

How I would approach this:

meowjesty commented 1 month ago

What I found while digging in this issue:

  1. Commented out our tracing initialization, the problem persisted (same crash as with logs);
  2. The crash happens before we fork:
    1. It actually uses vfork, which I had to hook (copied fork_detour as vfork_detour);
    2. Still did not see our "mirrord is forking" logs after that, so it seems to be crashing in the middle of forking (considering the crash will take you to some syscall.releaseForkLock function);
  3. morestack on g0 doesn't seem to mean anything, it's just an error you'll get when Rust crashes?
  4. I've actually managed to get past the crash, and have the sample execve("echo"):
    1. Went around looking at which syscalls we were missing from the go hooks, looks like the sample was calling dup3, so added our dup3_detour to the go handling;
    2. Seems like dup3 was the culprit for crashing before execve, more specifically, dup3(fd, 2) which piping fd to stderr(?). The other 2 pipes dup3(n, 0|1) were not causing the crash;
    3. Changed to dup3(fd, n + 20), and this got the sample "working", due to all the fds being just Bypasses, I guess that's how it didn't explode further with this change. After making this change, the parent process still crashed, but the child was able to finish and do its thing. Which means the issue is parent only (vfork, execve, and potentially pipe2);
// This is kind of what I was seeing from the trace logs (syscall_abi logs):
int fcntl_getfd_0 = fcntl(0, F_GETFD);
int fcntl_getfd_1 = fcntl(1, F_GETFD);
int fcntl_getfd_2 = fcntl(2, F_GETFD);

int fcntl_getfl_0 = fcntl(0, F_GETFL);
int fcntl_getfl_1 = fcntl(1, F_GETFL);
int fcntl_getfl_2 = fcntl(2, F_GETFL);

int pipefd_0[2];
pipe2(pipefd_0, O_CLOEXEC);

int fcntl_getfl_5 = fcntl(pipefd_0[0], F_GETFL);
int fcntl_getfl_6 = fcntl(pipefd_0[1], F_GETFL);

int pipefd_1[2];
pipe2(pipefd_1, O_CLOEXEC);

int fcntl_getfl_10 = fcntl(pipefd_1[0], F_GETFL);
int fcntl_getfl_11 = fcntl(pipefd_1[1], F_GETFL);

int fcntl_getfl_5_5 = fcntl(pipefd_0[0], F_GETFL);
int fcntl_getfl_10_10 = fcntl(pipefd_1[0], F_GETFL);

dup3(pipefd_0[0], 0, 0);
dup3(pipefd_1[0], 1, 0);
dup3(pipefd_1[1], 2, 0);

pid_t pid = vfork();
tankbusta commented 1 month ago

"morestack on g0" is coming from the Go runtime whenever it cant grow the scheduler stack. https://github.com/golang/go/blob/2707d42966f8985a6663c93e943b9a44b9399fca/src/runtime/asm_amd64.s#L594-L598