google / gvisor

Application Kernel for Containers
https://gvisor.dev
Apache License 2.0

Illegal instruction when gvisor running with --platform=kvm #10625

Closed by q53 2 months ago

q53 commented 2 months ago

Description

# cat /etc/docker/daemon.json
{
    "runtimes": {
        "runsc": {
            "path": "/usr/local/bin/runsc",
            "runtimeArgs": [
                "--platform=systrap"
            ]
        },
        "runsc-kvm": {
            "path": "/usr/local/bin/runsc",
            "runtimeArgs": [
                "--platform=kvm"
            ]
        },

        "runsc-debug": {
            "path": "/usr/local/bin/runsc",
            "runtimeArgs": [
                "--debug",
                "--debug-log=/tmp/runsc-debug.log",
                "--strace"
            ]
        }
    },
    "storage-driver": "zfs"
}
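
When comparing runs across several runtime entries, it helps to confirm which platform each entry actually selects. The following is a minimal sketch, not part of the original report: it parses a trimmed copy of the daemon.json above and lists the `--platform` value per runtime. The function name `platforms_by_runtime` is hypothetical.

```python
import json

# Trimmed copy of the daemon.json shown above; names and flags come from the report.
DAEMON_JSON = """
{
    "runtimes": {
        "runsc": {
            "path": "/usr/local/bin/runsc",
            "runtimeArgs": ["--platform=systrap"]
        },
        "runsc-kvm": {
            "path": "/usr/local/bin/runsc",
            "runtimeArgs": ["--platform=kvm"]
        },
        "runsc-debug": {
            "path": "/usr/local/bin/runsc",
            "runtimeArgs": ["--debug", "--debug-log=/tmp/runsc-debug.log", "--strace"]
        }
    },
    "storage-driver": "zfs"
}
"""


def platforms_by_runtime(config_text):
    """Map each runtime name to its explicit --platform value, or None if unset."""
    config = json.loads(config_text)
    result = {}
    for name, runtime in config.get("runtimes", {}).items():
        platform = None
        for arg in runtime.get("runtimeArgs", []):
            if arg.startswith("--platform="):
                platform = arg.split("=", 1)[1]
        result[name] = platform
    return result


if __name__ == "__main__":
    for name, platform in platforms_by_runtime(DAEMON_JSON).items():
        print(f"{name}: {platform or '(no --platform flag set)'}")
```

Note that the runsc-debug entry carries no `--platform` flag, so logs collected through it reflect whichever platform runsc selects by default rather than KVM specifically.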

# docker -D -l debug run -i --runtime runsc  --rm --name=test  docker.io/library/registry:latest sh
registry

Usage: 
  registry [flags]
  registry [command]

Available Commands: 
  serve           `serve` stores and distributes Docker images
  garbage-collect `garbage-collect` deletes layers not referenced by any manifests
  help            Help about any command

Flags:
  -h, --help=false: help for registry
  -v, --version=false: show the version and exit

Use "registry help [command]" for more information about a command.
^Ctime="2024-07-05T22:59:32Z" level=debug msg="Error hijack: context canceled"
time="2024-07-05T22:59:32Z" level=debug msg="[hijack] End of stdout"
time="2024-07-05T22:59:32Z" level=debug msg="Error receiveStdout: read unix @->/run/docker.sock: use of closed network connection"
context canceled

# docker -D -l debug run -i --runtime runsc-kvm  --rm --name=test  docker.io/library/registry:latest sh
registry
Illegal instruction
^Ctime="2024-07-05T23:01:43Z" level=debug msg="Error hijack: context canceled"
time="2024-07-05T23:01:43Z" level=debug msg="[hijack] End of stdout"
time="2024-07-05T23:01:43Z" level=debug msg="Error receiveStdout: read unix @->/run/docker.sock: use of closed network connection"
context canceled

# runsc --version
runsc version release-20240305.0
spec: 1.1.0-rc.1

# lscpu 
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  1
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
BIOS Vendor ID:      Intel            
CPU family:          6
Model:               44
Model name:          Intel(R) Xeon(R) CPU           E5606  @ 2.13GHz
BIOS Model name:     Intel(R) Xeon(R) CPU           E5606  @ 2.13GHz     
Stepping:            2
CPU MHz:             2133.000
CPU max MHz:         2133.0000
CPU min MHz:         1200.0000
BogoMIPS:            4266.70
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            8192K
NUMA node0 CPU(s):   0-3
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 popcnt aes lahf_lm epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid dtherm arat flush_l1d
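
Since the problem appears on this Intel host but not on an AMD host, one way to compare the two is to diff their CPU flag lists. The sketch below is an illustration only: the thread does not establish a root cause, and the `WATCH_LIST` of newer x86 extensions is an assumption chosen for the example, not something gVisor is known to require.

```python
# Flags line copied from the lscpu output above.
LSCPU_FLAGS = (
    "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 "
    "clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx pdpe1gb rdtscp lm "
    "constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc "
    "cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 "
    "cx16 xtpr pdcm pcid dca sse4_1 sse4_2 popcnt aes lahf_lm epb pti ssbd ibrs "
    "ibpb stibp tpr_shadow vnmi flexpriority ept vpid dtherm arat flush_l1d"
)

# Hypothetical watch list of extensions that post-date this CPU generation.
WATCH_LIST = ("xsave", "avx", "avx2", "bmi1", "bmi2", "fma")


def missing_flags(flags_line, wanted):
    """Return the entries of `wanted` that do not appear in the flags line."""
    present = set(flags_line.split())
    return [flag for flag in wanted if flag not in present]


if __name__ == "__main__":
    print("missing:", ", ".join(missing_flags(LSCPU_FLAGS, WATCH_LIST)) or "none")
```

Running the same check against the flags line from the AMD host would show which of these extensions differ between the two machines.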

Reproduced on one specific host with an Intel CPU; another host with an AMD CPU has no issues. Version release-20240624.0 is affected as well. Launching with --platform=systrap ends up with "panic: seccomp failed: invalid argument" in the debug logs (on all hosts), but that looks like a separate bug.

Steps to reproduce

docker -D -l debug run -i --runtime runsc-kvm --rm --name=test docker.io/library/registry:latest
time="2024-07-05T22:57:36Z" level=debug msg="[hijack] End of stdout"

runsc version

version release-20240305.0
version release-20240624.0

docker version (if using docker)

Client: Docker Engine - Community
 Version:    27.0.3
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.15.1
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.28.1
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 1
  Running: 0
  Paused: 0
  Stopped: 1
 Images: 1
 Server Version: 27.0.3
 Storage Driver: zfs
  Zpool: pool0
  Zpool Health: ONLINE
  Parent Dataset: pool0/containers/docker
  Space Used By Parent: 18509824
  Space Available: 4927228530688
  Parent Quota: no
  Compression: on
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: runsc-debug runsc-kvm runsc-ptrace io.containerd.runc.v2 runc runsc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: ae71819c4f5e67bb4d5ae76a6b735f29cc25774e
 runc version: v1.1.13-0-g58aa920
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
 Kernel Version: 4.18.0-425.3.1.el8.x86_64
 Operating System: AlmaLinux 8.9 (Midnight Oncilla)
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 23.32GiB
 Name: localhost
 ID: 8c7c4ac3-74e8-4e60-9811-ccd6271eaf5f
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

uname

4.18.0-425.3.1.el8.x86_64 #1 SMP Tue Nov 8 14:08:25 EST 2022 x86_64 x86_64 x86_64 GNU/Linux

kubectl (if using Kubernetes)

No response

repo state (if built from source)

No response

runsc debug logs (if available)

I0705 23:27:43.226324       1 kernel.go:942] EXEC: [/entrypoint.sh /etc/docker/registry/config.yml]
I0705 23:27:43.227483       1 pgalloc.go:719] Disabling pgalloc.MemoryFile.AllocateAndFill pre-population: madvise failed: invalid argument
D0705 23:27:43.229788       1 syscalls.go:262] Allocating stack with size of 8388608 bytes
W0705 23:27:43.229944       1 loader.go:1062] Seccomp spec is being ignored
I0705 23:27:43.230498       1 loader.go:832] Process should have started...
I0705 23:27:43.230532       1 watchdog.go:205] Starting watchdog, period: 45s, timeout: 3m0s, action: logWarning
D0705 23:27:43.230646       1 urpc.go:568] urpc: successfully marshalled 37 bytes.
D0705 23:27:43.231199  532367 urpc.go:611] urpc: unmarshal success.
D0705 23:27:43.231278  532367 container.go:1078] Save container, cid: 8dc51beabc1fe442c1880e18ecbab58052a14d00c1c700bc4680a0b6f0c547b7
I0705 23:27:43.231789       1 strace.go:564] [   1:   1] entrypoint.sh E arch_prctl(0x1002, 0x7f1cc0805b48)
I0705 23:27:43.232011       1 strace.go:602] [   1:   1] entrypoint.sh X arch_prctl(0x1002, 0x7f1cc0805b48) = 0 (0x0) (5.353µs)
I0705 23:27:43.232069       1 strace.go:561] [   1:   1] entrypoint.sh E set_tid_address(0x7f1cc0805fb8)
I0705 23:27:43.232110       1 strace.go:599] [   1:   1] entrypoint.sh X set_tid_address(0x7f1cc0805fb8) = 1 (0x1) (10.151µs)
D0705 23:27:43.232129  532367 state_file.go:78] Load container, rootDir: "/var/run/docker/runtime-runc/moby", id: {SandboxID:8dc51beabc1fe442c1880e18ecbab58052a14d00c1c700bc4680a0b6f0c547b7 ContainerID:8dc51beabc1fe442c1880e18ecbab58052a14d00c1c700bc4680a0b6f0c547b7}, opts: {Exact:true SkipCheck:true TryLock:false RootContainer:false}
I0705 23:27:43.232548       1 strace.go:561] [   1:   1] entrypoint.sh E brk(0x0)
I0705 23:27:43.232615       1 strace.go:599] [   1:   1] entrypoint.sh X brk(0x0) = 94023995543552 (0x5583aadad000) (11.184µs)
I0705 23:27:43.232653       1 strace.go:561] [   1:   1] entrypoint.sh E brk(0x5583aadaf000)
I0705 23:27:43.232734       1 strace.go:599] [   1:   1] entrypoint.sh X brk(0x5583aadaf000) = 94023995551744 (0x5583aadaf000) (6.617µs)
I0705 23:27:43.232889       1 strace.go:576] [   1:   1] entrypoint.sh E mmap(0x5583aadad000, 0x1000, 0x0, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0)
I0705 23:27:43.232938       1 strace.go:614] [   1:   1] entrypoint.sh X mmap(0x5583aadad000, 0x1000, 0x0, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0) = 94023995543552 (0x5583aadad000) (9.681µs)
I0705 23:27:43.233124       1 strace.go:567] [   1:   1] entrypoint.sh E mprotect(0x7f1cc0802000, 0x1000, 0x1)
I0705 23:27:43.233297       1 strace.go:605] [   1:   1] entrypoint.sh X mprotect(0x7f1cc0802000, 0x1000, 0x1) = 0 (0x0) (79.59µs)
I0705 23:27:43.233460       1 strace.go:567] [   1:   1] entrypoint.sh E mprotect(0x5583aada8000, 0x4000, 0x1)
I0705 23:27:43.233540       1 strace.go:605] [   1:   1] entrypoint.sh X mprotect(0x5583aada8000, 0x4000, 0x1) = 0 (0x0) (37.709µs)
I0705 23:27:43.233744       1 strace.go:559] [   1:   1] entrypoint.sh E getuid()
I0705 23:27:43.233785       1 strace.go:596] [   1:   1] entrypoint.sh X getuid() = 0 (0x0) (1.876µs)
I0705 23:27:43.234068       1 strace.go:576] [   1:   1] entrypoint.sh E mmap(0x0, 0x1000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0)
I0705 23:27:43.234164       1 strace.go:614] [   1:   1] entrypoint.sh X mmap(0x0, 0x1000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0) = 139761464803328 (0x7f1cc0768000) (34.915µs)
I0705 23:27:43.234263       1 strace.go:559] [   1:   1] entrypoint.sh E getpid()
I0705 23:27:43.234298       1 strace.go:596] [   1:   1] entrypoint.sh X getpid() = 1 (0x1) (1.622µs)
I0705 23:27:43.234363       1 strace.go:576] [   1:   1] entrypoint.sh E mmap(0x0, 0x2000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0)
I0705 23:27:43.234438       1 strace.go:614] [   1:   1] entrypoint.sh X mmap(0x0, 0x2000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0) = 139761464795136 (0x7f1cc0766000) (26.81µs)
I0705 23:27:43.234538  532367 main.go:226] Exiting with status: 0
I0705 23:27:43.234773       1 strace.go:570] [   1:   1] entrypoint.sh E rt_sigprocmask(SIG_UNBLOCK, 0x7f0cb18718c0 [33 34], 0x0, 0x8)
I0705 23:27:43.234830       1 strace.go:608] [   1:   1] entrypoint.sh X rt_sigprocmask(SIG_UNBLOCK, 0x7f0cb18718c0 [33 34], null, 0x8) = 0 (0x0) (13.188µs)
I0705 23:27:43.234961       1 strace.go:570] [   1:   1] entrypoint.sh E rt_sigaction(SIGCHLD, 0x7f0cb18718a0 {Handler: 0x5583aad267e3, Flags: SA_RESTORER, Restorer: 0x7f1cc07b55a7, Mask: [SIGHUP SIGINT SIGQUIT SIGILL SIGTRAP SIGABRT SIGBUS SIGFPE SIGKILL SIGUSR1 SIGSEGV SIGUSR2 SIGPIPE SIGALRM SIGTERM SIGSTKFLT SIGCHLD SIGCONT SIGSTOP SIGTSTP SIGTTIN SIGTTOU SIGURG SIGXCPU SIGXFSZ SIGVTALRM SIGPROF SIGWINCH SIGIO SIGPWR SIGSYS 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64]}, 0x0, 0x8)
I0705 23:27:43.235003       1 strace.go:608] [   1:   1] entrypoint.sh X rt_sigaction(SIGCHLD, 0x7f0cb18718a0 {Handler: 0x5583aad267e3, Flags: SA_RESTORER, Restorer: 0x7f1cc07b55a7, Mask: [SIGHUP SIGINT SIGQUIT SIGILL SIGTRAP SIGABRT SIGBUS SIGFPE SIGKILL SIGUSR1 SIGSEGV SIGUSR2 SIGPIPE SIGALRM SIGTERM SIGSTKFLT SIGCHLD SIGCONT SIGSTOP SIGTSTP SIGTTIN SIGTTOU SIGURG SIGXCPU SIGXFSZ SIGVTALRM SIGPROF SIGWINCH SIGIO SIGPWR SIGSYS 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64]}, null, 0x8) = 0 (0x0) (4.588µs)
I0705 23:27:43.235111       1 strace.go:559] [   1:   1] entrypoint.sh E getppid()
I0705 23:27:43.235148       1 strace.go:596] [   1:   1] entrypoint.sh X getppid() = 0 (0x0) (2.151µs)
I0705 23:27:43.235402       1 strace.go:576] [   1:   1] entrypoint.sh E mmap(0x0, 0x1000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0)
I0705 23:27:43.235490       1 strace.go:614] [   1:   1] entrypoint.sh X mmap(0x0, 0x1000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0) = 139761464791040 (0x7f1cc0765000) (29.75µs)
I0705 23:27:43.235586       1 strace.go:564] [   1:   1] entrypoint.sh E getcwd(0x7f0cb1870bb0, 0x1000)
I0705 23:27:43.235644       1 strace.go:602] [   1:   1] entrypoint.sh X getcwd(0x7f0cb1870bb0 /, 0x1000) = 2 (0x2) (17.122µs)
I0705 23:27:43.235821       1 strace.go:567] [   1:   1] entrypoint.sh E open(0x7f0cb1871f4e /entrypoint.sh, O_RDONLY|O_CLOEXEC|0x8000, 0o0)
I0705 23:27:43.235901       1 strace.go:605] [   1:   1] entrypoint.sh X open(0x7f0cb1871f4e /entrypoint.sh, O_RDONLY|O_CLOEXEC|0x8000, 0o0) = 3 (0x3) (46.6µs)
I0705 23:27:43.235991       1 strace.go:567] [   1:   1] entrypoint.sh E fcntl(0x3 /entrypoint.sh, 0x2, 0x1)
I0705 23:27:43.236026       1 strace.go:605] [   1:   1] entrypoint.sh X fcntl(0x3 /entrypoint.sh, 0x2, 0x1) = 0 (0x0) (3.336µs)
I0705 23:27:43.236066       1 strace.go:567] [   1:   1] entrypoint.sh E fcntl(0x3 /entrypoint.sh, 0x406, 0xa)
I0705 23:27:43.236109       1 strace.go:605] [   1:   1] entrypoint.sh X fcntl(0x3 /entrypoint.sh, 0x406, 0xa) = 10 (0xa) (5.083µs)
I0705 23:27:43.236146       1 strace.go:567] [   1:   1] entrypoint.sh E fcntl(0xa /entrypoint.sh, 0x2, 0x1)
I0705 23:27:43.236171       1 strace.go:605] [   1:   1] entrypoint.sh X fcntl(0xa /entrypoint.sh, 0x2, 0x1) = 0 (0x0) (2.067µs)
I0705 23:27:43.236252       1 strace.go:561] [   1:   1] entrypoint.sh E close(0x3 /entrypoint.sh)
I0705 23:27:43.236288       1 strace.go:599] [   1:   1] entrypoint.sh X close(0x3 /entrypoint.sh) = 0 (0x0) (5.522µs)
I0705 23:27:43.236418       1 strace.go:570] [   1:   1] entrypoint.sh E rt_sigaction(SIGINT, null, 0x7f0cb1871af0, 0x8)
I0705 23:27:43.236483       1 strace.go:608] [   1:   1] entrypoint.sh X rt_sigaction(SIGINT, null, 0x7f0cb1871af0 {Handler: SIG_DFL, Flags: 0x0, Restorer: 0x0, Mask: []}, 0x8) = 0 (0x0) (4.925µs)
I0705 23:27:43.236555       1 strace.go:570] [   1:   1] entrypoint.sh E rt_sigaction(SIGINT, 0x7f0cb1871ad0 {Handler: 0x5583aad267e3, Flags: SA_RESTORER, Restorer: 0x7f1cc07b55a7, Mask: [SIGHUP SIGINT SIGQUIT SIGILL SIGTRAP SIGABRT SIGBUS SIGFPE SIGKILL SIGUSR1 SIGSEGV SIGUSR2 SIGPIPE SIGALRM SIGTERM SIGSTKFLT SIGCHLD SIGCONT SIGSTOP SIGTSTP SIGTTIN SIGTTOU SIGURG SIGXCPU SIGXFSZ SIGVTALRM SIGPROF SIGWINCH SIGIO SIGPWR SIGSYS 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64]}, 0x0, 0x8)
I0705 23:27:43.236587       1 strace.go:608] [   1:   1] entrypoint.sh X rt_sigaction(SIGINT, 0x7f0cb1871ad0 {Handler: 0x5583aad267e3, Flags: SA_RESTORER, Restorer: 0x7f1cc07b55a7, Mask: [SIGHUP SIGINT SIGQUIT SIGILL SIGTRAP SIGABRT SIGBUS SIGFPE SIGKILL SIGUSR1 SIGSEGV SIGUSR2 SIGPIPE SIGALRM SIGTERM SIGSTKFLT SIGCHLD SIGCONT SIGSTOP SIGTSTP SIGTTIN SIGTTOU SIGURG SIGXCPU SIGXFSZ SIGVTALRM SIGPROF SIGWINCH SIGIO SIGPWR SIGSYS 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64]}, null, 0x8) = 0 (0x0) (3.105µs)
I0705 23:27:43.236623       1 strace.go:570] [   1:   1] entrypoint.sh E rt_sigaction(SIGQUIT, null, 0x7f0cb1871af0, 0x8)
I0705 23:27:43.236655       1 strace.go:608] [   1:   1] entrypoint.sh X rt_sigaction(SIGQUIT, null, 0x7f0cb1871af0 {Handler: SIG_DFL, Flags: 0x0, Restorer: 0x0, Mask: []}, 0x8) = 0 (0x0) (2.559µs)
I0705 23:27:43.236724       1 strace.go:570] [   1:   1] entrypoint.sh E rt_sigaction(SIGQUIT, 0x7f0cb1871ad0 {Handler: SIG_IGN, Flags: SA_RESTORER, Restorer: 0x7f1cc07b55a7, Mask: [SIGHUP SIGINT SIGQUIT SIGILL SIGTRAP SIGABRT SIGBUS SIGFPE SIGKILL SIGUSR1 SIGSEGV SIGUSR2 SIGPIPE SIGALRM SIGTERM SIGSTKFLT SIGCHLD SIGCONT SIGSTOP SIGTSTP SIGTTIN SIGTTOU SIGURG SIGXCPU SIGXFSZ SIGVTALRM SIGPROF SIGWINCH SIGIO SIGPWR SIGSYS 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64]}, 0x0, 0x8)
I0705 23:27:43.236757       1 strace.go:608] [   1:   1] entrypoint.sh X rt_sigaction(SIGQUIT, 0x7f0cb1871ad0 {Handler: SIG_IGN, Flags: SA_RESTORER, Restorer: 0x7f1cc07b55a7, Mask: [SIGHUP SIGINT SIGQUIT SIGILL SIGTRAP SIGABRT SIGBUS SIGFPE SIGKILL SIGUSR1 SIGSEGV SIGUSR2 SIGPIPE SIGALRM SIGTERM SIGSTKFLT SIGCHLD SIGCONT SIGSTOP SIGTSTP SIGTTIN SIGTTOU SIGURG SIGXCPU SIGXFSZ SIGVTALRM SIGPROF SIGWINCH SIGIO SIGPWR SIGSYS 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64]}, null, 0x8) = 0 (0x0) (3.707µs)
I0705 23:27:43.236798       1 strace.go:570] [   1:   1] entrypoint.sh E rt_sigaction(SIGTERM, null, 0x7f0cb1871af0, 0x8)
I0705 23:27:43.236830       1 strace.go:608] [   1:   1] entrypoint.sh X rt_sigaction(SIGTERM, null, 0x7f0cb1871af0 {Handler: SIG_DFL, Flags: 0x0, Restorer: 0x0, Mask: []}, 0x8) = 0 (0x0) (2.765µs)
I0705 23:27:43.236970       1 strace.go:567] [   1:   1] entrypoint.sh E read(0xa /entrypoint.sh, 0x7f1cc0766030, 0x7ff)
I0705 23:27:43.237062       1 strace.go:605] [   1:   1] entrypoint.sh X read(0xa /entrypoint.sh, 0x7f1cc0766030 "#!/bin/sh\n\nset -e\n\ncase \"$1\" in\n    *.yaml|*.yml) set -- registry serve \"$@\" ;;\n    serve|garbage-collect|help|-*) set -- registry \"$@\" ;;\nesac\n\nexec \"$@\"\n", 0x7ff) = 155 (0x9b) (46.3µs)
I0705 23:27:43.237256       1 strace.go:570] [   1:   1] entrypoint.sh E rt_sigaction(SIGQUIT, 0x7f0cb1871660 {Handler: SIG_DFL, Flags: SA_RESTORER, Restorer: 0x7f1cc07b55a7, Mask: [SIGHUP SIGINT SIGQUIT SIGILL SIGTRAP SIGABRT SIGBUS SIGFPE SIGKILL SIGUSR1 SIGSEGV SIGUSR2 SIGPIPE SIGALRM SIGTERM SIGSTKFLT SIGCHLD SIGCONT SIGSTOP SIGTSTP SIGTTIN SIGTTOU SIGURG SIGXCPU SIGXFSZ SIGVTALRM SIGPROF SIGWINCH SIGIO SIGPWR SIGSYS 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64]}, 0x0, 0x8)
I0705 23:27:43.237297       1 strace.go:608] [   1:   1] entrypoint.sh X rt_sigaction(SIGQUIT, 0x7f0cb1871660 {Handler: SIG_DFL, Flags: SA_RESTORER, Restorer: 0x7f1cc07b55a7, Mask: [SIGHUP SIGINT SIGQUIT SIGILL SIGTRAP SIGABRT SIGBUS SIGFPE SIGKILL SIGUSR1 SIGSEGV SIGUSR2 SIGPIPE SIGALRM SIGTERM SIGSTKFLT SIGCHLD SIGCONT SIGSTOP SIGTSTP SIGTTIN SIGTTOU SIGURG SIGXCPU SIGXFSZ SIGVTALRM SIGPROF SIGWINCH SIGIO SIGPWR SIGSYS 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64]}, null, 0x8) = 0 (0x0) (4.57µs)
I0705 23:27:43.237415       1 strace.go:567] [   1:   1] entrypoint.sh E execve(0x7f1cc0806230 /usr/local/sbin/registry, 0x7f1cc08061e0 ["registry", "serve", "/etc/docker/registry/config.yml"], 0x7f1cc0806200 ["HOSTNAME=8dc51beabc1f", "SHLVL=1", "HOME=/root", "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", "PWD=/"])
I0705 23:27:43.237890       1 loader.go:181] [   1:   1] Error opening /usr/local/sbin/registry: no such file or directory
I0705 23:27:43.237937       1 strace.go:605] [   1:   1] entrypoint.sh X execve(0x7f1cc0806230 /usr/local/sbin/registry, 0x7f1cc08061e0 ["registry", "serve", "/etc/docker/registry/config.yml"], 0x7f1cc0806200 ["HOSTNAME=8dc51beabc1f", "SHLVL=1", "HOME=/root", "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", "PWD=/"]) = 0 (0x0) errno=2 (no such file or directory) (462.501µs)
I0705 23:27:43.238023       1 strace.go:567] [   1:   1] entrypoint.sh E execve(0x7f1cc0806230 /usr/local/bin/registry, 0x7f1cc08061e0 ["registry", "serve", "/etc/docker/registry/config.yml"], 0x7f1cc0806200 ["HOSTNAME=8dc51beabc1f", "SHLVL=1", "HOME=/root", "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", "PWD=/"])
I0705 23:27:43.238286       1 loader.go:181] [   1:   1] Error opening /usr/local/bin/registry: no such file or directory
I0705 23:27:43.238319       1 strace.go:605] [   1:   1] entrypoint.sh X execve(0x7f1cc0806230 /usr/local/bin/registry, 0x7f1cc08061e0 ["registry", "serve", "/etc/docker/registry/config.yml"], 0x7f1cc0806200 ["HOSTNAME=8dc51beabc1f", "SHLVL=1", "HOME=/root", "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", "PWD=/"]) = 0 (0x0) errno=2 (no such file or directory) (264.365µs)
I0705 23:27:43.238396       1 strace.go:567] [   1:   1] entrypoint.sh E execve(0x7f1cc0806230 /usr/sbin/registry, 0x7f1cc08061e0 ["registry", "serve", "/etc/docker/registry/config.yml"], 0x7f1cc0806200 ["HOSTNAME=8dc51beabc1f", "SHLVL=1", "HOME=/root", "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", "PWD=/"])
I0705 23:27:43.238686       1 loader.go:181] [   1:   1] Error opening /usr/sbin/registry: no such file or directory
I0705 23:27:43.238717       1 strace.go:605] [   1:   1] entrypoint.sh X execve(0x7f1cc0806230 /usr/sbin/registry, 0x7f1cc08061e0 ["registry", "serve", "/etc/docker/registry/config.yml"], 0x7f1cc0806200 ["HOSTNAME=8dc51beabc1f", "SHLVL=1", "HOME=/root", "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", "PWD=/"]) = 0 (0x0) errno=2 (no such file or directory) (263.609µs)
I0705 23:27:43.238797       1 strace.go:567] [   1:   1] entrypoint.sh E execve(0x7f1cc0806230 /usr/bin/registry, 0x7f1cc08061e0 ["registry", "serve", "/etc/docker/registry/config.yml"], 0x7f1cc0806200 ["HOSTNAME=8dc51beabc1f", "SHLVL=1", "HOME=/root", "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", "PWD=/"])
I0705 23:27:43.239086       1 loader.go:181] [   1:   1] Error opening /usr/bin/registry: no such file or directory
I0705 23:27:43.239118       1 strace.go:605] [   1:   1] entrypoint.sh X execve(0x7f1cc0806230 /usr/bin/registry, 0x7f1cc08061e0 ["registry", "serve", "/etc/docker/registry/config.yml"], 0x7f1cc0806200 ["HOSTNAME=8dc51beabc1f", "SHLVL=1", "HOME=/root", "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", "PWD=/"]) = 0 (0x0) errno=2 (no such file or directory) (291.272µs)
I0705 23:27:43.239190       1 strace.go:567] [   1:   1] entrypoint.sh E execve(0x7f1cc0806230 /sbin/registry, 0x7f1cc08061e0 ["registry", "serve", "/etc/docker/registry/config.yml"], 0x7f1cc0806200 ["HOSTNAME=8dc51beabc1f", "SHLVL=1", "HOME=/root", "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", "PWD=/"])
I0705 23:27:43.239408       1 loader.go:181] [   1:   1] Error opening /sbin/registry: no such file or directory
I0705 23:27:43.239459       1 strace.go:605] [   1:   1] entrypoint.sh X execve(0x7f1cc0806230 /sbin/registry, 0x7f1cc08061e0 ["registry", "serve", "/etc/docker/registry/config.yml"], 0x7f1cc0806200 ["HOSTNAME=8dc51beabc1f", "SHLVL=1", "HOME=/root", "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", "PWD=/"]) = 0 (0x0) errno=2 (no such file or directory) (239.531µs)
I0705 23:27:43.239550       1 strace.go:567] [   1:   1] entrypoint.sh E execve(0x7f1cc0806230 /bin/registry, 0x7f1cc08061e0 ["registry", "serve", "/etc/docker/registry/config.yml"], 0x7f1cc0806200 ["HOSTNAME=8dc51beabc1f", "SHLVL=1", "HOME=/root", "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", "PWD=/"])
D0705 23:27:43.243680       1 syscalls.go:262] [   1:   1] Allocating stack with size of 8388608 bytes
I0705 23:27:43.243771       1 strace.go:605] [   1:   1] entrypoint.sh X execve(0x7f1cc0806230 /bin/registry, 0x7f1cc08061e0 ["registry", "serve", "/etc/docker/registry/config.yml"], 0x7f1cc0806200 ["HOSTNAME=8dc51beabc1f", "SHLVL=1", "HOME=/root", "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", "PWD=/"]) = 0 (0x0) (4.172763ms)
I0705 23:27:43.244728       1 strace.go:564] [   1:   1] registry E arch_prctl(0x1002, 0x14bf010)
I0705 23:27:43.244777       1 strace.go:602] [   1:   1] registry X arch_prctl(0x1002, 0x14bf010) = 0 (0x0) (2.524µs)
I0705 23:27:43.245972       1 strace.go:567] [   1:   1] registry E sched_getaffinity(0x0, 0x2000, 0x7f74517a9d90)
I0705 23:27:43.246023       1 strace.go:605] [   1:   1] registry X sched_getaffinity(0x0, 0x2000, 0x7f74517a9d90) = 8 (0x8) (4.775µs)
I0705 23:27:43.246116       1 strace.go:570] [   1:   1] registry E openat(AT_FDCWD /, 0x145d8e0 /sys/kernel/mm/transparent_hugepage/hpage_pmd_size, O_RDONLY|0x0, 0o0)
I0705 23:27:43.246188       1 strace.go:608] [   1:   1] registry X openat(AT_FDCWD /, 0x145d8e0 /sys/kernel/mm/transparent_hugepage/hpage_pmd_size, O_RDONLY|0x0, 0o0) = 0 (0x0) errno=2 (no such file or directory) (22.646µs)
I0705 23:27:43.247383       1 strace.go:576] [   1:   1] registry E mmap(0x0, 0x40000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0)
I0705 23:27:43.247488       1 strace.go:614] [   1:   1] registry X mmap(0x0, 0x40000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0) = 139853111894016 (0x7f32170e2000) (38.221µs)
I0705 23:27:43.247591       1 strace.go:576] [   1:   1] registry E mmap(0x0, 0x20000, 0x0, MAP_PRIVATE|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0)
I0705 23:27:43.247637       1 strace.go:614] [   1:   1] registry X mmap(0x0, 0x20000, 0x0, MAP_PRIVATE|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0) = 139853111762944 (0x7f32170c2000) (6.229µs)
I0705 23:27:43.247680       1 strace.go:576] [   1:   1] registry E mmap(0x0, 0x100000, 0x0, MAP_PRIVATE|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0)
I0705 23:27:43.247711       1 strace.go:614] [   1:   1] registry X mmap(0x0, 0x100000, 0x0, MAP_PRIVATE|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0) = 139853110714368 (0x7f3216fc2000) (4.625µs)
I0705 23:27:43.247748       1 strace.go:576] [   1:   1] registry E mmap(0x0, 0x800000, 0x0, MAP_PRIVATE|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0)
I0705 23:27:43.247778       1 strace.go:614] [   1:   1] registry X mmap(0x0, 0x800000, 0x0, MAP_PRIVATE|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0) = 139853100482560 (0x7f3216600000) (4.358µs)
I0705 23:27:43.247815       1 strace.go:576] [   1:   1] registry E mmap(0x0, 0x4000000, 0x0, MAP_PRIVATE|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0)
I0705 23:27:43.247845       1 strace.go:614] [   1:   1] registry X mmap(0x0, 0x4000000, 0x0, MAP_PRIVATE|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0) = 139853033373696 (0x7f3212600000) (4.11µs)
I0705 23:27:43.247880       1 strace.go:576] [   1:   1] registry E mmap(0x0, 0x20000000, 0x0, MAP_PRIVATE|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0)
I0705 23:27:43.247910       1 strace.go:614] [   1:   1] registry X mmap(0x0, 0x20000000, 0x0, MAP_PRIVATE|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0) = 139852496502784 (0x7f31f2600000) (4.027µs)
I0705 23:27:43.247945       1 strace.go:576] [   1:   1] registry E mmap(0x0, 0x800000, 0x0, MAP_PRIVATE|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0)
I0705 23:27:43.247978       1 strace.go:614] [   1:   1] registry X mmap(0x0, 0x800000, 0x0, MAP_PRIVATE|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0) = 139852488114176 (0x7f31f1e00000) (7.566µs)
I0705 23:27:43.248379       1 strace.go:576] [   1:   1] registry E mmap(0xc000000000, 0x4000000, 0x0, MAP_PRIVATE|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0)
I0705 23:27:43.248420       1 strace.go:614] [   1:   1] registry X mmap(0xc000000000, 0x4000000, 0x0, MAP_PRIVATE|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0) = 824633720832 (0xc000000000) (5.885µs)
I0705 23:27:43.248515       1 strace.go:576] [   1:   1] registry E mmap(0x0, 0x2000000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0)
I0705 23:27:43.248549       1 strace.go:614] [   1:   1] registry X mmap(0x0, 0x2000000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0) = 139852454559744 (0x7f31efe00000) (5.528µs)
I0705 23:27:43.248694       1 strace.go:576] [   1:   1] registry E mmap(0x0, 0x114c10, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0)
I0705 23:27:43.248778       1 strace.go:614] [   1:   1] registry X mmap(0x0, 0x114c10, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0) = 139853109579776 (0x7f3216ead000) (48.695µs)
I0705 23:27:43.248867       1 strace.go:576] [   1:   1] registry E mmap(0xc000000000, 0x400000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0)
I0705 23:27:43.248907       1 strace.go:614] [   1:   1] registry X mmap(0xc000000000, 0x400000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0) = 824633720832 (0xc000000000) (8.156µs)
I0705 23:27:43.249130       1 strace.go:576] [   1:   1] registry E mmap(0x7f32170c2000, 0x20000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0)
I0705 23:27:43.249202       1 strace.go:614] [   1:   1] registry X mmap(0x7f32170c2000, 0x20000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0) = 139853111762944 (0x7f32170c2000) (29.54µs)
I0705 23:27:43.249309       1 strace.go:576] [   1:   1] registry E mmap(0x7f3217042000, 0x1000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0)
I0705 23:27:43.249381       1 strace.go:614] [   1:   1] registry X mmap(0x7f3217042000, 0x1000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0) = 139853111238656 (0x7f3217042000) (34.677µs)
I0705 23:27:43.249446       1 strace.go:576] [   1:   1] registry E mmap(0x7f3216a06000, 0x1000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0)
I0705 23:27:43.249511       1 strace.go:614] [   1:   1] registry X mmap(0x7f3216a06000, 0x1000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0) = 139853104701440 (0x7f3216a06000) (35.205µs)
I0705 23:27:43.249580       1 strace.go:576] [   1:   1] registry E mmap(0x7f3214630000, 0x1000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0)
I0705 23:27:43.249654       1 strace.go:614] [   1:   1] registry X mmap(0x7f3214630000, 0x1000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0) = 139853067124736 (0x7f3214630000) (44.62µs)
I0705 23:27:43.249706       1 strace.go:576] [   1:   1] registry E mmap(0x7f3202780000, 0x1000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0)
I0705 23:27:43.249767       1 strace.go:614] [   1:   1] registry X mmap(0x7f3202780000, 0x1000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0) = 139852766511104 (0x7f3202780000) (33.025µs)
I0705 23:27:43.249817       1 strace.go:576] [   1:   1] registry E mmap(0x7f31f1e00000, 0x407000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0)
I0705 23:27:43.249852       1 strace.go:614] [   1:   1] registry X mmap(0x7f31f1e00000, 0x407000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0) = 139852488114176 (0x7f31f1e00000) (8.715µs)
I0705 23:27:43.249911       1 strace.go:576] [   1:   1] registry E mmap(0x0, 0x100000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0)
I0705 23:27:43.249987       1 strace.go:614] [   1:   1] registry X mmap(0x0, 0x100000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0) = 139852453511168 (0x7f31efd00000) (44.069µs)
I0705 23:27:43.250127       1 strace.go:576] [   1:   1] registry E mmap(0x0, 0x10000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0)
I0705 23:27:43.250188       1 strace.go:614] [   1:   1] registry X mmap(0x0, 0x10000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0) = 139853109514240 (0x7f3216e9d000) (26.887µs)
I0705 23:27:43.250279       1 strace.go:576] [   1:   1] registry E mmap(0x0, 0x10000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0)
I0705 23:27:43.250336       1 strace.go:614] [   1:   1] registry X mmap(0x0, 0x10000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 0xffffffffffffffff (bad FD), 0x0) = 139853109448704 (0x7f3216e8d000) (25.5µs)
D0705 23:27:43.250829       1 task_signals.go:470] [   1:   1] Notified of signal 4
D0705 23:27:43.250930       1 task_signals.go:204] [   1:   1] Signal 4, PID: 1, TID: 1, fault addr: 0x401e65: terminating thread group
D0705 23:27:43.250954       1 task_exit.go:204] [   1:   1] Transitioning from exit state TaskExitNone to TaskExitInitiated
D0705 23:27:43.251702       1 connection.go:127] sock read failed, closing connection: EOF
D0705 23:27:43.251805       1 connection.go:127] sock read failed, closing connection: EOF
I0705 23:27:43.251930       1 loader.go:1112] Gofer socket disconnected, killing container "8dc51beabc1fe442c1880e18ecbab58052a14d00c1c700bc4680a0b6f0c547b7"
D0705 23:27:43.251964       1 connection.go:127] sock read failed, closing connection: EOF
D0705 23:27:43.252014       1 connection.go:127] sock read failed, closing connection: EOF
D0705 23:27:43.252074       1 connection.go:127] sock read failed, closing connection: EOF
I0705 23:27:43.252137       1 gofer.go:340] All lisafs servers exited.
D0705 23:27:43.252137       1 task_exit.go:361] [   1:   1] Init process terminating, killing namespace
D0705 23:27:43.252175       1 task_signals.go:481] [   1:   1] No task notified of signal 9
I0705 23:27:43.252168       1 main.go:226] Exiting with status: 0
D0705 23:27:43.252198       1 task_exit.go:204] [   1:   1] Transitioning from exit state TaskExitInitiated to TaskExitZombie
D0705 23:27:43.252213       1 task_exit.go:204] [   1:   1] Transitioning from exit state TaskExitZombie to TaskExitDead
I0705 23:27:43.252242       1 boot.go:508] application exiting with killed by signal 4
I0705 23:27:43.252322       1 watchdog.go:221] Stopping watchdog
I0705 23:27:43.252343       1 watchdog.go:225] Watchdog stopped
I0705 23:27:43.252520       1 main.go:226] Exiting with status: 4
EtiennePerot commented 2 months ago

launching with --platform=systrap end up with "panic: seccomp failed: invalid argument" in debug logs (on all hosts), but looks like it is another bug

Yes. Can you file another bug with more details about this Systrap seccomp issue? Turning on debug logging should show the seccomp-bpf program being sent to the kernel prior to getting EINVAL. I believe it may be due to your 4.18.0 kernel (that's quite old) not supporting SECCOMP_IOCTL_NOTIF_* but I thought some fallback code for this was added for older kernels.

q53 commented 2 months ago

Can you file another bug with more details about this Systrap seccomp issue?

#10633

avagin commented 2 months ago

I think it is about the FSGSBASE instructions. The kernel is too old and doesn't support them: https://www.kernel.org/doc/html/v5.9/x86/x86_64/fsgs.html#accessing-fs-gs-base-with-the-fsgsbase-instructions

avagin commented 2 months ago

@q53 Could you try to reproduce the issue with this patch:

diff --git a/pkg/ring0/lib_amd64.go b/pkg/ring0/lib_amd64.go
index fe69b6988..d42a587f4 100644
--- a/pkg/ring0/lib_amd64.go
+++ b/pkg/ring0/lib_amd64.go
@@ -117,6 +117,7 @@ func Init(fs cpuid.FeatureSet) {
        hasXSAVEOPT = fs.UseXsaveopt()
        hasXSAVE = fs.UseXsave()
        hasFSGSBASE = fs.HasFeature(cpuid.X86FeatureFSGSBase)
+       hasFSGSBASE = false
        validXCR0Mask = uintptr(fs.ValidXCR0Mask())
        if hasXSAVE {
                XCR0DisabledMask := uintptr((1 << 9) | (1 << 17) | (1 << 18))
q53 commented 2 months ago

@q53 Could you try to reproduce the issue with this patch:

It does not work.

q53 commented 2 months ago

I think it is about the FSGSBASE instructions. The kernel is too old and doesn't support them: https://www.kernel.org/doc/html/v5.9/x86/x86_64/fsgs.html#accessing-fs-gs-base-with-the-fsgsbase-instructions

On another host with the same kernel version but an AMD processor, it works fine, so I do not believe it is a kernel version issue.

avagin commented 2 months ago

Oops. I didn't read the description to the end and assumed that runsc itself failed with "Illegal instruction". Actually, it is the app inside gVisor that failed with this error. We need to find out which instruction triggers the signal. Could you reproduce the issue with the following patch and attach the runsc debug log:

diff --git a/pkg/sentry/kernel/task_signals.go b/pkg/sentry/kernel/task_signals.go
index 22d6bcddf..d8112b70e 100644
--- a/pkg/sentry/kernel/task_signals.go
+++ b/pkg/sentry/kernel/task_signals.go
@@ -202,6 +202,7 @@ func (t *Task) deliverSignal(info *linux.SignalInfo, act linux.SigAction) taskRu
                }

                t.Debugf("Signal %d, PID: %d, TID: %d, fault addr: %#x: terminating thread group", info.Signo, ucs.Pid, ucs.Tid, ucs.FaultAddr)
+               t.DebugDumpState()
                eventchannel.Emit(ucs)

                t.PrepareGroupExit(linux.WaitStatusTerminationSignal(sig))
q53 commented 2 months ago

runsc-kvm-debug.log

avagin commented 2 months ago

@q53 The xgetbv instruction triggers the fault. According to the output of lscpu, your CPU doesn't support it. The question is why the app is trying to use it. Could you show the output of cat /proc/cpuinfo from the gVisor container?
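
For comparing the host and the sandbox, a small helper can check whether a given flag appears in /proc/cpuinfo-style text. This is only a sketch: hasCPUFlag is a hypothetical name, the sample text is made up, and it assumes the usual "flags : ..." line layout.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// hasCPUFlag reports whether any "flags" line in cpuinfo-style text
// lists the given flag. Hypothetical helper, not part of gVisor.
func hasCPUFlag(cpuinfo, flag string) bool {
	sc := bufio.NewScanner(strings.NewReader(cpuinfo))
	for sc.Scan() {
		key, val, ok := strings.Cut(sc.Text(), ":")
		if !ok || strings.TrimSpace(key) != "flags" {
			continue
		}
		for _, f := range strings.Fields(val) {
			if f == flag {
				return true
			}
		}
	}
	return false
}

func main() {
	// Made-up sample; in practice, pass the contents of /proc/cpuinfo.
	sample := "processor\t: 0\nflags\t\t: fpu vme sse sse2\n"
	fmt.Println("xsave listed:", hasCPUFlag(sample, "xsave"))
}
```

Running it against the output from both the host and the container shows whether the two views of the CPU disagree.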

avagin commented 2 months ago

I think I figured out the root cause of this issue. Golang uses xgetbv, if cpuid reports OSXSAVE: https://github.com/golang/go/blob/959b3fd4265d7e4efb18af454cd18799ed70b8fe/src/internal/cpu/cpu_x86.go#L122

The kvm platform always sets OSXSAVE: https://github.com/google/gvisor/blob/e87ab0a3018d1e5a622ed5b0e13e413dd30a86d2/pkg/sentry/platform/kvm/kvm_amd64.go#L237
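
The interaction can be sketched roughly as follows. This is an illustration, not the Go runtime's or gVisor's actual code: the function names are hypothetical, and only the bit positions (CPUID leaf 1, ECX bit 26 for XSAVE and bit 27 for OSXSAVE) come from the x86 architecture manuals.

```go
package main

import "fmt"

// CPUID leaf 1, ECX feature bits.
const (
	ecxXSAVE   uint32 = 1 << 26 // CPU implements XSAVE/XRSTOR/XGETBV
	ecxOSXSAVE uint32 = 1 << 27 // OS has enabled XSAVE (CR4.OSXSAVE)
)

// wouldCallXGETBV models the guard described above: the guest
// executes xgetbv whenever CPUID reports OSXSAVE. Hypothetical name.
func wouldCallXGETBV(ecx uint32) bool {
	return ecx&ecxOSXSAVE != 0
}

// inconsistent reports a CPUID view that advertises OSXSAVE on a CPU
// without XSAVE, in which case xgetbv raises an invalid-opcode fault.
func inconsistent(ecx uint32) bool {
	return ecx&ecxOSXSAVE != 0 && ecx&ecxXSAVE == 0
}

func main() {
	// A CPU without XSAVE, with OSXSAVE forced on in the reported CPUID.
	ecx := ecxOSXSAVE
	fmt.Println("guest would call xgetbv:", wouldCallXGETBV(ecx))
	fmt.Println("would fault:", inconsistent(ecx))
}
```

In this model the illegal instruction appears only when OSXSAVE is reported without XSAVE, which would match a host CPU that lacks XSAVE.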

q53 commented 2 months ago

I think I figured out the root cause of this issue. Golang uses xgetbv, if cpuid reports OSXSAVE: https://github.com/golang/go/blob/959b3fd4265d7e4efb18af454cd18799ed70b8fe/src/internal/cpu/cpu_x86.go#L122

The kvm platform always set OSXSAVE:

https://github.com/google/gvisor/blob/e87ab0a3018d1e5a622ed5b0e13e413dd30a86d2/pkg/sentry/platform/kvm/kvm_amd64.go#L237

Building with that line commented out does not trigger the error.