google / gvisor

Application Kernel for Containers
https://gvisor.dev
Apache License 2.0

OCI runtime create failed: creating container: cannot create sandbox: cannot read client sync file: waiting for sandbox to start: EOF: unknown #9996

Closed Kos-M closed 8 months ago

Kos-M commented 8 months ago

Description

After recently upgrading Docker and gVisor to their latest versions, I realized I can't run gVisor in Docker. The error I got:

$ docker run --rm --runtime=runsc hello-world

docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: creating container: cannot create sandbox: cannot read client sync file: waiting for sandbox to start: EOF: unknown.

So I tried different versions of gVisor to see which work and which don't.

| gVisor | Docker | Outcome |
| --- | --- | --- |
| release-20240206.0 | Docker version 25.0.3, build 4debf43 | Fail |
| release-20240129.0 | Docker version 25.0.3, build 4debf41 | Fail |
| release-20240122.0 | Docker version 25.0.3, build 4debf42 | Fail |
| release-20240115.0 | Docker version 25.0.3, build 4debf41 | Pass |
| release-20240109.0 | Docker version 25.0.3, build 4debf41 | Pass |
| release-20231211.0 | Docker version 25.0.3, build 4debf44 | Not tested |
| release-20231218.0 | Docker version 25.0.3, build 4debf43 | Not tested |
| release-20221219.0 | Docker version 25.0.3, build 4debf41 | Pass |

If there is a Docker compatibility list and recent versions of Docker are not supported, let me know and close the issue :) Thanks.

Steps to reproduce

Install Docker version 25.0.3, build 4debf43, and gVisor release-20240206.0, then run:

$ docker run --rm --runtime=runsc hello-world

OS used: Debian 12.

runsc version

The last 3 releases:

release-20240206.0
release-20240129.0
release-20240122.0

docker version (if using docker)

Docker version 25.0.3, build 4debf43

uname

Linux 6.1.0-18-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.76-1 (2024-02-01) x86_64 GNU/Linux

kubectl (if using Kubernetes)

No response

repo state (if built from source)

No response

runsc debug logs (if available)

No response

ayushr2 commented 8 months ago

I cannot reproduce this issue with the latest build:

$ cat /etc/os-release 
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

$ docker --version
Docker version 25.0.3, build 4debf41

$ sudo /tmp/runsc install --runtime=runsc -- --debug --debug-log=/tmp/logs/
2024/02/11 20:59:53 Runtime runsc not found: adding
2024/02/11 20:59:53 Successfully updated config.

$ sudo systemctl restart docker

$ /tmp/runsc --version
runsc version google-605771274  // This is the build from yesterday.
spec: 1.1.0-rc.2

$ docker run --rm --runtime=runsc hello-world

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/

Can you please reproduce this with the runsc flags --debug --debug-log=/path/to/log/dir/ and upload the debug log files produced?
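Once debug logging is on, a crash in the sandbox process usually shows up in the log directory as a Go runtime fatal-signal report. The following is a minimal sketch for locating such files, assuming the logs were written to /tmp/logs/ as in the install command above. The function name `find_fatal_logs` is hypothetical and is not part of runsc.

```shell
# find_fatal_logs [DIR]: print the files under DIR (default /tmp/logs, an
# assumption taken from the install command above) that contain a Go runtime
# fatal-signal header such as "SIGILL: illegal instruction".
# Hypothetical helper; this is not a runsc subcommand.
find_fatal_logs() {
    # -r: recurse into DIR, -l: print matching file names only,
    # -E: extended regex so the alternation needs no escaping.
    grep -rlE '^(SIGILL|SIGSEGV|SIGBUS): ' "${1:-/tmp/logs}"
}

# Usage:
#   find_fatal_logs /tmp/logs
```

The function exits non-zero when no file matches, so it can also be used as a quick yes/no check in a script.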

Kos-M commented 8 months ago

I also added "--strace" to finally get logs.

I just mailed them to you in case any of those logs contain sensitive info. Thanks.

milantracy commented 8 months ago

+1 to @ayushr2

I tried the 3 failing versions listed here and can't reproduce the issue.

# release-20240206.0
jing@jing-ubuntu:~$ docker run --rm --runtime=runsc debian dmesg
[    0.000000] Starting gVisor...
[    0.147413] Recruiting cron-ies...
[    0.340260] Creating cloned children...
[    0.443844] Checking naughty and nice process list...
[    0.745502] Rewriting operating system in Javascript...
[    1.138433] Creating process schedule...
[    1.376041] Generating random numbers by fair dice roll...
[    1.597204] Reading process obituaries...
[    1.654914] Daemonizing children...
[    1.910484] Constructing home...
[    2.067026] Searching for socket adapter...
[    2.176111] Setting up VFS...
[    2.632785] Setting up FUSE...
[    2.962824] Ready!
jing@jing-ubuntu:~$ runsc --version
runsc version release-20240206.0
spec: 1.1.0-rc.1

# release-20240129.0
jing@jing-ubuntu:~$ docker run --rm --runtime=runsc debian dmesg
[    0.000000] Starting gVisor...
[    0.420027] Reticulating splines...
[    0.695509] Consulting tar man page...
[    0.747739] Adversarially training Redcode AI...
[    1.079539] Preparing for the zombie uprising...
[    1.211524] Creating bureaucratic processes...
[    1.334172] Constructing home...
[    1.828618] Recruiting cron-ies...
[    2.070326] Rewriting operating system in Javascript...
[    2.200041] Creating cloned children...
[    2.698119] Searching for socket adapter...
[    3.086574] Setting up VFS...
[    3.405752] Setting up FUSE...
[    3.727192] Ready!
jing@jing-ubuntu:~$ runsc --version
runsc version release-20240129.0
spec: 1.1.0-rc.1

# release-20240122.0
jing@jing-ubuntu:~$ docker run --rm --runtime=runsc debian dmesg
[    0.000000] Starting gVisor...
[    0.477647] Creating cloned children...
[    0.802471] Searching for socket adapter...
[    0.999361] Feeding the init monster...
[    1.492208] Checking naughty and nice process list...
[    1.512583] Singleplexing /dev/ptmx...
[    2.005589] Conjuring /dev/null black hole...
[    2.214344] Waiting for children...
[    2.312105] Verifying that no non-zero bytes made their way into /dev/zero...
[    2.400924] Committing treasure map to memory...
[    2.899932] Synthesizing system calls...
[    3.247789] Setting up VFS...
[    3.607451] Setting up FUSE...
[    3.860964] Ready!
jing@jing-ubuntu:~$ runsc --version
runsc version release-20240122.0
spec: 1.1.0-rc.1

My Ubuntu machine settings are:

jing@jing-ubuntu:~$ uname -a
Linux jing-ubuntu 6.2.0-37-generic #38~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Nov  2 18:01:13 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
jing@jing-ubuntu:~$ docker version
Client: Docker Engine - Community
 Version:           25.0.3
 API version:       1.44
 Go version:        go1.21.6
 Git commit:        4debf41
 Built:             Tue Feb  6 21:13:09 2024
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          25.0.3
  API version:      1.44 (minimum version 1.24)
  Go version:       go1.21.6
  Git commit:       f417435
  Built:            Tue Feb  6 21:13:09 2024
  OS/Arch:          linux/amd64
  Experimental:     true
 containerd:
  Version:          1.6.28
  GitCommit:        ae07eda36dd25f8a1b98dfbf587313b99c0190bb
 runc:
  Version:          1.1.12
  GitCommit:        v1.1.12-0-g51d5e94
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

ayushr2 commented 8 months ago

Thanks @Kos-M. Yeah, I can see a SIGILL in the boot logs. What CPU family are you using?

Seems like this is an illegal instruction on your architecture. I think the culprit is https://github.com/google/gvisor/commit/e9bdc76c02bd6ad4c6af574fa1dd31578bec012e, which was submitted on Jan 17. That lines up with your analysis above that things started failing with release-20240122.0. cc @konstantin-s-bogom.

I0212 00:54:39.080482  276892 loader.go:677] Platform: systrap
SIGILL: illegal instruction
PC=0x66ba45 m=3 sigcode=2
instruction bytes: 0xf 0x1 0xd0 0x89 0x44 0x24 0x10 0x89 0x54 0x24 0x14 0xc3 0xcc 0xcc 0xcc 0xcc

goroutine 1 [running]:
gvisor.dev/gvisor/pkg/cpuid.xgetbv(0x0)
    pkg/cpuid/native_amd64.s:35 +0x5 fp=0xc000424ce8 sp=0xc000424ce0 pc=0x66ba45
gvisor.dev/gvisor/pkg/cpuid.FeatureSet.AMXExtendedStateSize(...)
    pkg/cpuid/cpuid_amd64.go:404
gvisor.dev/gvisor/pkg/sentry/platform/systrap/sysmsg.(*ArchState).Init(0x1f16d48)
    pkg/sentry/platform/systrap/sysmsg/sysmsg_amd64.go:56 +0x72 fp=0xc000424d48 sp=0xc000424ce8 pc=0xce7592
gvisor.dev/gvisor/pkg/sentry/platform/systrap.New()
    pkg/sentry/platform/systrap/systrap.go:318 +0x1e fp=0xc000424d80 sp=0xc000424d48 pc=0xcf7e1e
gvisor.dev/gvisor/pkg/sentry/platform/systrap.(*constructor).New(0x1262bc7, 0x7?)
    pkg/sentry/platform/systrap/systrap.go:391 +0xf fp=0xc000424d90 sp=0xc000424d80 pc=0xcf810f
gvisor.dev/gvisor/runsc/boot.createPlatform(0xc0000ce840, 0x1ea0f88?)
    runsc/boot/loader.go:678 +0xe5 fp=0xc000424e18 sp=0xc000424d90 pc=0xe47625
gvisor.dev/gvisor/runsc/boot.New({{0x7ffcb4482fa8, 0x40}, 0xc00009c3f0, 0xc0000ce840, 0xc, 0x0, {0xc0000b3640, 0x4, 0x4}, 0xffffffffffffffff, ...})
    runsc/boot/loader.go:406 +0x97f fp=0xc000425630 sp=0xc000424e18 pc=0xe4517f
gvisor.dev/gvisor/runsc/cmd.(*Boot).Execute(0xc000002180, {0xc00003e290?, 0xc000077b00?}, 0xc000159a40, {0xc000077b00, 0x2, 0x28?})
    runsc/cmd/boot.go:447 +0x14f8 fp=0xc000425cc0 sp=0xc000425630 pc=0xf5e918
github.com/google/subcommands.(*Commander).Execute(0xc0000cc000, {0x148c160, 0x1f162e0}, {0xc000077b00, 0x2, 0x2})
    external/com_github_google_subcommands/subcommands.go:200 +0x38c fp=0xc000425d60 sp=0xc000425cc0 pc=0x511e8c
github.com/google/subcommands.Execute(...)
    external/com_github_google_subcommands/subcommands.go:481
gvisor.dev/gvisor/runsc/cli.Main()
    runsc/cli/main.go:221 +0x141c fp=0xc000425f30 sp=0xc000425d60 pc=0xf8da1c
main.main()
    runsc/main.go:31 +0xf fp=0xc000425f40 sp=0xc000425f30 pc=0xf8ea0f
runtime.main()
    GOROOT/src/runtime/proc.go:267 +0x2bb fp=0xc000425fe0 sp=0xc000425f40 pc=0x43d61b
runtime.goexit()
    src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000425fe8 sp=0xc000425fe0 pc=0x4713e1

Kos-M commented 8 months ago

I can't reproduce it either in a local VM with a fresh installation of the OS and Docker/gVisor. It seems to be related to the CPU, as @ayushr2 saw in the logs.

CPU details of the affected machine:

$ lscpu 
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         40 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  2
  On-line CPU(s) list:   0,1
Vendor ID:               AuthenticAMD
  Model name:            QEMU Virtual CPU version 2.1.2
    CPU family:          6
    Model:               6
    Thread(s) per core:  1
    Core(s) per socket:  1
    Socket(s):           2
    Stepping:            3
    BogoMIPS:            4200.08
    Flags:               fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
                         pat pse36 clflush mmx fxsr sse sse2 syscall nx lm nopl
                         cpuid tsc_known_freq pni cx16 x2apic popcnt hypervisor
                         lahf_lm svm abm sse4a 3dnowprefetch vmmcall
Virtualization features: 
  Virtualization:        AMD-V
  Hypervisor vendor:     KVM
  Virtualization type:   full
Caches (sum of all):     
  L1d:                   128 KiB (2 instances)
  L1i:                   128 KiB (2 instances)
  L2:                    1 MiB (2 instances)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0,1
Vulnerabilities:         
  Gather data sampling:  Not affected
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Not affected
  Retbleed:              Not affected
  Spec rstack overflow:  Not affected
  Spec store bypass:     Vulnerable
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer
                          sanitization
  Spectre v2:            Mitigation; Retpolines, STIBP disabled, RSB filling,
                         PBRSB-eIBRS Not affected
  Srbds:                 Not affected
  Tsx async abort:       Not affected
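
The flag list above has no `xsave` entry, which fits the crash: the faulting bytes `0xf 0x1 0xd0` encode XGETBV, an instruction that raises an invalid-opcode fault unless the OS has enabled XSAVE. A minimal sketch for checking a machine up front, assuming Linux's /proc/cpuinfo layout. `has_cpu_flag` is a hypothetical helper name, not something shipped with gVisor.

```shell
# has_cpu_flag FLAG [FILE]: succeed if FLAG appears as a whole word in the
# first "flags" line of FILE (default /proc/cpuinfo, Linux-specific).
# Hypothetical helper for illustration only.
has_cpu_flag() {
    awk -v want="$1" '
        /^flags[ \t]*:/ {
            # $1 is "flags", $2 is ":", the flag names start at $3.
            for (i = 3; i <= NF; i++) if ($i == want) found = 1
            exit            # every CPU lists the same flags; one line is enough
        }
        END { exit !found } # exit status 0 only when the flag was seen
    ' "${2:-/proc/cpuinfo}"
}

# Usage:
#   has_cpu_flag xsave || echo "no XSAVE: XGETBV is likely to fault"
```

Matching whole fields rather than substrings matters here, since a plain substring search for `sse` would also hit `sse2` and `sse4a`.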

konstantin-s-bogom commented 8 months ago

@Kos-M can you try building gVisor with https://github.com/google/gvisor/pull/9998 to see if it fixes your issue? I don't have a CPU with your featureset on hand to test this myself.

Kos-M commented 8 months ago

@konstantin-s-bogom thanks for your effort. I just started the build. I'm not sure how resource-hungry the gVisor build is, and I'm running it on a limited instance. I'll update you as soon as I have results.

Kos-M commented 8 months ago

It works!! 🎉

$ which runsc
/usr/bin/runsc
$ runsc --version
runsc version release-20240206.0-20-gc16fcc1abee2
spec: 1.1.0-rc.1
$ docker -D -l debug run --rm --runtime=runsc hello-world
Hello from Docker!
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon.
2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
(amd64)
3. The Docker daemon created a new container from that image which runs the
executable that produces the output you are currently reading.
4. The Docker daemon streamed that output to the Docker client, which sent it
to your terminal.
To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash
Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/

DEBU[0002] [hijack] End of stdout

After the build I uninstalled the current version and copied the new runsc executable to the same location.

/etc/docker/daemon.json points to the same executable; I didn't modify it.

Kos-M commented 8 months ago

Sorry for bothering you again with this issue, but I can't run gVisor from the latest release. Is that fix merged?

I think it's merged, but I tried release-20240212.0 and it seems the issue remains.

$ runsc --version
runsc version release-20240212.0
spec: 1.1.0-rc.1
$ docker -D -l debug run --rm --runtime=runsc hello-world
DEBU[0001] [hijack] End of stdout                       
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: creating container: cannot create sandbox: cannot read client sync file: waiting for sandbox to start: EOF: unknown.
$ cat /etc/docker/daemon.json
{
    "runtimes": {
        "runsc": {
            "path": "/usr/bin/runsc"
        }
    }
}
$ which runsc
/usr/bin/runsc

I restarted the Docker service after replacing the temporary gVisor build from branch test/cl606266725, which was working fine.

Thanks again..

ayushr2 commented 8 months ago

https://github.com/google/gvisor/commit/5300f3d30597c70e92fcb231cff026257c42ecc0 was merged on Feb 13th. release-20240212.0 is from Feb 12th. So it doesn't contain the fix.

Kos-M commented 8 months ago

Oh, I see. Are releases on a specific schedule or on demand?

ayushr2 commented 8 months ago

On a schedule. We usually cut the Monday candidates. So https://github.com/google/gvisor/commit/5300f3d30597c70e92fcb231cff026257c42ecc0 will be available in next week's release.
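
The date comparison used above (a Feb 12 release cannot contain a change merged on Feb 13) can be sketched as a small check. This assumes the tag follows the `release-YYYYMMDD.N` pattern seen in this thread and that the date in the tag is the day the release was cut. `release_predates` is a hypothetical helper name.

```shell
# release_predates TAG YYYYMMDD: succeed when TAG (assumed to look like
# release-YYYYMMDD.N) carries a date earlier than the given merge date,
# meaning the release was cut before the change landed.
# Hypothetical helper for illustration only.
release_predates() {
    # Pull the 8-digit date out of the tag; prints nothing if TAG does not match.
    tag_date=$(printf '%s\n' "$1" | sed -n 's/^release-\([0-9]\{8\}\)\..*$/\1/p')
    [ -n "$tag_date" ] && [ "$tag_date" -lt "$2" ]
}

# Usage with the values from this thread:
#   release_predates release-20240212.0 20240213 && echo "fix not included"
```

The check only rules a release out. A tag dated on or after the merge date does not prove the change is in it; in a clone that has the release tags, `git merge-base --is-ancestor <commit> <tag>` gives a definite answer.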

Kos-M commented 8 months ago

Great, thanks for your time!