tensorchord / envd

🏕️ Reproducible development environment
https://envd.tensorchord.ai/
Apache License 2.0

bug: bootstrap failed with timeout 5s: cannot connect to buildkitd in version 0.3.36 #1720

Open hakurena opened 1 year ago

hakurena commented 1 year ago

Are you using the envd server?

Describe the bug

bootstrap fails with the same problem as the closed issue #709, now in the latest version (0.3.36). I've tried

docker rm envd_buildkitd

and retried the bootstrap, but I get a report that the container doesn't exist.

To Reproduce

rika@gult:~$ envd --debug bootstrap --dockerhub-mirror https://docker.mirrors.sjtug.sjtu.edu.cn > ~/envd.bootstrap.fail.log
DEBU[2023-08-03T16:43:51+08:00] /home/rika/.config/envd/id_rsa_envd.pub already present
DEBU[2023-08-03T16:43:51+08:00] /home/rika/.config/envd/id_rsa_envd already present
DEBU[2023-08-03T16:43:51+08:00] home manager initialized cache-dir=/home/rika/.cache/envd cache-map="map[oh-my-zsh:true]" cache-status=/home/rika/.cache/envd/cache.status config-file=/home/rika/.config/envd/config.envd context="{default [{default docker-container envd_buildkitd docker }]}" context-file=/home/rika/.config/envd/contexts
DEBU[2023-08-03T16:43:51+08:00] telemetry initialization UID=b3cd4db1-1ff1-4cfd-829b-fdcc99f6b5b8
DEBU[2023-08-03T16:43:51+08:00] sending telemetry
INFO[2023-08-03T16:43:51+08:00] [1/5] Bootstrap SSH Key
DEBU[2023-08-03T16:43:51+08:00] /home/rika/.config/envd/id_rsa_envd.pub already present
DEBU[2023-08-03T16:43:51+08:00] /home/rika/.config/envd/id_rsa_envd already present
INFO[2023-08-03T16:43:51+08:00] [2/5] Bootstrap registry CA keypair
INFO[2023-08-03T16:43:51+08:00] [3/5] Bootstrap registry json config
INFO[2023-08-03T16:43:51+08:00] [4/5] Bootstrap autocomplete
INFO[2023-08-03T16:43:51+08:00] Install bash autocompletion
WARN[2023-08-03T16:43:51+08:00] Warning: failed writing to /usr/share/bash-completion/completions/envd: open /usr/share/bash-completion/completions/envd: permission denied
INFO[2023-08-03T16:43:51+08:00] You may have to restart your shell for autocomplete to get initialized (e.g. run "exec $SHELL")
INFO[2023-08-03T16:43:51+08:00] [5/5] Bootstrap buildkit
DEBU[2023-08-03T16:43:51+08:00] bootstrap the buildkitd container
DEBU[2023-08-03T16:43:51+08:00] commandconn: starting docker with [exec -i envd_buildkitd buildctl dial-stdio]
DEBU[2023-08-03T16:43:51+08:00] starting buildkitd buildkit-config="&{[{docker.io false https://docker.mirrors.sjtug.sjtu.edu.cn}]}" container=envd_buildkitd tag="docker.io/moby/buildkit:v0.10.6"
DEBU[2023-08-03T16:43:51+08:00] commandconn (docker):Error response from daemon: No such container: envd_buildkitd
DEBU[2023-08-03T16:43:52+08:00] container is running, check if it's ready at docker-container://envd_buildkitd... container=envd_buildkitd driver=docker-container image="docker.io/moby/buildkit:v0.10.6" socket=envd_buildkitd
DEBU[2023-08-03T16:43:52+08:00] waiting to connect to buildkitd container=envd_buildkitd driver=docker-container image="docker.io/moby/buildkit:v0.10.6" socket=envd_buildkitd
DEBU[2023-08-03T16:43:53+08:00] commandconn: starting docker with [exec -i envd_buildkitd buildctl dial-stdio]
DEBU[2023-08-03T16:43:53+08:00] commandconn (docker):Error response from daemon: No such container: envd_buildkitd
DEBU[2023-08-03T16:43:53+08:00] failed to connect to buildkitd: failed to list workers: Unavailable: connection error: desc = "error reading server preface: command [docker exec -i envd_buildkitd buildctl dial-stdio] has exited with exit status 1, please make sure the URL is valid, and Docker 18.09 or later is installed on the remote host: stderr=Error response from daemon: No such container: envd_buildkitd\n"
DEBU[2023-08-03T16:43:54+08:00] failed to connect to buildkitd: failed to list workers: Unavailable: connection error: desc = "error reading server preface: command [docker exec -i envd_buildkitd buildctl dial-stdio] has exited with exit status 1, please make sure the URL is valid, and Docker 18.09 or later is installed on the remote host: stderr=Error response from daemon: No such container: envd_buildkitd\n"
DEBU[2023-08-03T16:43:55+08:00] commandconn: starting docker with [exec -i envd_buildkitd buildctl dial-stdio]
DEBU[2023-08-03T16:43:55+08:00] commandconn (docker):Error response from daemon: No such container: envd_buildkitd
DEBU[2023-08-03T16:43:55+08:00] failed to connect to buildkitd: failed to list workers: Unavailable: connection error: desc = "error reading server preface: command [docker exec -i envd_buildkitd buildctl dial-stdio] has exited with exit status 1, please make sure the URL is valid, and Docker 18.09 or later is installed on the remote host: stderr=Error response from daemon: No such container: envd_buildkitd\n"
DEBU[2023-08-03T16:43:56+08:00] failed to connect to buildkitd: failed to list workers: Unavailable: connection error: desc = "error reading server preface: command [docker exec -i envd_buildkitd buildctl dial-stdio] has exited with exit status 1, please make sure the URL is valid, and Docker 18.09 or later is installed on the remote host: stderr=Error response from daemon: No such container: envd_buildkitd\n"
error: failed to create buildkit client: failed to bootstrap the buildkitd: failed to connect to buildkitd docker-container://envd_buildkitd: timeout 5s: cannot connect to buildkitd
(1) attached stack trace
  -- stack trace:
  | github.com/tensorchord/envd/pkg/app.buildkit
  |     /home/runner/work/envd/envd/pkg/app/bootstrap.go:417
  | github.com/tensorchord/envd/pkg/app.bootstrap
  |     /home/runner/work/envd/envd/pkg/app/bootstrap.go:114
  | github.com/urfave/cli/v2.(*Command).Run
  |     /home/runner/go/pkg/mod/github.com/urfave/cli/v2@v2.25.7/command.go:274
  | github.com/urfave/cli/v2.(*Command).Run
  |     /home/runner/go/pkg/mod/github.com/urfave/cli/v2@v2.25.7/command.go:267
  | github.com/urfave/cli/v2.(*App).RunContext
  |     /home/runner/go/pkg/mod/github.com/urfave/cli/v2@v2.25.7/app.go:332
  | github.com/urfave/cli/v2.(*App).Run
  |     /home/runner/go/pkg/mod/github.com/urfave/cli/v2@v2.25.7/app.go:309
  | main.run
  |     /home/runner/work/envd/envd/cmd/envd/main.go:39
  | main.main
  |     /home/runner/work/envd/envd/cmd/envd/main.go:67
  | runtime.main
  |     /opt/hostedtoolcache/go/1.19.10/x64/src/runtime/proc.go:250
Wraps: (2) failed to create buildkit client
Wraps: (3) attached stack trace
  -- stack trace:
  | github.com/tensorchord/envd/pkg/buildkitd.NewClient
  |     /home/runner/work/envd/envd/pkg/buildkitd/buildkitd.go:135
  | github.com/tensorchord/envd/pkg/app.buildkit
  |     /home/runner/work/envd/envd/pkg/app/bootstrap.go:414
  | github.com/tensorchord/envd/pkg/app.bootstrap
  |     /home/runner/work/envd/envd/pkg/app/bootstrap.go:114
  | github.com/urfave/cli/v2.(*Command).Run
  |     /home/runner/go/pkg/mod/github.com/urfave/cli/v2@v2.25.7/command.go:274
  | github.com/urfave/cli/v2.(*Command).Run
  |     /home/runner/go/pkg/mod/github.com/urfave/cli/v2@v2.25.7/command.go:267
  | github.com/urfave/cli/v2.(*App).RunContext
  |     /home/runner/go/pkg/mod/github.com/urfave/cli/v2@v2.25.7/app.go:332
  | github.com/urfave/cli/v2.(*App).Run
  |     /home/runner/go/pkg/mod/github.com/urfave/cli/v2@v2.25.7/app.go:309
  | main.run
  |     /home/runner/work/envd/envd/cmd/envd/main.go:39
  | main.main
  |     /home/runner/work/envd/envd/cmd/envd/main.go:67
  | runtime.main
  |     /opt/hostedtoolcache/go/1.19.10/x64/src/runtime/proc.go:250
Wraps: (4) failed to bootstrap the buildkitd
Wraps: (5) attached stack trace
  -- stack trace:
  | github.com/tensorchord/envd/pkg/buildkitd.(*generalClient).maybeStart
  |     /home/runner/work/envd/envd/pkg/buildkitd/buildkitd.go:182
  | [...repeated from below...]
Wraps: (6) failed to connect to buildkitd docker-container://envd_buildkitd
Wraps: (7) attached stack trace
  -- stack trace:
  | github.com/tensorchord/envd/pkg/buildkitd.generalClient.waitUntilConnected
  |     /home/runner/work/envd/envd/pkg/buildkitd/buildkitd.go:213
  | github.com/tensorchord/envd/pkg/buildkitd.(*generalClient).maybeStart
  |     /home/runner/work/envd/envd/pkg/buildkitd/buildkitd.go:181
  | github.com/tensorchord/envd/pkg/buildkitd.(*generalClient).Bootstrap
  |     /home/runner/work/envd/envd/pkg/buildkitd/buildkitd.go:142
  | github.com/tensorchord/envd/pkg/buildkitd.NewClient
  |     /home/runner/work/envd/envd/pkg/buildkitd/buildkitd.go:134
  | github.com/tensorchord/envd/pkg/app.buildkit
  |     /home/runner/work/envd/envd/pkg/app/bootstrap.go:414
  | github.com/tensorchord/envd/pkg/app.bootstrap
  |     /home/runner/work/envd/envd/pkg/app/bootstrap.go:114
  | github.com/urfave/cli/v2.(*Command).Run
  |     /home/runner/go/pkg/mod/github.com/urfave/cli/v2@v2.25.7/command.go:274
  | github.com/urfave/cli/v2.(*Command).Run
  |     /home/runner/go/pkg/mod/github.com/urfave/cli/v2@v2.25.7/command.go:267
  | github.com/urfave/cli/v2.(*App).RunContext
  |     /home/runner/go/pkg/mod/github.com/urfave/cli/v2@v2.25.7/app.go:332
  | github.com/urfave/cli/v2.(*App).Run
  |     /home/runner/go/pkg/mod/github.com/urfave/cli/v2@v2.25.7/app.go:309
  | main.run
  |     /home/runner/work/envd/envd/cmd/envd/main.go:39
  | main.main
  |     /home/runner/work/envd/envd/cmd/envd/main.go:67
  | runtime.main
  |     /opt/hostedtoolcache/go/1.19.10/x64/src/runtime/proc.go:250
  | runtime.goexit
  |     /opt/hostedtoolcache/go/1.19.10/x64/src/runtime/asm_amd64.s:1594
Wraps: (8) timeout 5s: cannot connect to buildkitd
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *withstack.withStack (8) *errutil.leafError
error: timeout 5s: cannot connect to buildkitd

Expected behavior

No response

The docker info output

rika@gult:~$ docker info
Client: Docker Engine - Community
 Version:    24.0.2
 Context:    desktop-linux
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.10.5
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.13.0
    Path:     /usr/lib/docker/cli-plugins/docker-compose
  dev: Docker Dev Environments (Docker Inc.)
    Version:  v0.0.5
    Path:     /usr/lib/docker/cli-plugins/docker-dev
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.16
    Path:     /usr/lib/docker/cli-plugins/docker-extension
  sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc.)
    Version:  0.6.0
    Path:     /usr/lib/docker/cli-plugins/docker-sbom
  scan: Docker Scan (Docker Inc.)
    Version:  v0.22.0
    Path:     /usr/lib/docker/cli-plugins/docker-scan

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 1
 Server Version: 20.10.21
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc io.containerd.runc.v2 io.containerd.runtime.v1.linux
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 770bd0108c32f3fb5c73ae1264f7e503fe7b2661
 runc version: v1.1.4-0-g5fd4c4d
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
  cgroupns
 Kernel Version: 5.15.49-linuxkit
 Operating System: Docker Desktop
 OSType: linux
 Architecture: x86_64
 CPUs: 24
 Total Memory: 7.67GiB
 Name: docker-desktop
 ID: HGNO:VWSY:NULE:KGPP:JFRZ:CMPX:NV7M:AY6X:2RT4:AWKG:6RH6:U6DM
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http.docker.internal:3128
 HTTPS Proxy: http.docker.internal:3128
 No Proxy: hubproxy.docker.internal
 Experimental: false
 Insecure Registries:
  hubproxy.docker.internal:5000
  127.0.0.0/8
 Live Restore Enabled: false

The envd version output

rika@gult:~$ envd version --detail
envd: v0.3.36
  BuildDate: 2023-07-18T15:36:15Z
  GitCommit: 854798c5b368505ea81eb040a5584f39eaee1d68
  GitTreeState: clean
  GitTag: v0.3.36
  GoVersion: go1.19.10
  Compiler: gc
  Platform: linux/amd64
  OSType: linux
  OSVersion: 22.04
  KernelVersion: 5.19.0-50-generic
  DockerHostVersion: 24.0.2
  ContainerRuntimes: [io.containerd.runc.v2,runc]
  DefaultRuntime: runc

Additional context

No response

gaocegege commented 1 year ago

/assign @kemingy

kemingy commented 1 year ago

Is it related to the SJTU mirror?

hakurena commented 1 year ago

Is it related to the SJTU mirror?

Nope, I ran envd bootstrap without any extra arguments on the first attempt and got the same error.

kemingy commented 1 year ago

I'm not able to reproduce the error. Can you check the logs of the buildkitd container?

HTTP Proxy: http.docker.internal:3128
HTTPS Proxy: http.docker.internal:3128

Not sure if it's related to the HTTP proxy config.

hakurena commented 1 year ago

I'm not able to reproduce the error. Can you check the logs of the buildkitd container?

HTTP Proxy: http.docker.internal:3128
HTTPS Proxy: http.docker.internal:3128

Not sure if it's related to the HTTP proxy config.

No container can be found in my Docker Desktop. I'm afraid envd failed to create the envd_buildkitd container. Below is the output of docker ps, according to which the only container on this PC is the one envd created 7 months ago, when I first tried to install it. By the way, what can I do to check whether it's related to the HTTP proxy config?

CONTAINER ID   IMAGE                  COMMAND    CREATED        STATUS        PORTS                                                  NAMES
a116f6b27925   envd-quick-start:dev   "horust"   7 months ago   Up 22 hours   127.0.0.1:36629->2222/tcp, 127.0.0.1:41859->8888/tcp   envd-quick-start
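
One possible way to approach the proxy question above, as a minimal POSIX shell sketch: print the proxy settings the daemon reports, then try pulling the buildkit image named in the debug log. The helper names `daemon_proxy_lines` and `can_pull_image` and the `DOCKER` override variable are hypothetical and not part of envd or Docker; only `docker info` and `docker pull` are real commands. A successful pull would only suggest that the daemon can reach the registry through the proxy; it does not prove the proxy is unrelated to the bootstrap failure.

```shell
#!/bin/sh
# Hypothetical helpers, not part of envd. DOCKER can be overridden
# (for example to point at a wrapper); it defaults to the docker CLI.
DOCKER="${DOCKER:-docker}"

# Image tag taken from the debug log in this thread.
BUILDKIT_IMAGE="docker.io/moby/buildkit:v0.10.6"

# Print only the "HTTP Proxy", "HTTPS Proxy" and "No Proxy" lines of
# `docker info`, with leading indentation removed.
daemon_proxy_lines() {
    "$DOCKER" info 2>/dev/null \
        | grep -iE '^[[:space:]]*(HTTP|HTTPS|No) Proxy:' \
        | sed 's/^[[:space:]]*//'
}

# Succeed if the daemon can pull the given image, fail otherwise.
can_pull_image() {
    "$DOCKER" pull "$1" >/dev/null 2>&1
}
```

Running `daemon_proxy_lines` and then `can_pull_image "$BUILDKIT_IMAGE" && echo "pull ok"` gives a quick first signal. If the pull fails, the proxy or the registry mirror becomes a more likely suspect.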

kemingy commented 1 year ago

The current buildkitd container is started with something like:

docker run --rm --name envd_buildkitd --privileged -v $HOME/.config/envd:/etc/registry docker.io/moby/buildkit:v0.10.6 --config /etc/registry/buildkitd.toml

The $HOME/.config/envd/buildkitd.toml is generated when you run envd bootstrap.

Can you try to run this command directly?

hakurena commented 1 year ago

rika@gult:~$ sudo docker run --rm --name envd_buildkitd --privileged -v $HOME/.config/envd:/etc/registry docker.io/moby/buildkit:v0.10.6 --config /etc/registry/buildkitd.toml

docker: Error response from daemon: Conflict. The container name "/envd_buildkitd" is already in use by container "3323cd6af59ac1e9a1df40d98e9ac0304b4ea67108fe115aee337077ef81812f". You have to remove (or rename) that container to be able to reuse that name.

kemingy commented 1 year ago

rika@gult:~$ sudo docker run --rm --name envd_buildkitd --privileged -v $HOME/.config/envd:/etc/registry docker.io/moby/buildkit:v0.10.6 --config /etc/registry/buildkitd.toml

docker: Error response from daemon: Conflict. The container name "/envd_buildkitd" is already in use by container "3323cd6af59ac1e9a1df40d98e9ac0304b4ea67108fe115aee337077ef81812f". You have to remove (or rename) that container to be able to reuse that name.

Run docker rm envd_buildkitd to remove the old one. Use docker ps -a to check whether any legacy buildkitd containers remain.
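
The two results in this thread (a plain `docker rm envd_buildkitd` finds no container, while `sudo docker run` hits a name conflict) would be consistent with the two commands talking to different Docker daemons, for example the `desktop-linux` context shown in the docker info output versus the engine that root's default context reaches. That is an assumption and is not confirmed anywhere in the thread. A minimal shell sketch for checking every context known to the current user follows; the function names and the `DOCKER` override variable are hypothetical, while `docker context ls --format`, the global `--context` flag, and `docker ps -a --format` are existing CLI features.

```shell
#!/bin/sh
# Hypothetical helpers, not part of envd. DOCKER can be overridden;
# it defaults to the docker CLI.
DOCKER="${DOCKER:-docker}"

# Print the name of every Docker context, one per line.
list_contexts() {
    "$DOCKER" context ls --format '{{.Name}}'
}

# Print each context whose daemon has a container (running or stopped)
# named exactly $1. Contexts that cannot be reached are skipped silently.
contexts_with_container() {
    name="$1"
    for ctx in $(list_contexts); do
        if "$DOCKER" --context "$ctx" ps -a --format '{{.Names}}' 2>/dev/null \
            | grep -qxF "$name"; then
            echo "$ctx"
        fi
    done
}
```

Calling `contexts_with_container envd_buildkitd` prints the contexts that still hold the container, which can then be removed with `docker --context NAME rm envd_buildkitd`, where NAME is one of the printed contexts. If nothing is printed but `sudo docker ps -a` still lists the container, it lives in a daemon that only root's configuration reaches, and the removal needs the same `sudo` prefix.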