Closed mark-5 closed 1 year ago
#3431 appears to have reintroduced 1146, and buildkit is failing to release SSH agent connections until after sshforward.Copy has finished.
sshforward.Copy
#3506 did not fix this, because as noted, sshprovider.ForwardAgent uses an internal wrapper that doesn't implement CloseWrite.
sshprovider.ForwardAgent
CloseWrite
Environment:
macOS Ventura 13.3.1 (in moby/buildkit:local-rootless build image)
/ $ buildkitd --version buildkitd github.com/moby/buildkit v0.11.0-rc3-470-g9d357745 9d357745feb0dfb166758b8e6632a7a875919bc3 / $ buildctl --version buildctl github.com/moby/buildkit v0.11.0-rc3-470-g9d357745 9d357745feb0dfb166758b8e6632a7a875919bc3 / $ ssh -V OpenSSH_9.1p1, OpenSSL 3.0.8 7 Feb 2023 / $ uname -a Linux b6ba5493f1fa 5.15.49-linuxkit #1 SMP PREEMPT Tue Sep 13 07:51:32 UTC 2022 aarch64 Linux / $ cat /etc/os-release NAME="Alpine Linux" ID=alpine VERSION_ID=3.17.3 PRETTY_NAME="Alpine Linux v3.17" HOME_URL="https://alpinelinux.org/" BUG_REPORT_URL="https://gitlab.alpinelinux.org/alpine/aports/-/issues"
To reproduce the issue you can
buildctl build
Here's buildctl build:
~ $ cat Dockerfile FROM alpine RUN --mount=type=ssh apk update \ && apk add openssh-client-default \ && mkdir -p -m 0600 ~/.ssh \ && ssh-keyscan github.com >> ~/.ssh/known_hosts \ && for i in $(seq 1 3); do \ ssh -T git@github.com; \ done; \ sleep 10; ~ $ buildctl build --progress plain --frontend dockerfile.v0 --local context=. - -local dockerfile=. --output type=tar,dest=/dev/null --ssh default --no-cache #1 [internal] load build definition from Dockerfile #1 transferring dockerfile: 292B done #1 DONE 0.0s #2 [internal] load metadata for docker.io/library/alpine:latest #2 DONE 0.9s #3 [internal] load .dockerignore #3 transferring context: 2B done #3 DONE 0.0s #4 [stage-0 1/2] FROM docker.io/library/alpine@sha256:124c7d2707904eea7431fffe91522a01e5a861a624ee31d03372cc1d138a3126 #4 resolve docker.io/library/alpine@sha256:124c7d2707904eea7431fffe91522a01e5a861a624ee31d03372cc1d138a3126 done #4 sha256:c41833b44d910632b415cd89a9cdaa4d62c9725dc56c99a7ddadafd6719960f9 0B / 3.26MB 0.2s #4 sha256:c41833b44d910632b415cd89a9cdaa4d62c9725dc56c99a7ddadafd6719960f9 3.26MB / 3.26MB 0.4s done #4 extracting sha256:c41833b44d910632b415cd89a9cdaa4d62c9725dc56c99a7ddadafd6719960f9 0.1s done #4 DONE 0.5s #5 [stage-0 2/2] RUN --mount=type=ssh apk update && apk add openssh-client-default && mkdir -p -m 0600 ~/.ssh && ssh-keyscan github.com >> ~/.ssh/known_hosts && for i in $(seq 1 3); do ssh -T git@github.com; done; sleep 10; #0 0.083 fetch https://dl-cdn.alpinelinux.org/alpine/v3.17/main/aarch64/APKINDEX.tar.gz #5 0.535 fetch https://dl-cdn.alpinelinux.org/alpine/v3.17/community/aarch64/APKINDEX.tar.gz #5 0.958 v3.17.3-147-gc4fb7618739 [https://dl-cdn.alpinelinux.org/alpine/v3.17/main] #5 0.958 v3.17.3-150-g717cba89b75 [https://dl-cdn.alpinelinux.org/alpine/v3.17/community] #5 0.958 OK: 17691 distinct packages available #5 1.113 (1/6) Installing openssh-keygen (9.1_p1-r2) #5 1.360 (2/6) Installing ncurses-terminfo-base (6.3_p20221119-r0) #5 1.402 (3/6) Installing ncurses-libs (6.3_p20221119-r0) #5 1.443 (4/6) Installing libedit (20221030.3.1-r0) #5 1.471 (5/6) Installing openssh-client-common (9.1_p1-r2) #5 1.556 (6/6) Installing openssh-client-default (9.1_p1-r2) #5 1.621 Executing busybox-1.35.0-r29.trigger #5 1.630 OK: 13 MiB in 21 packages #5 1.786 # github.com:22 SSH-2.0-babeld-aebb503c #5 1.830 # github.com:22 SSH-2.0-babeld-aebb503c #5 1.897 # github.com:22 SSH-2.0-babeld-aebb503c #5 1.962 # github.com:22 SSH-2.0-babeld-aebb503c #5 1.986 # github.com:22 SSH-2.0-babeld-aebb503c #5 2.258 git@github.com: Permission denied (publickey). #5 2.556 git@github.com: Permission denied (publickey). #5 2.801 git@github.com: Permission denied (publickey). #5 DONE 12.9s #6 exporting to client tarball #6 sending tarball 0.1s done #6 DONE 0.1s
And here's the ssh-agent debug output, showing the file descriptors leak:
~ $ ssh-agent -s -d -a /tmp/authsock SSH_AUTH_SOCK=/tmp/authsock; export SSH_AUTH_SOCK; echo Agent pid 111; debug1: new_socket: type = SOCKET debug2: fd 3 setting O_NONBLOCK debug1: new_socket: type = CONNECTION debug2: fd 4 setting O_NONBLOCK debug1: process_message: socket 1 (fd=4) type 11 debug2: process_request_identities: entering debug2: process_request_identities: replying with 0 allowed of 0 available keys debug1: new_socket: type = CONNECTION debug2: fd 5 setting O_NONBLOCK debug1: process_message: socket 2 (fd=5) type 11 debug2: process_request_identities: entering debug2: process_request_identities: replying with 0 allowed of 0 available keys debug1: new_socket: type = CONNECTION debug2: fd 6 setting O_NONBLOCK debug1: process_message: socket 3 (fd=6) type 11 debug2: process_request_identities: entering debug2: process_request_identities: replying with 0 allowed of 0 available keys
cc @sipsma
Thanks for reporting @mark-5, I will take a look this weekend and include a test with the fix so it doesn't regress again.
PR here: https://github.com/moby/buildkit/pull/3848
#3431 appears to have reintroduced 1146, and buildkit is failing to release SSH agent connections until after
sshforward.Copy
has finished.#3506 did not fix this, because as noted,
sshprovider.ForwardAgent
uses an internal wrapper that doesn't implementCloseWrite
.Environment:
macOS Ventura 13.3.1 (in moby/buildkit:local-rootless build image)
To reproduce the issue you can
buildctl build
with the Dockerfile from 1146Here's
buildctl build
:Here's
buildctl build
:And here's the ssh-agent debug output, showing the file descriptors leak: