vmware-archive / buildkit-cli-for-kubectl

BuildKit CLI for kubectl is a tool for building container images with your Kubernetes cluster

kubectl build gets stuck indefinitely #109

Open nicks opened 2 years ago

nicks commented 2 years ago

What steps did you take and what happened

Sometimes kubectl build gets stuck and never recovers.

As far as I can tell, what's happening is that it's waiting indefinitely for the buildkit server to deploy.

What did you expect to happen

At a bare minimum, the build should time out and fail if the buildkit server takes too long to start.
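Until the CLI enforces a deadline itself, a caller can impose one from the outside. The following is a minimal sketch, not part of kubectl build: it assumes GNU coreutils `timeout` is available on the CI image, and `build_with_deadline` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of buildkit-cli-for-kubectl).
# Runs any command under a wall-clock deadline using GNU coreutils `timeout`.
# Usage: build_with_deadline <duration> <command> [args...]
build_with_deadline() {
  local deadline="$1"
  shift
  local status=0
  # `timeout` exits with 124 when it had to kill the command.
  timeout "$deadline" "$@" || status=$?
  if [ "$status" -eq 124 ]; then
    echo "command exceeded deadline of ${deadline}: $*" >&2
  fi
  return "$status"
}

# Example call (commented out so the sketch has no side effects):
# build_with_deadline 5m kubectl build -f ./Dockerfile -t "$EXPECTED_REF" .
```

With this wrapper a stuck build fails with exit status 124 after the chosen duration instead of running until the CI system kills the job. It does not address why the builder pod never becomes ready.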

Environment Details:

$ kubectl buildkit version
Client:  v0.1.4
Builder: buildkitd github.com/moby/buildkit v0.9.2 a14b4e097ae1dc7514c5febd6d75f742a166ea75

Builder Logs

https://app.circleci.com/pipelines/github/tilt-dev/tilt-example-builders/13/workflows/a73afc07-73e5-4f38-9419-f944f2844b54/jobs/14

Here are the relevant bits:

example-pyth… │ Running custom build cmd "kubectl build --context kind-kind -f ./Dockerfile --registry-secret docker-registry --build-arg flask_env=development -t $EXPECTED_REF ."
example-pyth… │ #1 [internal] booting buildkit
example-pyth… │ #1 waiting for 1 pods to be ready for buildkit
example-pyth… │ #1 0.320 Normal     buildkit-7c58458f5f     SuccessfulCreate    Created pod: buildkit-7c58458f5f-zlf6b
example-pyth… │ #1 0.411 Normal     buildkit-7c58458f5f-zlf6b   Scheduled   Successfully assigned default/buildkit-7c58458f5f-zlf6b to kind-control-plane
example-pyth… │ #1 0.411 Warning    buildkit-7c58458f5f-zlf6b   FailedMount     MountVolume.SetUp failed for volume "docker-sock" : hostPath type check failed: /var/run/docker.sock is not a socket file
example-pyth… │ #1 0.411 Warning    initial attempt to deploy configured for the docker runtime failed, retrying with containerd
example-pyth… │ #1 123.5 Warning    buildkit-7c58458f5f-zlf6b   FailedMount     Unable to attach or mount volumes: unmounted volumes=[docker-sock], unattached volumes=[buildkitd-config docker-sock kube-api-access-wd96c]: timed out waiting for the condition
example-pyth… │ #1 397.9 Warning    buildkit-7c58458f5f-zlf6b   FailedMount     Unable to attach or mount volumes: unmounted volumes=[docker-sock], unattached volumes=[kube-api-access-wd96c buildkitd-config docker-sock]: timed out waiting for the condition
example-pyth… │ #1 534.7 Warning    buildkit-7c58458f5f-zlf6b   FailedMount     Unable to attach or mount volumes: unmounted volumes=[docker-sock], unattached volumes=[docker-sock kube-api-access-wd96c buildkitd-config]: timed out waiting for the condition
Error: context canceled

Too long with no output (exceeded 10m0s): context deadline exceeded

Note that the build never exits on its own; CircleCI eventually kills it after 10 minutes without output.

Dockerfile

https://github.com/tilt-dev/tilt-example-builders/blob/main/kubectl_build/Dockerfile


nicks commented 2 years ago

Anecdotally: if I change my setup script to run

kubectl buildkit create --runtime=containerd

rather than relying on the auto-setup and runtime=auto detection, it seems to work consistently. (Though the underlying bug looks like a race condition, so I might just be getting lucky.)
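For anyone wanting to script this workaround, here is a rough sketch. It uses only the create command above and `kubectl build`; the function name `build_with_explicit_runtime` is hypothetical, and how `create` behaves when a builder already exists is not covered here and would need checking.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the builder with an explicit runtime,
# then run the build, instead of relying on runtime=auto detection.
# Usage: build_with_explicit_runtime <runtime> <kubectl build args...>
build_with_explicit_runtime() {
  local runtime="$1"
  shift
  # Command taken from the comment above; stop if builder creation fails.
  kubectl buildkit create --runtime="$runtime" || return $?
  # Remaining arguments are passed to `kubectl build` unchanged.
  kubectl build "$@"
}

# Example call (commented out so the sketch has no side effects):
# build_with_explicit_runtime containerd -f ./Dockerfile -t "$EXPECTED_REF" .
```

Creating the builder as a separate step also means a failure to start buildkit surfaces before the build begins.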

dhiltgen commented 2 years ago

PR #106 revamps some of the relevant code paths, which hopefully will improve or fix this. It may take us a little while to get that merged, but once we do, we'll most likely cut a new minor release (v0.2.0) which we can then re-test in your scenario to see if it fully solves the problem, improves it, or makes no difference.

nicks commented 2 years ago

Thanks @dhiltgen ! We'll test out v0.2.0 once it's out to see if it helps.

tmc commented 2 years ago

@nicks curious if you had a chance to eval that release?

nicks commented 2 years ago

@tmc I don't think v0.2 is out yet? But the kubectl buildkit create --runtime=containerd workaround has been working well for us; we haven't seen a failure since we added it.