moby / buildkit

concurrent, cache-efficient, and Dockerfile-agnostic builder toolkit
https://github.com/moby/moby/issues/34227
Apache License 2.0

error: failed to solve: failed to read dockerfile: failed to mount lhzivmrs3pheot21kx3b24aix: snapshot lhzivmrs3pheot21kx3b24aix does not exist: not found #2288

Open Qoooooooooooo opened 3 years ago

Qoooooooooooo commented 3 years ago

service start:

Jul 30 18:06:13 hbase systemd[1]: Starting buildkitd.service...
Jul 30 18:06:13 hbase buildkitd[11991]: time="2021-07-30T18:06:13+08:00" level=warning msg="using host network as the default"
Jul 30 18:06:13 hbase buildkitd[11991]: time="2021-07-30T18:06:13+08:00" level=info msg="found worker \"2shqc0nug0fd56wyrj8ylc9z7\", labels=map[foo:bar org.mobyproject.buildkit.worker.containerd.namespace:k8s.io org.mobyproject.bui...
Jul 30 18:06:13 hbase buildkitd[11991]: time="2021-07-30T18:06:13+08:00" level=warning msg="platform linux/arm64 cannot pass the validation, kernel support for miscellaneous binary may have not enabled."
Jul 30 18:06:13 hbase buildkitd[11991]: time="2021-07-30T18:06:13+08:00" level=info msg="found 1 workers, default=\"2shqc0nug0fd56wyrj8ylc9z7\""
Jul 30 18:06:13 hbase buildkitd[11991]: time="2021-07-30T18:06:13+08:00" level=warning msg="currently, only the default worker can be used."
Jul 30 18:06:13 hbase buildkitd[11991]: time="2021-07-30T18:06:13+08:00" level=info msg="running server on /run/buildkit/buildkitd.sock"
Jul 30 18:06:13 hbase systemd[1]: Started buildkitd.service.
Hint: Some lines were ellipsized, use -l to show in full.

buildctl build \
  --frontend=dockerfile.v0 \
  --local context=. \
  --local dockerfile=. \
  --output type=image,name=docker.io/username/image:tag

[+] Building 0.0s (1/2)
ERROR [internal] load build definition from Dockerfile 0.0s

[internal] load build definition from Dockerfile:

error: failed to solve: failed to read dockerfile: failed to mount lhzivmrs3pheot21kx3b24aix: snapshot lhzivmrs3pheot21kx3b24aix does not exist: not found

buildkitd is started via systemd with:

ExecStart=/usr/local/bin/buildkitd --oci-worker=false --containerd-worker=true
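
For reference, a minimal sketch of installing such a unit (only the ExecStart flags above are from my setup; the rest of the unit is assumed boilerplate):

# minimal sketch; only the ExecStart flags are from my real unit
cat > /etc/systemd/system/buildkitd.service <<'EOF'
[Unit]
Description=buildkitd
After=containerd.service

[Service]
ExecStart=/usr/local/bin/buildkitd --oci-worker=false --containerd-worker=true

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl restart buildkitd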

jamescook commented 3 years ago

Seeing the same/similar error occasionally on Buildkit 0.9.0

#1 [internal] load build definition from Dockerfile
#1 sha256:22f92f0378c90a3920b1d29bf5901009012d76f28a9ddc6cb7e61669c2d15904
#1 DONE 0.0s

#2 [internal] load .dockerignore
#2 sha256:4e6d76c4707252898ade1a212bddb23426d826774aa9fba7c2f18b878a6d2ace
#2 DONE 0.0s

#2 [internal] load .dockerignore
#2 sha256:4e6d76c4707252898ade1a212bddb23426d826774aa9fba7c2f18b878a6d2ace
#2 ...

#1 [internal] load build definition from Dockerfile
#1 sha256:22f92f0378c90a3920b1d29bf5901009012d76f28a9ddc6cb7e61669c2d15904
#1 transferring dockerfile: 2.73kB done
#1 DONE 0.7s

#2 [internal] load .dockerignore
#2 sha256:4e6d76c4707252898ade1a212bddb23426d826774aa9fba7c2f18b878a6d2ace
#2 transferring context: 2B done
#2 DONE 1.0s

#3 [internal] load metadata for docker.io/library/node:14
#3 sha256:14632244d97b4bf292e1cd4fe957f41b9fab16d8b3cda342bc4c906db56791f6
#3 DONE 0.3s

#5 [base 1/7] FROM docker.io/library/node:14@sha256:cd98882c1093f758d09cf6821dc8f96b241073b38e8ed294ca1f9e484743858f
#5 sha256:dbefa5453b815ffdd5af215f1b20105f038add85eac8265964301efc909575f8
#5 resolve docker.io/library/node:14@sha256:cd98882c1093f758d09cf6821dc8f96b241073b38e8ed294ca1f9e484743858f
#5 ...

#7 [internal] load build context
#7 sha256:682626fd47d224a5e24945dcf398b0f5d0be6dc5ec397846d9cb4bf78c436a9a
#7 DONE 0.0s

#4 importing cache manifest from zzzz/cache:production-zzzz-yyyy-xxxx-master
#4 sha256:d76773a140cae22f8d7e7de82cfa97cde906e7e56f3a126af7bccaeeeea7d50c
#4 DONE 0.2s

#5 [base 1/7] FROM docker.io/library/node:14@sha256:cd98882c1093f758d09cf6821dc8f96b241073b38e8ed294ca1f9e484743858f
#5 sha256:dbefa5453b815ffdd5af215f1b20105f038add85eac8265964301efc909575f8
#5 resolve docker.io/library/node:14@sha256:cd98882c1093f758d09cf6821dc8f96b241073b38e8ed294ca1f9e484743858f 1.6s done
#5 DONE 1.7s

#7 [internal] load build context
#7 sha256:682626fd47d224a5e24945dcf398b0f5d0be6dc5ec397846d9cb4bf78c436a9a
#7 transferring context: 23B
#7 transferring context: 9.93MB 3.6s done
#7 DONE 10.2s
error: failed to solve: rpc error: code = Unknown desc = failed to mount m6q7l7jeufrz1nokxmqi29148: snapshot m6q7l7jeufrz1nokxmqi29148 does not exist: not found
tonistiigi commented 3 years ago

@sipsma Any ideas?

sipsma commented 3 years ago

@sundong1982 @jamescook can you share any code that reproduces the error? Or at least more details on when it happens?

The mount error is coming from here. The Dockerfile error is happening here.

I looked into whether the change in solver/llbsolver/bridge.go to no longer call Finalize could be related, as that path gets hit when the Dockerfile is being read. The call to Finalize itself was a no-op due to a bug in the previous code, so I don't believe it could be a direct cause of these new errors. But I'm wondering whether it had a side effect: waiting for the cache record's mutex to be locked may have inadvertently prevented a different race condition that has now been revealed. This is just speculation, though; I can't say for sure until it's reproducible.

jamescook commented 3 years ago

@sipsma I haven't noticed this particular error in the past week, but we get either BuildKit errors like this issue and https://github.com/moby/buildkit/issues/2088, or crashes (https://github.com/moby/buildkit/issues/2296 and https://github.com/moby/buildkit/issues/2303), almost daily. We're building many images in parallel as part of our CI pipeline: 4 Docker images per architecture (amd64 and arm64) via buildctl, spread across two buildkitd workers (one per architecture). Problems occur on either architecture.

My general observation is that the errors typically happen when there are multiple branches being built at once in the CI pipeline. I'm not sure what code I can/should share - can you clarify what you're looking for? Thanks for reaching out.
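
To give a concrete picture of the pattern (a hypothetical sketch; the socket paths, image names, and registry are placeholders, not our real pipeline):

# two buildkitd daemons, one per architecture; 4 images built against each in parallel
for img in app-a app-b app-c app-d; do
  buildctl --addr unix:///run/buildkit-amd64/buildkitd.sock build \
    --frontend=dockerfile.v0 --local context=. --local dockerfile=. \
    --output type=image,name=registry.example.com/$img:amd64,push=true &
  buildctl --addr unix:///run/buildkit-arm64/buildkitd.sock build \
    --frontend=dockerfile.v0 --local context=. --local dockerfile=. \
    --output type=image,name=registry.example.com/$img:arm64,push=true &
done
wait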

markmandel commented 2 years ago

Just here to say that I'm seeing a very similar issue. We are also building amd64, arm64, and Windows images concurrently, and get the following errors on Docker 20.x.x (this doesn't seem to happen on Docker 19.x.x):

DOCKER_CLI_EXPERIMENTAL=enabled docker buildx build \
  --platform windows/amd64 \
  -f /home/mark/workspace/agones/cmd/sdk-server/Dockerfile.windows \
  --tag=us-docker.pkg.dev/agones-mark-dev/images/agones-sdk:1.22.0-3f00aa6-windows_amd64-ltsc2019 \
  --build-arg WINDOWS_VERSION=ltsc2019 \
  /home/mark/workspace/agones/cmd/sdk-server/
WARNING: No output specified for docker-container driver. Build result will only remain in the build cache. To push result image into registry use --push or to load image into docker use --load
[+] Building 10.2s (2/3)
 => [internal] booting buildkit                                                                                                                9.7s
 => => pulling image moby/buildkit:buildx-stable-1                                                                                             8.8s
 => => creating container buildx_buildkit_windows-builder0                                                                                     0.9s
 => ERROR [internal] load build definition from Dockerfile.windows                                                                             0.0s
------
 > [internal] load build definition from Dockerfile.windows:
------
error: failed to solve: failed to read dockerfile: snapshot  does not exist: not found
make: *** [Makefile:442: build-agones-sdk-image-windows-ltsc2019] Error 1

If you want to attempt to replicate, grab a copy of https://github.com/googleforgames/agones, go to the build folder, and run make -j 4 build-images; it should fail for you. (It passes once the images are cached, though.)
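
Concretely, that's (directory layout as described above; nothing else assumed):

git clone https://github.com/googleforgames/agones.git
cd agones/build
make -j 4 build-images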

Happy to share more info if useful.

rcmelendez commented 2 years ago

Sharing my solution in case someone else experiences the same error.

I noticed that my buildkit image was a bit outdated, so I just pulled the latest version (docker pull moby/buildkit:buildx-stable-1). Then I reran my build process and it worked!
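
In other words (the pull and rerun alone worked for me; the builder recreation at the end is my assumption, in case an existing docker-container builder is still running the old image):

docker pull moby/buildkit:buildx-stable-1
# optional, an assumption: recreate the buildx builder so it picks up the new image
# ("mybuilder" is a placeholder name)
docker buildx rm mybuilder
docker buildx create --name mybuilder --use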