moby / buildkit

concurrent, cache-efficient, and Dockerfile-agnostic builder toolkit
https://github.com/moby/moby/issues/34227
Apache License 2.0

[BUG] Docker Compose V2 build stuck forever on Windows 10 #3571

Open sergey-morenets opened 1 year ago

sergey-morenets commented 1 year ago

Hi

Original issue is here: https://github.com/docker/compose/issues/10229

We have a very simple Docker Compose configuration that built successfully until the last upgrade, to Docker Desktop 4.16.3. Now the Docker Compose build gets stuck forever, so we have to restart the computer and sometimes also remove an empty meta.json file before Docker Compose starts working properly (a cleanup sketch follows after the logs below). The last lines of the docker-compose build output are:

exporting to docker image format
sending tarball

Here are the logs from \docker-desktop-data\data\docker\containers\:

{"log":"time=\"2023-02-02T09:40:22Z\" level=info msg=\"auto snapshotter: using overlayfs\"\n","stream":"stderr","time":"2023-02-02T09:40:22.9837146Z"}
{"log":"time=\"2023-02-02T09:40:22Z\" level=warning msg=\"using host network as the default\"\n","stream":"stderr","time":"2023-02-02T09:40:22.9840142Z"}
{"log":"time=\"2023-02-02T09:40:23Z\" level=info msg=\"found worker \\\"6j5qk61a44tlxwbtilho9kq7k\\\", labels=map[org.mobyproject.buildkit.worker.executor:oci org.mobyproject.buildkit.worker.hostname:1a1de79a9a18 org.mobyproject.buildkit.worker.network:host org.mobyproject.buildkit.worker.oci.process-mode:sandbox org.mobyproject.buildkit.worker.selinux.enabled:false org.mobyproject.buildkit.worker.snapshotter:overlayfs], platforms=[linux/amd64 linux/amd64/v2 linux/amd64/v3 linux/arm64 linux/riscv64 linux/ppc64le linux/s390x linux/386 linux/mips64le linux/mips64 linux/arm/v7 linux/arm/v6]\"\n","stream":"stderr","time":"2023-02-02T09:40:23.0053227Z"}
{"log":"time=\"2023-02-02T09:40:23Z\" level=warning msg=\"skipping containerd worker, as \\\"/run/containerd/containerd.sock\\\" does not exist\"\n","stream":"stderr","time":"2023-02-02T09:40:23.0230857Z"}
{"log":"time=\"2023-02-02T09:40:23Z\" level=info msg=\"found 1 workers, default=\\\"6j5qk61a44tlxwbtilho9kq7k\\\"\"\n","stream":"stderr","time":"2023-02-02T09:40:23.023096Z"}
{"log":"time=\"2023-02-02T09:40:23Z\" level=warning msg=\"currently, only the default worker can be used.\"\n","stream":"stderr","time":"2023-02-02T09:40:23.0230991Z"}
{"log":"time=\"2023-02-02T09:40:23Z\" level=info msg=\"running server on /run/buildkit/buildkitd.sock\"\n","stream":"stderr","time":"2023-02-02T09:40:23.0272842Z"}
{"log":"time=\"2023-02-02T09:40:58Z\" level=warning msg=\"healthcheck failed\" actualDuration=30.0016861s spanID=75bd469047542c05 timeout=30s traceID=52e529792d9014728d75ee03a49545f2\n","stream":"stderr","time":"2023-02-02T09:40:58.2677984Z"}
{"log":"time=\"2023-02-02T09:41:43Z\" level=error msg=\"healthcheck failed fatally\" spanID=75bd469047542c05 traceID=52e529792d9014728d75ee03a49545f2\n","stream":"stderr","time":"2023-02-02T09:41:43.2710605Z"}
{"log":"time=\"2023-02-02T09:41:43Z\" level=error msg=\"/moby.buildkit.v1.Control/Solve returned error: rpc error: code = Canceled desc = failed to copy to tar: rpc error: code = Canceled desc = grpc: the client connection is closing\"\n","stream":"stderr","time":"2023-02-02T09:41:43.3244673Z"}
{"log":"time=\"2023-02-02T09:49:32Z\" level=info msg=\"auto snapshotter: using overlayfs\"\n","stream":"stderr","time":"2023-02-02T09:49:32.1314603Z"}
{"log":"time=\"2023-02-02T09:49:32Z\" level=warning msg=\"using host network as the default\"\n","stream":"stderr","time":"2023-02-02T09:49:32.1317857Z"}
{"log":"time=\"2023-02-02T09:49:32Z\" level=info msg=\"found worker \\\"6j5qk61a44tlxwbtilho9kq7k\\\", labels=map[org.mobyproject.buildkit.worker.executor:oci org.mobyproject.buildkit.worker.hostname:1a1de79a9a18 org.mobyproject.buildkit.worker.network:host org.mobyproject.buildkit.worker.oci.process-mode:sandbox org.mobyproject.buildkit.worker.selinux.enabled:false org.mobyproject.buildkit.worker.snapshotter:overlayfs], platforms=[linux/amd64 linux/amd64/v2 linux/amd64/v3 linux/arm64 linux/riscv64 linux/ppc64le linux/s390x linux/386 linux/mips64le linux/mips64 linux/arm/v7 linux/arm/v6]\"\n","stream":"stderr","time":"2023-02-02T09:49:32.1665478Z"}
{"log":"time=\"2023-02-02T09:49:32Z\" level=warning msg=\"skipping containerd worker, as \\\"/run/containerd/containerd.sock\\\" does not exist\"\n","stream":"stderr","time":"2023-02-02T09:49:32.1850306Z"}
{"log":"time=\"2023-02-02T09:49:32Z\" level=info msg=\"found 1 workers, default=\\\"6j5qk61a44tlxwbtilho9kq7k\\\"\"\n","stream":"stderr","time":"2023-02-02T09:49:32.1850409Z"}
{"log":"time=\"2023-02-02T09:49:32Z\" level=warning msg=\"currently, only the default worker can be used.\"\n","stream":"stderr","time":"2023-02-02T09:49:32.1850431Z"}
{"log":"time=\"2023-02-02T09:49:32Z\" level=info msg=\"running server on /run/buildkit/buildkitd.sock\"\n","stream":"stderr","time":"2023-02-02T09:49:32.190224Z"}
{"log":"time=\"2023-02-02T09:50:52Z\" level=warning msg=\"healthcheck failed\" actualDuration=30.0118699s spanID=691b5971458139c1 timeout=30s traceID=01eb75694fd74c3cb829c0ad33782082\n","stream":"stderr","time":"2023-02-02T09:50:52.1106941Z"}
{"log":"time=\"2023-02-02T09:51:37Z\" level=error msg=\"healthcheck failed fatally\" spanID=691b5971458139c1 traceID=01eb75694fd74c3cb829c0ad33782082\n","stream":"stderr","time":"2023-02-02T09:51:37.1293957Z"}
{"log":"time=\"2023-02-02T09:51:37Z\" level=error msg=\"/moby.buildkit.v1.Control/Solve returned error: rpc error: code = Canceled desc = failed to copy to tar: rpc error: code = Canceled desc = grpc: the client connection is closing\"\n","stream":"stderr","time":"2023-02-02T09:51:37.1719046Z"}

Steps To Reproduce

Here's the docker-compose.yml:

---
version: '3.8'
services:
  kafka:
    build: 
      context: .

And the Dockerfile:

FROM confluentinc/cp-kafka:7.3.1

USER root

RUN echo "kafka-storage format --ignore-formatted -t $(kafka-storage random-uuid) -c /etc/kafka/kafka.properties" >> /etc/confluent/docker/ensure

We run "docker-compose build" or "docker compose build". Interestingly, if we remove the "USER root" line, the build succeeds. Also, if we just run "docker build .", the build succeeds as well.
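For completeness, a minimal reproduction sequence, assuming the docker-compose.yml and Dockerfile above are in the current directory:

# Gets stuck at "sending tarball" (Docker Desktop 4.16.3, Compose v2.15.1):
docker compose build      # or: docker-compose build

# Works: building the same Dockerfile directly, without Compose.
docker build .

# Also works: "docker compose build" after removing the "USER root" line
# from the Dockerfile.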

Compose Version

2.15.1

Docker Environment

Client:
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc., v0.10.0)
  compose: Docker Compose (Docker Inc., v2.15.1)
  dev: Docker Dev Environments (Docker Inc., v0.0.5)
  extension: Manages Docker extensions (Docker Inc., v0.2.17)
  sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc., 0.6.0)
  scan: Docker Scan (Docker Inc., v0.23.0)

Server:
 Containers: 10
  Running: 1
  Paused: 0
  Stopped: 9
 Images: 16
 Server Version: 20.10.22
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 9ba4b250366a5ddde94bb7c9d1def331423aa323
 runc version: v1.1.4-0-g5fd4c4d
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 5.4.72-microsoft-standard-WSL2
 Operating System: Docker Desktop
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 25GiB
 Name: docker-desktop
 ID: IIT3:Q5TA:JKXM:A4R3:E7AA:TCQW:PMEA:QYVB:BG57:VXSS:Q6FT:PII4
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http.docker.internal:3128
 HTTPS Proxy: http.docker.internal:3128
 No Proxy: hubproxy.docker.internal
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  hubproxy.docker.internal:5000
  127.0.0.0/8
 Live Restore Enabled: false
trueuto commented 1 year ago

Hi,

I think this is not only related to Windows. I have a docker-compose project on a Linux system that fails to build as well. In my case it seems to be because I had a buildx builder configured earlier (i.e. for Docker versions <23.0.0) and used it with the docker buildx build command. With the latest Docker release (23.0.0) buildx became the default builder, and this apparently affects compose as well:

$ docker compose build
[+] Building 0.0s (0/0)
no valid drivers found: error during connect: Get "http://docker.example.com/v1.24/info": command [ssh -- 192.168.40.8 docker system dial-stdio] has exited with exit status 255, please make sure the URL is valid, and Docker 18.09 or later is installed on the remote host: stderr=ssh: connect to host 192.168.40.8 port 22: Connection timed out

192.168.40.8 is one of the VMs I use for multi-arch builds with buildx, and it was powered off at the time.

With the VM up and running, the container image does get built, but the build happens on the remote machine and the image is transferred back to the system where docker compose build was called.

I can revert to the old behaviour by simply setting DOCKER_BUILDKIT=0, but maybe there is some other, cleaner way.

Please advise if this should be reported as a separate case (or if it should be considered a 'bug' at all).

Edit: It was enough to switch back to the default buildx builder (something I did not need to do in the past): docker buildx use default. See the sketch below.
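For anyone else hitting this, here is a rough sketch of the checks and workarounds described above; docker buildx ls, docker buildx use default and DOCKER_BUILDKIT=0 are standard CLI commands/variables, nothing project-specific:

# Show the configured buildx builders; the active one is marked with "*".
docker buildx ls

# Switch back to the default builder so compose stops dialing the remote node.
docker buildx use default

# Alternative workaround: fall back to the legacy (pre-BuildKit) builder.
DOCKER_BUILDKIT=0 docker compose build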

default-value-serhiia commented 1 year ago

I faced the same issue after a clean Ubuntu setup on two different devices at the same time.

pbouill commented 1 year ago

Same issue using docker-compose with Podman on Windows/WSL2 when DOCKER_BUILDKIT=1 (buildx).