moby / buildkit

concurrent, cache-efficient, and Dockerfile-agnostic builder toolkit

buildkit fails with "mount already exists with name" #1460

Open kit-ty-kate opened 4 years ago

kit-ty-kate commented 4 years ago

With a similar Dockerfile to https://github.com/moby/buildkit/issues/1459#issuecomment-618690727:

# syntax = docker/dockerfile:1.1.5-experimental
FROM ocurrent/opam@sha256:5bb6268b1fb7a8dfceffd92383cb4c72d0c3f995231ff916fa6aaf0b29bff4df
COPY --chown=opam . /src/
RUN git -C /src checkout -b opam-ci__cibranch a1f2c8222b2a755968b2212471fd1732bdaf6f3d && \
  git -C /src merge 49560cde56610f9fd03c0af5ef852e9b873ecf6c && \
  opam repository set-url --strict default file:///src
RUN opam pin add -k version -yn conf-nauty.1.1 1.1
RUN --mount=type=cache,target=/home/opam/.opam/download-cache,uid=1000 opam depext -uivy conf-nauty.1.1

Our docker build sometimes fails with:

#1 [internal] load .dockerignore
#1 sha256:95b7a97096dc2afd2d43c2337088551dcb79f6b7cd7780992776af1afaca6f81
#1 transferring context:
#1 transferring context: 2B done
#1 DONE 0.6s

#2 [internal] load build definition from Dockerfile
#2 sha256:86eb8a668e7aabc92f8c971f945dc6e33bb4dafb625349e0e4b3e99e34f50a19
#2 transferring dockerfile: 572B done
#2 DONE 0.9s

#3 resolve image config for docker.io/docker/dockerfile:1.1.5-experimental
#3 sha256:4d904ca4dadcfe285b7e07c14c496e2212a754f0c4fd12946baa040350cae4f0
#3 DONE 0.9s

#4 docker-image://docker.io/docker/dockerfile:1.1.5-experimental@sha256:787107d7f7953cb2d95ee81cc7332d79aca9328129318e08fc7ffbd252a20656
#4 sha256:d2736ec635330516879555dba7aa00a3afb686946df7f09d957dc33a05f95b62
#4 CACHED

#5 [internal] load metadata for docker.io/ocurrent/opam@sha256:5bb6268b1fb7a8dfceffd92383cb4c72d0c3f995231ff916fa6aaf0b29bff4df
#5 sha256:c3d18c31ea0787222f2fb8d1ea33b95119a8b415e570a2d026f6aea36d6ff8b9
#5 DONE 0.0s

#11 [internal] settings cache mount permissions
#11 sha256:98e142695adfe60b7bd7795f3db2122cdc761dcae59a13eab856140b08987ce3
#11 DONE 0.0s

#6 [stage-0 1/5] FROM docker.io/ocurrent/opam@sha256:5bb6268b1fb7a8dfceffd92383cb4c72d0c3f995231ff916fa6aaf0b29bff4df
#6 sha256:16da8708e67b40ea0ed8ecd0041c80825c6c6966930c7a99e397d1a2a128b466
#6 DONE 0.0s

#7 [internal] load build context
#7 sha256:cf9ebfd4c665038fcc01c0f0427a7e3c4451e7b6dec9591b287b850ddb7a0761
#7 transferring context: 104.60MB 2.6s
#7 transferring context: 111.47MB 3.5s done
#7 DONE 4.6s

#6 [stage-0 1/5] FROM docker.io/ocurrent/opam@sha256:5bb6268b1fb7a8dfceffd92383cb4c72d0c3f995231ff916fa6aaf0b29bff4df
#6 sha256:16da8708e67b40ea0ed8ecd0041c80825c6c6966930c7a99e397d1a2a128b466
#6 CACHED

#11 [internal] settings cache mount permissions
#11 sha256:98e142695adfe60b7bd7795f3db2122cdc761dcae59a13eab856140b08987ce3
#11 CACHED

#8 [stage-0 2/5] COPY --chown=opam . /src/
#8 sha256:84fcec75053eb6d4d45b6fb8bb78931ad7518971d40c96bfcc6fe7c2b7a09cd0
#8 ERROR: mount already exists with name
failed to solve with frontend dockerfile.v0: failed to solve with frontend gateway.v0: rpc error: code = Unknown desc = failed to build LLB: mount already exists with name

I've spent several hours trying to get a minimal test case, without success; however, it still happens pretty often on our test server.

Here is our daemon log:

Apr 24 00:26:22 m1-d dockerd[37339]: time="2020-04-24T00:26:22.963833995Z" level=info msg="parsed scheme: \"\"" module=grpc
Apr 24 00:26:22 m1-d dockerd[37339]: time="2020-04-24T00:26:22.963920949Z" level=info msg="scheme \"\" not registered, fallback to default scheme" module=grpc
Apr 24 00:26:22 m1-d dockerd[37339]: time="2020-04-24T00:26:22.963962093Z" level=info msg="ccResolverWrapper: sending update to cc: {[{ 0  <nil>}] <nil>}" module=grpc
Apr 24 00:26:22 m1-d dockerd[37339]: time="2020-04-24T00:26:22.963972759Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Apr 24 00:26:30 m1-d dockerd[37339]: time="2020-04-24T00:26:30Z" level=error msg="fatal error: rpc error: code = Unknown desc = failed to build LLB: mount already exists with name"
Apr 24 00:26:30 m1-d dockerd[37339]: panic: rpc error: code = Unknown desc = failed to build LLB: mount already exists with name
Apr 24 00:26:30 m1-d dockerd[37339]: goroutine 1 [running]:
Apr 24 00:26:30 m1-d dockerd[37339]: main.main()
Apr 24 00:26:30 m1-d dockerd[37339]:         /src/frontend/dockerfile/cmd/dockerfile-frontend/main.go:26 +0x34f
Apr 24 00:26:31 m1-d dockerd[37339]: time="2020-04-24T00:26:31.282392042Z" level=warning msg="grpc: addrConn.createTransport failed to connect to { 0  <nil>}. Err :connection error: desc = \"transport: Error while dialing only one connection allowed\". Reconnecting..." module=grpc
tonistiigi commented 4 years ago

What version/software is this?

kit-ty-kate commented 4 years ago

Client: Docker Engine - Community
 Version:           19.03.8
 API version:       1.40
 Go version:        go1.12.17
 Git commit:        afacb8b7f0
 Built:             Wed Mar 11 01:25:56 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.8
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.17
  Git commit:       afacb8b7f0
  Built:            Wed Mar 11 01:24:28 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.13
  GitCommit:        7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683
dotcarls commented 4 years ago

We see this sporadically in our CI pipeline -- we use make to kick off several parallel builds, and every once in a while a job fails with the "mount already exists with name" error:

#8 [build 2/4] WORKDIR /src/api
#8 ERROR: mount already exists with name

#9 [internal] load build context
#9 transferring context: 7.09MB 0.1s done
#9 CANCELED
------
 > [build 2/4] WORKDIR /src/api:
------
failed to solve with frontend dockerfile.v0: failed to solve with frontend gateway.v0: rpc error: code = Unknown desc = failed to build LLB: mount already exists with name
Makefile:309: recipe for target 'docker-build-swagger' failed
make: *** [docker-build-swagger] Error 1
make: *** Waiting for unfinished jobs....

Could it be because the stages are aliased with the same name? The Dockerfiles that are built at that time both have statements like:

FROM <our-common-build-image>:latest AS build
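
A minimal sketch of the kind of setup involved (file names and tags here are illustrative, not our real ones): two builds kicked off in parallel by make, whose Dockerfiles both begin with the aliased stage above.

# Illustrative only: both Dockerfiles start with
# "FROM <our-common-build-image>:latest AS build".
docker build -f api/Dockerfile -t api:ci . &
docker build -f swagger/Dockerfile -t swagger:ci . &
wait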
max-sixty commented 3 years ago

FYI we hit this on docker:20.10.2. We kick off two builds against the same docker-dind service, running on K8s on GitLab CI. Those builds have lots of similarities, including overlapping build stage names.

lawnmowerlatte commented 3 years ago

We see this in CI as well when using make to parallelize multiple multistage builds that all have the same FROM imagename as builder. I'm wondering if @dotcarls might be on to something.

Stono commented 3 years ago

Yeah we just hit this too:

#9 [stage-1 2/4] COPY --chown=atcloud:atcloud .npmrc package.json /usr/local/autotrader/app/
#9 sha256:be4cfa891678f62a7c246fdab682b5253f37aeb2b9b4b0c0640aa38d3fff47c2
#9 ERROR: mount already exists with name
------
 > [build 2/6] COPY --chown=atcloud:atcloud package.json .npmrc /usr/local/autotrader/app/:
------
------
 > [stage-1 2/4] COPY --chown=atcloud:atcloud .npmrc package.json /usr/local/autotrader/app/:
------
mount already exists with name
Service 'app' failed to build : Build failed
your build failed! check the logs above to understand the reason, you can recreate locally by running 'docker-compose build'

Two layers from separate stages copy files from the host and, similar to the above, both stages are FROM the same image.

bash-5.1# docker version
Client:
 Version:           20.10.8
 API version:       1.41
 Go version:        go1.16.6
 Git commit:        3967b7d
 Built:             Fri Jul 30 19:50:40 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.8
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.6
  Git commit:       75249d8
  Built:            Fri Jul 30 19:55:09 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.4.9
  GitCommit:        e25210fe30a0a703442421b0f60afac609f950a3
 runc:
  Version:          1.0.1
  GitCommit:        v1.0.1-0-g4144b638
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
lawnmowerlatte commented 3 years ago

> We see this in CI as well when using make to parallelize multiple multistage builds that all have the same FROM imagename as builder. I'm wondering if @dotcarls might be on to something.

This seems to have been a dead end: we renamed our stage aliases to be unique for each parallelized build (i.e. FROM imagename as builder_for_app1, FROM imagename as builder_for_app2, etc.) and still saw the issue.

janhn commented 3 years ago

Seeing this with Docker version 20.10.11 on a gitlab-runner host for a CI pipeline.

How can we investigate to understand what's going on?

carsten-langer commented 1 year ago

We also see this problem with docker 23.0.1. It appears when we run two simultaneous docker build commands with the same FROM image and the same layers up to the first COPY command, as in Stono's example at https://github.com/moby/buildkit/issues/1460#issuecomment-938536403

We were able to work around this by building with "--no-cache", accepting the downsides of that approach.

This leads us to the following hypothesis about the root cause, given a Dockerfile like:

FROM myimage:mytag
COPY a/ /a/
COPY b/ /b/
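
A hypothetical reproduction sketch (tags, Dockerfile names, and build contexts are illustrative): two concurrent builds that share the FROM image and identical leading layers.

# Illustrative only: Dockerfile.a and Dockerfile.b both start from
# myimage:mytag with the same layers up to the first COPY; run
# concurrently, they occasionally fail with "mount already exists with name".
docker build -t app-a -f Dockerfile.a ctx-a &
docker build -t app-b -f Dockerfile.b ctx-b &
wait

# The workaround described above, at the cost of a full rebuild:
docker build --no-cache -t app-a -f Dockerfile.a ctx-a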
cpuguy83 commented 1 year ago

Looking through the code, the only place this error can occur is here:

https://github.com/moby/moby/blob/f00569b89154a5ce8c290d5420019bb20a8ab632/layer/layer_store.go#L480-L501

This has 2 callers (for build): https://github.com/moby/moby/blob/f00569b89154a5ce8c290d5420019bb20a8ab632/builder/builder-next/adapters/snapshot/snapshot.go#L280-L285

https://github.com/moby/moby/blob/f00569b89154a5ce8c290d5420019bb20a8ab632/daemon/images/image_builder.go#L62-L63

The "name" here is always a random ID generated by crypto/rand.
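
Since a collision between two independently generated 256-bit random IDs is effectively impossible, the error presumably means the same generated ID somehow ends up being registered twice. For reference, a simplified sketch of the pattern in the code linked above (not the actual moby code; types and names are abbreviated):

package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
	"sync"
)

// Surfaced to the client as "mount already exists with name".
var ErrMountNameConflict = errors.New("mount already exists with name")

type mountedLayer struct{ name string }

type layerStore struct {
	mu     sync.Mutex
	mounts map[string]*mountedLayer
}

// createRWLayer registers a new RW layer under name, failing if an
// entry with that name already exists.
func (ls *layerStore) createRWLayer(name string) (*mountedLayer, error) {
	ls.mu.Lock()
	defer ls.mu.Unlock()
	if _, ok := ls.mounts[name]; ok {
		return nil, ErrMountNameConflict
	}
	m := &mountedLayer{name: name}
	ls.mounts[name] = m
	return m, nil
}

// randomID stands in for the crypto/rand-generated name the callers pass.
func randomID() string {
	b := make([]byte, 32)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return hex.EncodeToString(b)
}

func main() {
	ls := &layerStore{mounts: make(map[string]*mountedLayer)}
	id := randomID()
	if _, err := ls.createRWLayer(id); err != nil {
		fmt.Println(err)
	}
	// Registering the same ID twice reproduces the exact error message
	// from the reports above.
	if _, err := ls.createRWLayer(id); err != nil {
		fmt.Println(err) // mount already exists with name
	}
}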