pantsbuild / pants

The Pants Build System
https://www.pantsbuild.org
Apache License 2.0

use_buildx on macos results in failure to package/build the image. #20605

Open cjim8889 opened 6 months ago

cjim8889 commented 6 months ago

Describe the bug: use_buildx on macOS results in a failure to package/build the image, possibly due to misplaced flags.

Pants version: both 2.19 and 2.20

OS: macOS

Additional info

23:30:29.01 [DEBUG] spawned local process as Some(45339) for Process { argv: ["/usr/local/bin/docker", "buildx", "build", "--output=type=docker", "--pull=False", "--tag", "europe-west2-docker.pkg.dev/spacedevice/spacedevice-dev/executor/img:latest", "--file", "core/executor/deploy/Dockerfile.img", "."], env: {"PATH": "/private/var/folders/yz/mym190td36z0fdmmg50ws0940000gn/T/pants-sandbox-zm8ayi/_binary_shims_b2c0bc85d922cd220fbeb3b9cf6edcd5e5da71438f0d44c9c22dbadc1c947cd1", "__UPSTREAM_IMAGE_IDS": ""}, working_directory: None, input_digests: InputDigests { complete: DirectoryDigest { digest: Digest { hash: Fingerprint<01a4c6b0b15421afe2e92514530b3caefe871ee604014d28e14e6859df9315bf>, size_bytes: 323 }, tree: "Some(..)" }, nailgun: DirectoryDigest { digest: Digest { hash: Fingerprint<e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855>, size_bytes: 0 }, tree: "Some(..)" }, inputs: DirectoryDigest { digest: Digest { hash: Fingerprint<bd9197a6763015360a29b6d3b7c696ae61d0a2a65494b449eecbb0d9a22eef8b>, size_bytes: 169 }, tree: "Some(..)" }, immutable_inputs: {RelativePath("_binary_shims_b2c0bc85d922cd220fbeb3b9cf6edcd5e5da71438f0d44c9c22dbadc1c947cd1"): DirectoryDigest { digest: Digest { hash: Fingerprint<b2c0bc85d922cd220fbeb3b9cf6edcd5e5da71438f0d44c9c22dbadc1c947cd1>, size_bytes: 588 }, tree: "Some(..)" }}, use_nailgun: {} }, output_files: {}, output_directories: {}, timeout: None, execution_slot_variable: None, concurrency_available: 0, description: "Building docker image europe-west2-docker.pkg.dev/spacedevice/spacedevice-dev/executor/img:latest", level: Info, append_only_caches: {}, jdk_home: None, cache_scope: PerSession, execution_environment: ProcessExecutionEnvironment { name: Some("macos"), platform: Macos_arm64, strategy: Local }, remote_cache_speculation_delay: 0ns, attempt: 0 }
23:30:29.03 [INFO] Completed: Building docker image europe-west2-docker.pkg.dev/spacedevice/spacedevice-dev/executor/img:latest
23:30:29.03 [DEBUG] Completed: Scheduling: Building docker image europe-west2-docker.pkg.dev/spacedevice/spacedevice-dev/executor/img:latest
23:30:29.03 [DEBUG] Completed: `package` goal
23:30:29.03 [DEBUG] computed 1 nodes in 29.739843 seconds. there are 14019 total nodes.
23:30:29.03 [ERROR] 1 Exception encountered:

Engine traceback:
  in root
    ..
  in pants.core.goals.package.package_asset
    `package` goal

Traceback (most recent call last):
  File "/Users/chenwuhao/Library/Caches/nce/0bb9722222f003de9629623179038ecab5b6bd747796b95272e1d271dba00578/bindings/venvs/2.20.0a0/lib/python3.9/site-packages/pants/core/goals/package.py", line 165, in package_asset
    packages = await MultiGet(
  File "/Users/chenwuhao/Library/Caches/nce/0bb9722222f003de9629623179038ecab5b6bd747796b95272e1d271dba00578/bindings/venvs/2.20.0a0/lib/python3.9/site-packages/pants/engine/internals/selectors.py", line 376, in MultiGet
    return await _MultiGet(tuple(__arg0))
  File "/Users/chenwuhao/Library/Caches/nce/0bb9722222f003de9629623179038ecab5b6bd747796b95272e1d271dba00578/bindings/venvs/2.20.0a0/lib/python3.9/site-packages/pants/engine/internals/selectors.py", line 174, in __await__
    result = yield self.gets
  File "/Users/chenwuhao/Library/Caches/nce/0bb9722222f003de9629623179038ecab5b6bd747796b95272e1d271dba00578/bindings/venvs/2.20.0a0/lib/python3.9/site-packages/pants/core/goals/package.py", line 116, in environment_aware_package
    package = await Get(
  File "/Users/chenwuhao/Library/Caches/nce/0bb9722222f003de9629623179038ecab5b6bd747796b95272e1d271dba00578/bindings/venvs/2.20.0a0/lib/python3.9/site-packages/pants/engine/internals/selectors.py", line 124, in __await__
    result = yield self
  File "/Users/chenwuhao/Library/Caches/nce/0bb9722222f003de9629623179038ecab5b6bd747796b95272e1d271dba00578/bindings/venvs/2.20.0a0/lib/python3.9/site-packages/pants/backend/docker/goals/package_image.py", line 470, in build_docker_image
    raise ProcessExecutionFailure(
pants.engine.process.ProcessExecutionFailure: Process 'Building docker image europe-west2-docker.pkg.dev/spacedevice/spacedevice-dev/executor/img:latest' failed with exit code 125.
stdout:

stderr:
unknown flag: --output
See 'docker --help'.

Usage:  docker [OPTIONS] COMMAND

A self-sufficient runtime for containers

Common Commands:
  run         Create and run a new container from an image
  exec        Execute a command in a running container
  ps          List containers
  build       Build an image from a Dockerfile
  pull        Download an image from a registry
  push        Upload an image to a registry
  images      List images
  login       Log in to a registry
  logout      Log out from a registry
  search      Search Docker Hub for images
  version     Show the Docker version information
  info        Display system-wide information

Management Commands:
  builder     Manage builds
  container   Manage containers
  context     Manage contexts
  image       Manage images
  manifest    Manage Docker image manifests and manifest lists
  network     Manage networks
  plugin      Manage plugins
  system      Manage Docker
  trust       Manage trust on Docker images
  volume      Manage volumes

Swarm Commands:
  swarm       Manage Swarm

Commands:
  attach      Attach local standard input, output, and error streams to a running container
  commit      Create a new image from a container's changes
  cp          Copy files/folders between a container and the local filesystem
  create      Create a new container
  diff        Inspect changes to files or directories on a container's filesystem
  events      Get real time events from the server
  export      Export a container's filesystem as a tar archive
  history     Show the history of an image
  import      Import the contents from a tarball to create a filesystem image
  inspect     Return low-level information on Docker objects
  kill        Kill one or more running containers
  load        Load an image from a tar archive or STDIN
  logs        Fetch the logs of a container
  pause       Pause all processes within one or more containers
  port        List port mappings or a specific mapping for the container
  rename      Rename a container
  restart     Restart one or more containers
  rm          Remove one or more containers
  rmi         Remove one or more images
  save        Save one or more images to a tar archive (streamed to STDOUT by default)
  start       Start one or more stopped containers
  stats       Display a live stream of container(s) resource usage statistics
  stop        Stop one or more running containers
  tag         Create a tag TARGET_IMAGE that refers to SOURCE_IMAGE
  top         Display the running processes of a container
  unpause     Unpause all processes within one or more containers
  update      Update configuration of one or more containers
  wait        Block until one or more containers stop, then print their exit codes

Global Options:
      --config string      Location of client config files (default ".docker")
  -c, --context string     Name of the context to use to connect to the
                           daemon (overrides DOCKER_HOST env var and
                           default context set with "docker context use")
  -D, --debug              Enable debug mode
  -H, --host list          Daemon socket to connect to
  -l, --log-level string   Set the logging level ("debug", "info",
                           "warn", "error", "fatal") (default "info")
      --tls                Use TLS; implied by --tlsverify
      --tlscacert string   Trust certs signed only by this CA (default
                           ".docker/ca.pem")
      --tlscert string     Path to TLS certificate file (default
                           ".docker/cert.pem")
      --tlskey string      Path to TLS key file (default ".docker/key.pem")
      --tlsverify          Use TLS and verify the remote
  -v, --version            Print version information and quit

Run 'docker COMMAND --help' for more information on a command.

For more help on how to use Docker, head to https://docs.docker.com/go/guides/

Use `--keep-sandboxes=on_failure` to preserve the process chroot for inspection.

23:30:29.03 [DEBUG] waiting for 173 session end task(s) to complete
23:30:32.03 [DEBUG] 1 session end task(s) failed to complete within timeout: remote cache write Digest { hash: Fingerprint<9363f653ed190c31371c16fb29936b4593f6cdaa589a0f586e1743116dac9ddc>, size_bytes: 142 }

cjim8889 commented 6 months ago

The issue can be resolved by turning off buildx, but I don't think that is a desirable state, so I chose to report it here anyway.

cjim8889 commented 6 months ago

My docker env:

Client:
 Version:           25.0.3
 API version:       1.44
 Go version:        go1.21.6
 Git commit:        4debf41
 Built:             Tue Feb  6 21:13:26 2024
 OS/Arch:           darwin/arm64
 Context:           orbstack

huonw commented 6 months ago

Thanks for taking the time to file an issue, and sorry for the trouble.

I tried setting use_buildx = true in a repo I have that uses docker_image targets and Pants 2.19.0, and couldn't reproduce the issue (the builds still worked fine on my ARM mac). My context:

Client:
 Cloud integration: v1.0.35+desktop.10
 Version:           25.0.2
 API version:       1.44
 Go version:        go1.21.6
 Git commit:        29cf629
 Built:             Thu Feb  1 00:18:45 2024
 OS/Arch:           darwin/arm64
 Context:           desktop-linux

Server: Docker Desktop 4.27.1 (136059)
 Engine:
  Version:          25.0
  API version:      1.44 (minimum version 1.24)
  Go version:       go1.21.6
  Git commit:       fce6e0ca9bc000888de3daa157af14fa41fcd0ff
  Built:            Thu Feb  1 00:15:46 2024
  OS/Arch:          linux/arm64
  Experimental:     true
 containerd:
  Version:          1.6.28
  GitCommit:        ae07eda36dd25f8a1b98dfbf587313b99c0190bb
 runc:
  Version:          1.1.12
  GitCommit:        v1.1.12-0-g51d5e94
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Can you share more info about your pants.toml and the target(s) in your BUILD files? If possible, a reduced reproducer repository would be best. Thanks!

avilaton commented 6 months ago

I have the same issue, so I went diving into the Pants code. I have a hunch that something is broken in how the --output option is passed to a subprocess.

Here is what happens with buildx:

10:39:39.46 [DEBUG] spawned local process as Some(73254) for Process { argv: ["/Users/gaston/.rd/bin/docker", "buildx", "build", "--output=type=docker", "--pull=False", "--tag", "ghcr.io/autoprotect-ai/mock-server:latest", "--file", "mock-server/Dockerfile.docker", "."], env: {"__UPSTREAM_IMAGE_IDS": ""}, working_directory: None, input_digests: InputDigests { complete: DirectoryDigest { digest: Digest { hash: Fingerprint<19e54847a54e4c6c3018cbb9ea2b4eb0c4e0b0d56e8ca1c0529a3fdb617cb2c0>, size_bytes: 86 }, tree: "Some(..)" }, nailgun: DirectoryDigest { digest: Digest { hash: Fingerprint<e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855>, size_bytes: 0 }, tree: "Some(..)" }, inputs: DirectoryDigest { digest: Digest { hash: Fingerprint<19e54847a54e4c6c3018cbb9ea2b4eb0c4e0b0d56e8ca1c0529a3fdb617cb2c0>, size_bytes: 86 }, tree: "Some(..)" }, immutable_inputs: {}, use_nailgun: {} }, output_files: {}, output_directories: {}, timeout: None, execution_slot_variable: None, concurrency_available: 0, description: "Building docker image ghcr.io/autoprotect-ai/mock-server:latest", level: Info, append_only_caches: {}, jdk_home: None, cache_scope: PerSession, execution_environment: ProcessExecutionEnvironment { name: Some("osx"), platform: Macos_arm64, strategy: Local }, remote_cache_speculation_delay: 0ns, attempt: 0 }
10:39:39.48 [INFO] Preserving local process execution dir /private/var/folders/pq/fdztgvgx6z17_vsc26lg0_2m0000gn/T/pants-sandbox-vMxrC1 for Building docker image ghcr.io/autoprotect-ai/mock-server:latest
10:39:39.48 [INFO] Completed: Building docker image ghcr.io/autoprotect-ai/mock-server:latest
10:39:39.48 [DEBUG] Completed: Scheduling: Building docker image ghcr.io/autoprotect-ai/mock-server:latest
10:39:39.48 [DEBUG] Completed: `package` goal
10:39:39.48 [DEBUG] computed 1 nodes in 2.299113 seconds. there are 7156 total nodes.
10:39:39.48 [ERROR] 1 Exception encountered:

Engine traceback:
  in `package` goal

ProcessExecutionFailure: Process 'Building docker image ghcr.io/autoprotect-ai/mock-server:latest' failed with exit code 125.
stdout:

stderr:
unknown flag: --output
See 'docker --help'.

Usage:  docker [OPTIONS] COMMAND

A self-sufficient runtime for containers
...

But entering that sandbox and running the command works:

> cd /private/var/folders/pq/fdztgvgx6z17_vsc26lg0_2m0000gn/T/pants-sandbox-8wNaAv
gaston@ip-192-168-1-111 /p/v/f/p/f/T/pants-sandbox-8wNaAv> /Users/gaston/.rd/bin/docker buildx build --output=type=docker --pull=False --tag ghcr.io/autoprotect-ai/mock-server:latest --file mock-server/Dockerfile.docker .
[+] Building 2.3s (8/8) FINISHED                                                                                                                                                                                                         docker:desktop-linux
 => [internal] load build definition from Dockerfile.docker                                                                                                                                                                                              0.0s
 => => transferring dockerfile: 292B                                                                                                                                                                                                                     0.0s
 => [internal] load metadata for docker.io/library/python:3.11-alpine3.18                                                                                                                                                                                0.0s
 => [internal] load .dockerignore                                                                                                                                                                                                                        0.0s
 => => transferring context: 2B                                                                                                                                                                                                                          0.0s
 => [1/3] FROM docker.io/library/python:3.11-alpine3.18@sha256:b0daa88cf9940c2878551807a8a31811c3ea99e244db1cb3a14176715b9df964                                                                                                                          2.0s
 => => resolve docker.io/library/python:3.11-alpine3.18@sha256:b0daa88cf9940c2878551807a8a31811c3ea99e244db1cb3a14176715b9df96

which to me means that whatever pants.engine.process is doing to render those args is interpreting the --output option oddly and splitting it, or doing something fancy, right here:

"/Users/gaston/.rd/bin/docker", "buildx", "build", "--output=type=docker",

which might be caused by the two equal signs in that last item. That is as far as I could get into this today. I also worked around it for now by disabling buildx, but that is not our goal here, right?
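One way to check the splitting hypothesis outside of Pants is to pass the same arguments to a child process that only echoes what it received. The sketch below uses Python's `subprocess` module, not Pants's own process executor, so it only illustrates general argv semantics: when a command is given as a list and no shell is involved, each element reaches the child verbatim, equal signs included.

```python
import subprocess
import sys

# Arguments copied from the logged argv, minus the docker binary itself.
args = ["buildx", "build", "--output=type=docker", "--pull=False"]

# The child process simply prints the argument vector it received.
child = [sys.executable, "-c", "import sys; print(sys.argv[1:])"]

# With a list argv and no shell, nothing re-parses or splits the elements.
result = subprocess.run(child + args, capture_output=True, text=True, check=True)
received = result.stdout.strip()
print(received)
```

If the printed list matches `args` element for element, the two equal signs in `--output=type=docker` are not being split at this level, which would point away from argument rendering as the cause.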

avilaton commented 6 months ago

There is more, I placed the --output option right after buildx and it failed exactly as we have seen it before, so Process is placing some of those options BEFORE the build COMMAND.

/Users/gaston/.rd/bin/docker buildx --output=type=docker build --output type=docker --pull=False --tag ghcr.io/autoprotect-ai/mock-server:latest --file mock-server/Dockerfile.docker .
ERROR: unknown flag: --output

kaos commented 6 months ago

> There is more, I placed the --output option right after buildx and it failed exactly as we have seen it before, so Process is placing some of those options BEFORE the build COMMAND.
>
> /Users/gaston/.rd/bin/docker buildx --output=type=docker build --output type=docker --pull=False --tag ghcr.io/autoprotect-ai/mock-server:latest --file mock-server/Dockerfile.docker .
> ERROR: unknown flag: --output

I think this is a red herring. I can't see any way that args would be placed between buildx and build in the command:

https://github.com/pantsbuild/pants/blob/af1b5c7cc6f8f539be353b5f950c9d4c36a49138/src/python/pants/backend/docker/util_rules/docker_binary.py#L75-L80
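
To illustrate the point, the argv in the debug logs above is consistent with a flat list in which "buildx" and "build" are adjacent literals. The helper below is a hypothetical sketch reconstructed only from that logged argv; the name `build_image_argv` and its parameters are made up for illustration and this is not the Pants implementation at the link above.

```python
from typing import List, Sequence


def build_image_argv(
    docker_path: str,
    tags: Sequence[str],
    dockerfile: str,
    context_root: str = ".",
    use_buildx: bool = False,
    pull: bool = False,
) -> List[str]:
    """Assemble a docker build command line shaped like the one in the logs."""
    argv = [docker_path]
    if use_buildx:
        # "buildx" and "build" are appended together, so no other argument
        # can end up between them.
        argv += ["buildx", "build", "--output=type=docker"]
    else:
        argv += ["build"]
    argv.append(f"--pull={pull}")
    for tag in tags:
        argv += ["--tag", tag]
    argv += ["--file", dockerfile, context_root]
    return argv


logged = build_image_argv(
    "/Users/gaston/.rd/bin/docker",
    ["ghcr.io/autoprotect-ai/mock-server:latest"],
    "mock-server/Dockerfile.docker",
    use_buildx=True,
)
print(logged)
```

With a construction like this, the only way to see `--output` rejected as a top-level flag is for the `docker` binary itself not to recognise `buildx` as a subcommand.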

kaos commented 6 months ago

> But entering that sandbox and running the command works:

@avilaton, does it also work if you use the __run.sh script?

That script more closely resembles the actual command run by pants, including the hermetic env.

avilaton commented 6 months ago

Nope, it fails the same way as it does from within Pants. Here is the full content of the __run.sh script:

#!/usr/bin/env bash
# This command line should execute the same process as pants did internally.
cd /private/var/folders/pq/fdztgvgx6z17_vsc26lg0_2m0000gn/T/pants-sandbox-8wNaAv
env -i __UPSTREAM_IMAGE_IDS= /Users/gaston/.rd/bin/docker buildx build $'--output=type=docker' $'--pull=False' --tag $'ghcr.io/redacted/mock-server:latest' --file mock-server/Dockerfile.docker .

cjim8889 commented 6 months ago

I actually found the cause. If a docker_environment is specified for a given project, it overrides the environment variables specified in pants.toml, even if the docker_environment definition doesn't specify any environment variables of its own.

nijave commented 5 months ago

> nope, it fails the same way as it does from within pants. Here is the full content of the __run.sh script
>
> #!/usr/bin/env bash
> # This command line should execute the same process as pants did internally.
> cd /private/var/folders/pq/fdztgvgx6z17_vsc26lg0_2m0000gn/T/pants-sandbox-8wNaAv
> env -i __UPSTREAM_IMAGE_IDS= /Users/gaston/.rd/bin/docker buildx build $'--output=type=docker' $'--pull=False' --tag $'ghcr.io/redacted/mock-server:latest' --file mock-server/Dockerfile.docker .

It looks like the -i flag on env is causing the problem, but I'm not sure which existing variable docker needs access to.
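
A minimal sketch for narrowing this down, assuming a `docker` binary at a path you supply: it builds an environment the way `env -i` does (empty apart from the variable set in `__run.sh`), optionally copies selected variables back in, and runs `docker buildx version` under it. The function names `hermetic_env` and `buildx_exit_code` are hypothetical helpers, not part of Pants or Docker.

```python
import subprocess
from typing import Dict, Mapping, Sequence


def hermetic_env(source: Mapping[str, str], passthrough: Sequence[str] = ()) -> Dict[str, str]:
    """Start from an empty environment, like `env -i`, and copy only named variables."""
    # __run.sh sets this one variable explicitly (to an empty string).
    env = {"__UPSTREAM_IMAGE_IDS": ""}
    for name in passthrough:
        if name in source:
            env[name] = source[name]
    return env


def buildx_exit_code(docker: str, env: Mapping[str, str]) -> int:
    """Run `docker buildx version` under the given environment and return its exit code."""
    completed = subprocess.run(
        [docker, "buildx", "version"],
        env=dict(env),
        capture_output=True,
        text=True,
    )
    return completed.returncode
```

On an affected machine, comparing `buildx_exit_code(path, hermetic_env(os.environ))` with `buildx_exit_code(path, hermetic_env(os.environ, ["HOME"]))` (after `import os`) should show whether a single variable makes the difference; other candidates can be added to the passthrough list one at a time.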

nijave commented 5 months ago

On macOS, it seems Docker needs access to the HOME variable although I'm not sure why. Maybe it's looking for certain Docker config or plugins and it can't locate something needed for buildx without it.
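
One plausible explanation, offered as an assumption and not something confirmed in this thread: the Docker CLI looks for per-user CLI plugins such as buildx in a `cli-plugins` directory under its config directory, which defaults to `.docker` in the user's home directory and can be overridden with `DOCKER_CONFIG`. The sketch below only mirrors that documented default; the real CLI also searches system-wide plugin directories and may have other ways of finding the home directory. The function name `user_cli_plugin_dir` is hypothetical.

```python
from pathlib import Path
from typing import Mapping, Optional


def user_cli_plugin_dir(env: Mapping[str, str]) -> Optional[Path]:
    """Return the per-user CLI plugin directory implied by env, or None if it cannot be derived."""
    # DOCKER_CONFIG, when set, names the client config directory directly.
    if env.get("DOCKER_CONFIG"):
        return Path(env["DOCKER_CONFIG"]) / "cli-plugins"
    # Otherwise the default config directory lives under the home directory.
    if env.get("HOME"):
        return Path(env["HOME"]) / ".docker" / "cli-plugins"
    # Neither variable is available, as under `env -i`.
    return None


print(user_cli_plugin_dir({"__UPSTREAM_IMAGE_IDS": ""}))
print(user_cli_plugin_dir({"HOME": "/Users/gaston"}))
```

If buildx is only installed in the per-user directory, an environment without `HOME` would leave `docker` unable to resolve `buildx` as a subcommand, which would match the "unknown flag: --output" error coming from the top-level `docker` command.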

nijave commented 5 months ago

> On macOS, it seems Docker needs access to the HOME variable although I'm not sure why. Maybe it's looking for certain Docker config or plugins and it can't locate something needed for buildx without it.

Here's a workaround (though it would help to get some insight from Docker/moby on whether and how HOME is used):

# pants.toml
[docker]
use_buildx = true
env_vars = ["HOME"]

dvgica commented 2 months ago

I've also seen this issue on macOS, not only with --output but also --platform. The workaround above solves the issue for both.