Closed kwrobert closed 2 years ago
Looking at 2.11.1 release notes: https://pypi.org/project/pantsbuild.pants/2.11.1/
I see https://github.com/pantsbuild/pants/pull/15300
Can you try an upgrade to 2.11.1 and see if that fixes?
Ah, nmm! This has 0 python_distribution
targets involved.
Interesting new development, if I instead use the instructions
argument of the docker_image
target to provide my image building instructions to pants, everything works as expected! Also, the logs from the ./pants package
command look different, and notably do not contain any indication of commands being cancelled. Here are my new BUILD
files:
$ cat infra/docker/build_runner/BUILD
docker_image(
name="build_runner",
image_tags=["build_runner_pants"],
instructions=[
"FROM cimg/python:3.8",
"# hadolint ignore=DL3008,DL3004",
"RUN sudo apt-get update -y \\",
" && sudo apt-get install --no-install-recommends -y python3-dev python3-distutils \\",
" amazon-ecr-credential-helper \\",
" awscli \\",
" build-essential \\",
" libssl-dev \\",
" libffi-dev \\",
" && sudo apt-get clean \\",
" && sudo rm -rf /var/lib/apt/lists/*",
"# hadolint ignore=DL3013",
"RUN pip install --no-cache-dir credstash",
"RUN echo 'testing' >> test_file_parent.txt",
"RUN echo 'testing' >> test_file_parent2.txt",
"RUN echo 'testing' >> test_file_parent3.txt",
"RUN echo 'testing' >> test_file_parent4.txt",
],
source="",
)
$ cat infra/docker/build_runner_child/BUILD
docker_image(
name="build_runner_child",
image_tags=["build_runner_child"],
source="",
instructions=[
"ARG BASE=infra/docker/build_runner:build_runner",
"FROM $BASE",
"RUN echo 'testing' >> test_file.txt"
]
)
And the results of packaging the child:
$ ./pants package infra/docker/build_runner_child:build_runner_child
09:28:44.75 [INFO] Initializing scheduler...
09:28:45.00 [INFO] Scheduler initialized.
09:28:49.90 [INFO] Completed: Building docker image 137296740171.dkr.ecr.us-west-2.amazonaws.com/build-system-demo-pants:build_runner_pants
09:28:50.67 [INFO] Completed: Building docker image 137296740171.dkr.ecr.us-west-2.amazonaws.com/build-system-demo-pants:build_runner_child
09:28:50.67 [INFO] Built docker image: 137296740171.dkr.ecr.us-west-2.amazonaws.com/build-system-demo-pants:build_runner_child
Docker image ID: sha256:af4613f1ebb8933e097140f7e9e7d4d6e112da453d5a5fabaaa4b569ce1323c5
And the files appear as expected in the child image after a single invocation of the packaging command. Notice how there is no cancellation of build steps in the logs there (in contrast to when I have external Dockerfiles
defined outside the build file)?
This makes me think there must be some sort of race condition in the way pants parses Dockerfiles
? I know pants will cancel a build task (as it does when using external Dockerfiles
) if it detects the files consumed by a build target are modified on disk while it is executing a goal that builds that target. Maybe something like that is happening here, i.e pants is somehow indicating to itself that one of the Dockerfiles
was modified by parsing the file.
fyi @pantsbuild/maintainers I'll not be able to investigate this currently.
Related to https://github.com/pantsbuild/pants/issues/16356 ?
I slightly suspect it's a timing issue, due to the instructions aspect. The daemon might not be receiving the file mod notification before the command is issued?
Doesn't seem related to the other issue to me, because the second run works. Unless the second run happened to restart the daemon
Hi @kwrobert thanks for the bug report, and sorry it's taken a while to get to triaging it.
I reproduce this locally on HEAD of the main branch, so this is definitely a Pants bug of some kind. Will dig further.
Interesting distinction between runs that do update the parent image and those that don't:
When the parent image is NOT updated, the console logging shows:
statler:[~/src/pantsdockertest][main]$ ./pants_from_sources --pantsd package infra/docker/build_runner_child:build_runner_child
16:49:35.46 [INFO] Canceled: Building docker image build_runner:build_runner_pants
16:49:36.39 [INFO] Completed: Building docker image build_runner_child:build_runner_child
16:49:36.39 [INFO] Canceled: Building docker image build_runner:build_runner_pants
16:49:38.25 [INFO] Completed: Building docker image build_runner:build_runner_pants
16:49:38.25 [INFO] Built docker image: build_runner_child:build_runner_child
Then the subsequent run that does update the parent image has this:
statler:[~/src/pantsdockertest][main]$ ./pants_from_sources --pantsd package infra/docker/build_runner_child:build_runner_child
16:50:05.53 [INFO] Completed: Building docker image build_runner_child:build_runner_child
16:50:05.53 [INFO] Canceled: Building docker image build_runner:build_runner_pants
16:50:06.27 [INFO] Completed: Building docker image build_runner:build_runner_pants
16:50:06.27 [INFO] Built docker image: build_runner_child:build_runner_child
Note that the former has one extra Canceled: Building docker image build_runner:build_runner_pants
line. Not sure what this means.
Related, it's weird that we show "Completed" for processes that we "Canceled".
See here for a repo and README that fully reproduces this: https://github.com/benjyw/dockerbug
I think what's happening is that the two processes are speculatively re-run on their old inputs. Then the parent gets canceled and re-run because its inputs have changed, but the child isn't because no explicit input to it has changed: the BuiltPackage for the parent image doesn't change because we don't actually put anything in the output digest...
Although we do update the image id in the artifact names, so maybe that's not working properly somehow
No, that's not it. We do update the imageid correctly in the BuiltPackage.
OK, yep, the problem is that the actual input to the child image build process is a DockerBuildContext, and that does not reflect the changes to the parent image. The creation of DockerBuildContext regenerates the parent image as a byproduct, but its identity doesn't affect DockerBuildContext's hash...
Or more precisely, its tendrils into the child building Process.
Fixed by #16419
Thanks for the bug report and for your patience, @kwrobert!
Thanks for banging out this bug fix! What's the release plan for getting this into mainline? Would love to pull it down and deploy it into my CI systems.
Once it's merged, you can use PANTS_SHA
: https://www.pantsbuild.org/docs/manual-installation#running-pants-from-unreleased-builds. It will go out in the next 2.14 dev release, and will also be cherry-picked for the next 2.13 rc :)
Describe the bug
When rebuilding a child
docker_image
target that depends on an upstream parent image as a base, running./pants package path/to:child_target
does not appear to correctly rebuild the parent image before building the child image.Pants version
OS
Currently only on Mac OS X, haven't tested on Linux yet
Additional info
Here is the process I went through. First, here is the directory layout from the top of the repo:
And the contents of the
BUILD
files andDockerfiles
:If I tried to package the child, it correctly builds the parent first:
But if I make a modification to the
Dockerfile
of the parent by introducing a new layer after this first build via this modification:And then rebuild the child image, the file added in the parent image doesn't appear in the child image:
However if I run the
./pants package
command a second time, then the file does appear:I found this behavior highly unusual and unexpected. I'm not sure if this is an issue with my local setup, or a bug with pants, but I thought I would report it here to get some help investigating.