pantsbuild / pants

The Pants Build System
https://www.pantsbuild.org
Apache License 2.0
3.19k stars 615 forks source link

Docker image is pulled even though test is cached #17958

Open chris-stetter opened 1 year ago

chris-stetter commented 1 year ago

Describe the bug When using the new environments feature where a test is executed within a Docker container and remote caching is used, the Docker image still gets pulled even though the test is cached subsequently:

Run ./pants test services::

[1](https://github.com/myorg/myrepo/actions/runs/3861658230/jobs/6622990398#step:5:1)0:54:32.85 [WARN] DEPRECATED: The autoflake plugin has moved to `pants.backend.python.lint.autoflake` (and from the `fmt` goal to the `fix` goal).
10:54:32.92 [WARN] DEPRECATED: The autoflake plugin has moved to `pants.backend.python.lint.autoflake` (and from the `fmt` goal to the `fix` goal).
10:54:33.01 [INFO] Starting: Pulling Docker image `qgis/qgis:final-3_22_9` because the image is missing locally.
10:55:33.07 [INFO] Long running tasks:
  60.06s    Pulling Docker image `qgis/qgis:final-3_22_9` because the image is missing locally.
10:56:03.10 [INFO] Long running tasks:
  90.09s    Pulling Docker image `qgis/qgis:final-3_22_9` because the image is missing locally.
10:56:33.15 [INFO] Long running tasks:
  1[20](https://github.com/myorg/myrepo/actions/runs/3861658230/jobs/6622990398#step:5:20).14s  Pulling Docker image `qgis/qgis:final-3_[22](https://github.com/myorg/myrepo/actions/runs/3861658230/jobs/6622990398#step:5:23)_9` because the image is missing locally.
10:56:59.81 [INFO] Completed: Pulling Docker image `qgis/qgis:final-3_22_9` because the image is missing locally.
10:57:01.87 [INFO] Starting: Building requirements.pex
10:57:01.87 [INFO] Starting: Building 2 requirements for requirements.pex from the lockfiles/default.lock resolve: boto3==1.17.106, sqlalchemy==1.4.37
10:57:02.17 [INFO] Canceled: Building requirements.pex
10:57:02.17 [INFO] Starting: Building pytest.pex from lockfiles/pytest.lock
10:57:02.43 [INFO] Canceled: Building pytest.pex from lockfiles/pytest.lock
10:57:02.48 [INFO] Starting: Building local_dists.pex
10:57:02.52 [INFO] Canceled: Building 2 requirements for requirements.pex from the lockfiles/default.lock resolve: boto3==1.17.106, sqlalchemy==1.4.37
10:57:02.61 [INFO] Canceled: Building local_dists.pex
10:57:02.63 [INFO] Starting: Building pytest_runner.pex
10:57:02.63 [INFO] Starting: Building pytest_runner.pex
10:57:02.82 [INFO] Canceled: Building pytest_runner.pex
10:57:02.86 [INFO] Canceled: Building pytest_runner.pex
10:57:03.10 [INFO] Completed: Run Pytest - (environment:linux_docker, services/qgis/tests/example_test.py) - succeeded.
10:57:03.11 [INFO] Completed: Run Pytest - (environment:linux_docker, services/qgis/tests/test_add_spatial_analysis_project.py) - succeeded.

✓ services/qgis/tests/example_test.py succeeded in 1.81s (cached remotely).
✓ services/qgis/tests/test_add_spatial_analysis_project.py succeeded in 4.45s (cached remotely).
10:57:03.14 [INFO] Starting: Building coverage_py.pex from coverage-py_default.lock
10:57:03.42 [INFO] Canceled: Building coverage_py.pex from coverage-py_default.lock

Wrote xml coverage report to `dist/coverage/python`
10:57:03.83 [INFO] Counters:
  backtrack_attempts: 0
  docker_execution_errors: 0
  docker_execution_requests: 18
  docker_execution_successes: 0
  local_cache_read_errors: 0
  local_cache_requests: 27
  local_cache_requests_cached: 0
  local_cache_requests_uncached: 27
  local_cache_total_time_saved_ms: 0
  local_cache_write_errors: 0
  local_execution_requests: 10
  local_process_total_time_run_ms: 1335
  remote_cache_read_errors: 0
  remote_cache_requests: 26
  remote_cache_requests_cached: [23](https://github.com/myorg/myrepo/actions/runs/3861658230/jobs/6622990398#step:5:24)
  remote_cache_requests_uncached: 0
  remote_cache_speculation_local_completed_first: 3
  remote_cache_speculation_remote_completed_first: 23
  remote_cache_total_time_saved_ms: 4[37](https://github.com/myorg/myrepo/actions/runs/3861658230/jobs/6622990398#step:5:38)[84](https://github.com/myorg/myrepo/actions/runs/3861658230/jobs/6622990398#step:5:85)
  remote_cache_write_attempts: 3
  remote_cache_write_errors: 0
  remote_cache_write_successes: 3
  remote_execution_errors: 0
  remote_execution_requests: 0
  remote_execution_rpc_errors: 0
  remote_execution_rpc_execute: 0
  remote_execution_rpc_retries: 0
  remote_execution_rpc_wait_execution: 0
  remote_execution_success: 0
  remote_execution_timeouts: 0
  remote_process_total_time_run_ms: 0
  remote_store_missing_digest: 0

Pants version 2.15

OS GitHub Actions, so Linux

Additional info As pointed out by @stuhood in https://github.com/pantsbuild/pants/issues/15199#issuecomment-1376564332, the Docker image pull should not be triggered or canceled.

stuhood commented 1 year ago

It looks like the reason that this will happen is that our first step when using a docker image is to get an expanded image ID, including its sha256: https://github.com/pantsbuild/pants/blob/4a0af5e875f2fa53dea63415b50dbcb33be96a50/src/python/pants/core/util_rules/environments.py#L848-L877

...and currently the DockerResolveImageRequest intrinsic is implemented by first pulling the image, and then docker inspecting it: https://github.com/pantsbuild/pants/blob/4a0af5e875f2fa53dea63415b50dbcb33be96a50/src/rust/engine/src/intrinsics.rs#L682-L736

Some googling shows that it should be possible to get an image ID without first pulling... but AFAICT it looks like that would require direct access to the docker registry, rather than being provided by the CLI or the bollard crate.


In the meantime: as shown in the first section, if you include the sha256 tag of your image in the image name in your docker_environment target, we won't pull the image to append it.

chris-stetter commented 6 months ago

Due to https://github.com/pantsbuild/pants/issues/17714, I tried building an image with pants first and then use its sha to use that image in a docker_environment. As it always tries to pull first, this will not work.

Are there any plans to stabilize the environments feature?

sureshjoshi commented 1 month ago

I have a tentative fix for #17714, and that seems like a good step in solving this issue (or at least mitigating it). It doesn't solve the problem directly, but it pushes the problem one step back - so that better caching of Pants-created docker_images/meta will directly speed up the workflow and everything is kept in the ecosystem.

At the moment, my fix still asks for a BuiltPackage, and since that is run PER_SESSION, there is still a perf hit.

https://github.com/pantsbuild/pants/blob/53d6ad0b3c18a901b0eb1cdb6ec00ba371faed44/src/python/pants/backend/docker/util_rules/docker_binary.py#L102-L104

The part I'm stuck on must be cacheable though, I just haven't gone through all the available APIs yet.