The cache keys are all different; doesn't that break cache restore in the deploy step?
How are they different? CIRCLE_TAG would be equivalent for the related build/deploy steps, and restore_cache looks for a prefix (so the epoch doesn't factor in).
I just refreshed my memory on how Circle's caching works. Could you write a comment above the cache key explaining why we chose that schema, for posterity?
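Something like this, perhaps (just a sketch of possible wording; the key shown is the schema this PR appears to use, inferred from the cache names discussed below rather than copied from the diff):

```yaml
- save_cache:
    # restore_cache matches keys by prefix and takes the most recent match,
    # so include both the branch and the tag: a bare "v1-{{ .Branch }}" prefix
    # can pick up a cache from a concurrent workflow, because tag builds have
    # an empty branch. The epoch suffix just keeps each save key unique.
    key: v1-{{ .Branch }}-{{ .Environment.CIRCLE_TAG }}-{{ epoch }}
    paths:
      - /cache/docker.tar
```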
Is this change still needed?
A project I support just hit this problem. The contents of the image we had autodeployed to stage did not match what we expected.
Ordered by workflow, here is what seems to have happened:

Workflow 58:
- build 144 built an image for a22f2d9, which adds tag 0.0.5, and logged "Stored Cache to v1--1599842496"
- build 148 logged "Found a cache from build 144 at v1-", restored 4ea2177f, and ran tests against it
- build 149 logged "Found a cache from build 146 at v1-", restored b9bf2ad5, and pushed it to dockerhub as 0.0.5

Workflow 59:
- build 146 built an image for commit 994c5aa, an unmerged PR bumping dependencies, and logged "Stored Cache to v1-renovate/pin-dependencies-1599842530"
- build 150 logged "Found a cache from build 146 at v1-renovate/pin-dependencies", restored b9bf2ad5, and ran tests against it
So workflow 58 (the tag on the master branch) and 59 (the PR) ran concurrently, and the last step of 58, which pushes the image to dockerhub, restored and pushed an image from 59 instead.
This happened because our config says:

```yaml
- save_cache:
    key: v1-{{ .Branch }}-{{epoch}}
- restore_cache:
    key: v1-{{.Branch}}
```

and for master there is no branch (the tag build has an empty CIRCLE_BRANCH), so the restore key is just `v1-`. The CircleCI docs say:
> A key is searched against existing keys as a prefix.
>
> Note: When there are multiple matches, the most recent match will be used, even if there is a more precise match.
Build 149 had the choice of restoring either the cache "v1--1599842496" from build 144 or the cache "v1-renovate/pin-dependencies-1599842530" from build 146. Consistent with the docs, it chose the latter, even though it's from a different workflow.
With this PR, the caches would have been named "v1--0.0.5-1599842496" and "v1-renovate/pin-dependencies--1599842530". We would have tried to restore "v1--0.0.5", so we would have selected the right one.
So I think this is okay. Still, @pjenvey, I wonder why you chose CIRCLE_TAG instead of CIRCLE_SHA1? Since a tag is mutable, I think there is still potential for cache confusion if someone pushes a tag, then immediately repushes the tag pointing to a new commit.
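For example (hypothetical epochs and commits, just to illustrate the repushed-tag concern):

```
v1--0.0.5-1599900000   # tag 0.0.5 pushed at commit aaaa111
v1--0.0.5-1599900060   # tag 0.0.5 repushed at commit bbbb222
```

A restore key of "v1--0.0.5" prefix-matches both, and the more recent save wins, regardless of which commit's image it actually contains.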
I think we'd have to add the identifier to line 69 as well.
I think bpitts is right; adding the SHA1 is probably the most correct way of caching data, even if it means the cache can only be used within the same commit.
I'm wondering if the below does what we want, or if there is some downside I'm not seeing that led to Branch being picked originally.
```diff
diff --git a/.circleci/config.yml b/.circleci/config.yml
index fc0b225..a5da9c9 100644
--- a/.circleci/config.yml
+++ b/.circleci/config.yml
@@ -35,7 +35,7 @@ jobs:
           name: docker save app:build
           command: mkdir -p /cache; docker save -o /cache/docker.tar "app:build"
       - save_cache:
-          key: v1-{{ .Branch }}-{{epoch}}
+          key: v1-{{ .Environment.CIRCLE_SHA1 }}-{{epoch}}
           paths:
             - /cache/docker.tar
@@ -45,7 +45,7 @@ jobs:
     steps:
       - setup_remote_docker
       - restore_cache:
-          key: v1-{{.Branch}}
+          key: v1-{{.Environment.CIRCLE_SHA1}}
       - run:
           name: Restore Docker image cache
           command: docker load -i /cache/docker.tar
@@ -60,7 +60,7 @@ jobs:
     steps:
       - setup_remote_docker
       - restore_cache:
-          key: v1-{{.Branch}}
+          key: v1-{{.Environment.CIRCLE_SHA1}}
       - run:
           name: Restore Docker image cache
           command: docker load -i /cache/docker.tar
```
I'm testing that in https://github.com/mozilla-services/topsites-proxy/pull/49
Superseded by #57
Indeed, CIRCLE_SHA1 is the way to go; I just ran into a non-tagged dev build utilizing CIRCLE_TAG that still hit this issue. Like bpitts said, it would also be an issue for a re-pushed tag.
to avoid deploy steps potentially picking up a previous merge build that wrote to the cache out of order
...which causes our deployment verifier to fail