Closed XanderStrike closed 5 years ago
Actually on further investigation it seems like the digest isn't even actually the digest for the image. This is pretty confusing since it looks like an image digest 😐
$ deplab -g /tmp/istio -i gcr.io/cf-routing/pilot:1.3.0 -t gcr.io/cf-routing/pilot:1.3.0
sha256:fd2f51805d46eaa37c1544e3a9cdaccffb245f141874763adc297562d1c4f964
$ docker push gcr.io/cf-routing/pilot:1.3.0
The push refers to repository [gcr.io/cf-routing/pilot]
aa1df9b87be4: Layer already exists
a1d0ac46357e: Layer already exists
f1842ad5b70f: Layer already exists
e79142719515: Layer already exists
aeda103e78c9: Layer already exists
2558e637fbff: Layer already exists
f749b9b0fb21: Layer already exists
1.3.0: digest: sha256:334dbe446972a71bea0fd9a3c9ad2e89e52e613d660d356ab703e3405e5a478b size: 1786
The value output by deplab is actually the Image ID for the newly created image as opposed to the repo digest. For example, we have Istio images in GCR without deplab metadata:
$ docker pull gcr.io/cf-navcon/istio/pilot:latest
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
gcr.io/cf-navcon/istio/pilot latest 85af27504b36 4 months ago 283MB
$ docker inspect gcr.io/cf-navcon/istio/pilot:latest | jq '.[0] | .Id, .RepoDigests'
"sha256:85af27504b36a4117d96e74a294b62d100e92fef9a3830b8c82a4137d92cbab2"
[
"gcr.io/cf-navcon/istio/pilot@sha256:428959abad1e61f4adbfb3b35ec61d85c952c7c46696e15adc7b3863eb9719f5"
]
Then, running deplab against this image we get a new image without a repo digest:
$ deplab -g ~/workspace/stacks/ -i gcr.io/cf-navcon/istio/pilot:latest -t gcr.io/cf-navcon/istio/pilot:deplab
sha256:19cb06772618d7761e03c25657bb5bfc83dfdd0feecf1d0a7f372bac8e4b6851
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
gcr.io/cf-navcon/istio/pilot deplab 19cb06772618 5 minutes ago 283MB
gcr.io/cf-navcon/istio/pilot latest 85af27504b36 4 months ago 283MB
$ docker inspect gcr.io/cf-navcon/istio/pilot:deplab | jq '.[0] | .Id, .RepoDigests'
"sha256:19cb06772618d7761e03c25657bb5bfc83dfdd0feecf1d0a7f372bac8e4b6851"
[]
When pushing this back to GCR, the ID does not change but we get a new RepoDigest:
$ docker push gcr.io/cf-navcon/istio/pilot:deplab
The push refers to repository [gcr.io/cf-navcon/istio/pilot]
f1a49cc17b70: Layer already exists
7892b678008a: Layer already exists
4542e6d725c6: Layer already exists
c668b1d642cb: Layer already exists
d908c11b1a82: Layer already exists
7ccfaa7554e3: Layer already exists
89ec57aea3bf: Layer already exists
a0c1e01578b7: Layer already exists
deplab: digest: sha256:29b33cd01e6e36bd461a316b7ca1c5c7da9de248018778e23363c2208b292664 size: 1995
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
gcr.io/cf-navcon/istio/pilot deplab 19cb06772618 5 minutes ago 283MB
gcr.io/cf-navcon/istio/pilot latest 85af27504b36 4 months ago 283MB
$ docker inspect gcr.io/cf-navcon/istio/pilot:deplab | jq '.[0] | .Id, .RepoDigests'"sha256:19cb06772618d7761e03c25657bb5bfc83dfdd0feecf1d0a7f372bac8e4b6851"
[
"gcr.io/cf-navcon/istio/pilot@sha256:29b33cd01e6e36bd461a316b7ca1c5c7da9de248018778e23363c2208b292664"
]
Would it reduce potential confusion if the output from deplab clearly stated that this was the image ID?
In terms of making deplab idempotent, we have put some thought into this but it is not entirely straightforward. We could simply inspect the image metadata when deplab was called, check for the presence of the io.pivotal.metadata
label and exit if it already exists. However, suppose someone has already labelled an image using deplab, then used that image as the base for another image which adds a bunch of additional packages. If deplab were to exit upon finding that the io.pivotal.metadata
label already exists, the metadata would be missing those additional packages. With our current implementation on the other hand, deplab will overwrite the io.pivotal.metadata
label in this case with an accurate description of this new image and all of its packages.
An alternative approach would be for deplab to always extract the full metadata from an image, then compare this to any metadata labels already present on that image and only re-label the image if a discrepancy has been found. This would ensure that the ID of the image would only change if deplab needed to create a new layer with fresh metadata.
I'm curious about the use-case though. So far, we have been assuming that deplab would only be run for an image once. This would happen in some automated process essentially immediately after the image has been built and before that new image get's pushed to a registry. In that case, we imagined that if deplab was run on an image which already had the io.pivotal.metadata
label, it would only be because something had changed and the image needed to be re-labelled. If this isn't true for Istio, I'd like to learn more about what the flow would actually be like. (I'm happy to chat over Zoom if it's easier.)
Fabulous! I didn't realize it was a different SHA, my Docker knowledge shortcomings are showing!
The use case (if you can call it that) is I'm running deplab once right after building the image to label and tag it, and once again later to get the dpkg
list, and I was surprised to see the SHAs change. I was mistakenly using the SHA returned by deplab to try to uniquely identify the image (i.e. pulling out an image by digest), but that needs to be the repo SHA. Switching to this solved my issue.
I still think idempotency is generally valuable on tools like this but I don't have a valid use case for it at the moment, so we can consider this issue resolved. Thanks!
Sounds good, thanks!
FWIW, we currently recommend using deplab once to label the image and generate the dpkg list at the same time. We could consider a feature in future which would allow you to generate the dpkg list while explicitly skipping the image labelling step. However, we're actually planning to remove the need for the dpkg list file from the OSL process soon and rely solely on the metadata on the image.
I agree in general that making deplab idempotent would be nice so I've added a story for this to our icebox. It may not be a high priority for us right now though. I've also added a story to make deplab's output easier to understand.
When running deplab as part of a script, it would be nice to be able to re-run the script without having deplab change the digest.
I can't actually tell what deplab is changing about the image each time you run it, but it returns a different digest every time. Ideally deplab should be able to detect if it's already added metadata to a container, and complete instantly.