Open jcchavezs opened 4 years ago
Agreed, this is a good idea. I can work out the specific mechanics of doing this, but how exactly do we want the SHA to present? As an env var like COMMIT
so it can be echo'd on startup?
I think a build arg is better as it will show up in docker inspect right and it is part of the image.
I can use a build arg to pass it in, but I guess I'm asking what to do with it next. I could see it as a label, env var, or both. Not sure if there's an expected standard here, a little googling seemed to lead to lots of people all doing it their own way.
ARG COMMIT="unknown"
LABEL commit=${COMMIT}
ENV COMMIT=${COMMIT}
I see @aaron-steinfeld, I think I like the hypertrace-service.commit_sha=${COMMIT}
and hypertrace-service.version=${VERSION}
. It would be great to have the same for all the other services (like in https://github.com/hypertrace/hypertrace-service/blob/main/settings.gradle.kts#L15) running in this container. Any ideas @kotharironak @JBAhire ?
@jcchavezs This is a submodule include - (https://github.com/hypertrace/hypertrace-service/blob/main/settings.gradle.kts#L15) - commits show up in repo which include this submodule.
Yeah so I think we need to include this repo version and all submodule versions.
Hmm - so to support submodules (in the generic sense), we'd ultimately have a Dockerfile generated like:
ARG COMMIT_SHA="unknown"
<for each submodule>
ARG <submodule>.COMMIT_SHA="unknown"
<end>
LABEL commit_sha=${COMMIT_SHA}
ENV COMMIT_SHA=${COMMIT_SHA}
<for each submodule>
LABEL <submodule>.commit_sha=${<submodule>.COMMIT_SHA}
ENV <submodule>.COMMIT_SHA=${<submodule>.COMMIT_SHA}
<end>
My votes would be:
git checkout <SHA> && git submodule status
. If we then find that it makes things clearer to add the submodule info in directly, we can go down that path.@aaron-steinfeld sorry for the confusion I did not mean to include version but just to show how I would stablish the naming convention. If we can have the sha for https://github.com/hypertrace/hypertrace-service/blob/main/hypertrace-service/Dockerfile that is already a win.
I think we can go with top-level commit sha (not for submodules) as of now as we have another open question on macro rep vs submodules.
Will be modified image tag too? If so, considering, we already have the main
branch tag. The final tag will be as below?
docker pull hypertrace/hypertrace:main-<commit-sha>
I am not 100% sure about adding the hash in the tag, at least not in the main registry. We don't want to encourage people using non release versions for now. @pavolloffay proposed to have a second image registry (e.g. hypertrace/hypertrace-service-dev) where we could push all master merges. That one makes sense.
In the other hand, when it comes to hypertrace/hypertrace-service-dev
I would propose to use build number (provided by CI) instead of hash as it is explicit what is older/newer than what.
What do you think @kotharironak ?
@jcchavezs sounds good to me.
Do just the top level commit as a first cut.
With git submodules this should be enough to find out the version/SHA of the submodule.
I would propose to use build number (provided by CI) instead of hash as it is explicit what is older/newer than what.
We should keep in mind the retention of the CI jobs/logs. Once the job history is lost we won't be able to correlate it with the git history, but I like the predictability in the numbering.
@pavolloffay yeah, that concern can be addressed by doing hypertrace-service-dev-{ci_version}-{commit_sha}
After expired CI history it will be hard to get a tag for a give SHA. There could be two tags :name-{ci_version}
and :name-{commit_sha}
.
I'm hesitant to introduce a third parallel concept of versioning (beyond the release version and the SHA) - especially because I don't believe build numbers have any guarantee of monotonicity. They'd be based on execution order, so a rerun of an older SHA would be considered newer than a run of a newer SHA.
There's a few concrete things here though we can get started on while we figure out if that's needed and what form it would take:
<branch>-<SHA>
tag in addition to the existing <branch>
tagFrom my POV it is important to have 1 as soon as possible. Number 2 can be implemented once we actually need it.
Stamp SHAs into non release builds, pushing a
- tag in addition to the existing tag
Not sure about this. Is there any interest on the branch once we have the sha?
From my POV it is important to have 1 as soon as possible.
You got it - will find some time to work on this today or tomorrow.
Number 2 can be implemented once we actually need it.
Agreed
Not sure about this. Is there any interest on the branch once we have the sha?
No strong feelings of <brach>-<SHA>
vs just <SHA>
, as long as we also push <branch>
(which is our current behavior).
I'm not a fan of this as there are many ways to address the problem of which git sha is in use by an image. I recommend fixing logging instead and possibly adding an /info endpoint which has the git sha in it.
I assume the only purpose of this is to link a docker image digest with a git sha. For example, we have the user tell us which digest their main image is using, and lookup the sha-based tag with the same docker digest.
To get the above is tricky to ask a user to do, as they would need to run a docker command to figure out the digest. I assume we are not encouraging people to pin to git sha, as that would intentionally pin people to code that may have already disappeared by the time they ask why.
Adding more specific tags also re-clutters our dockerhub tags, with another form of non-release images. sha-based docker tags will pin in some ways digests only referred to by "main" now. For example, when you replace "main" it orphans the digest. Whether a registry cleans up eagerly or not is an implementation detail. Docker hub mentions it will clean up things not updated in 6 months. While I don't know the implementations of registries, it is likely that tagging every digest with a SHA will lead to higher costs even if they aren't born to us directly.
Moreover, we are still restructuring the project. A good way to notice this is the image discussed (hypertrace-federated-service) has already been renamed! Adding another way for people to pin is another support dimension, one that will interfere more directly when we actually make releases, and certainly not reduce confusion. This topic only came up about knowing a git hash.. it doesn't need to be solved with a new layer of release process.
An alternative is to use logging instead. For example, in zipkin we use the git hash plugin to update a resources file. When the server starts up, regardless of it being via java or docker, the hash is in the console output. It is also in the /info admin endpoint.
ex.
$ java -jar zipkin.jar
oo
oooo
oooooo
oooooooo
oooooooooo
oooooooooooo
ooooooo ooooooo
oooooo ooooooo
oooooo ooooooo
oooooo o o oooooo
oooooo oo oo oooooo
ooooooo oooo oooo ooooooo
oooooo ooooo ooooo ooooooo
oooooo oooooo oooooo ooooooo
oooooooo oo oo oooooooo
ooooooooooooo oo oo ooooooooooooo
oooooooooooo oooooooooooo
oooooooo oooooooo
oooo oooo
________ ____ _ _____ _ _
|__ /_ _| _ \| |/ /_ _| \ | |
/ / | || |_) | ' / | || \| |
/ /_ | || __/| . \ | || |\ |
|____|___|_| |_|\_\___|_| \_|
:: version 2.21.7 :: commit 7c8e569 ::
looking at console output is a lot easier than having someone correlate the docker image hash of main
with a git hash tag that matches it. It is easier for them, easier for us, and doesn't add a new process that complicates what a release is.
TL;DR; I would recommend undoing this, but understand those managing the project may have other optimizations in mind, so up to you!
I can certainly buy the argument against SHAs in the tags, and I think where we settled was kind of a two phase approach:
Add the SHA information at container build time as a label and env variable. This is all that is enabled by the change linked above - no tagging, no change in release process. This enables things like logging or an INFO endpoint, which I understood to be the main goal of this issue. We could do this as something like a resource file too, but this way its language agnostic - would be interested if there's a compelling reason that it should be at the code level instead.
Add the SHA to the tag. As I understand the comment, this is what's being disagreed with. This part was deferred to see if it was really needed. If we do do this, they'd go to a separate repository - but the arguments both about cost and not supporting user pinning have merit.
Thanks for the thoughts @adriancole. First let me try to clarify the use case: I want to know what commit SHA generated an image I use as main
or master
. As you pointed out, the digest is useless but instead, having the SHA as docker build argument I can retrieve it directly from the container by simply transforming the docker inspect
output. This is something fairly common in artifacts (binaries or docker images):
@jcchavezs also having the sha as a docker env makes sense when you are in a position to run docker inspect but not able to view the console. These aren't mutually exclusive, rather complimentary. Frequently, people are in a position to read console but not knowing how or too late to run docker inspect. OTOH there could be cases of the visa versa and there's little cost in adding that sha env..
shall we close this?
As of now we are only relaying on
main
version of hypertrace-federated-service, however we don't know what image are we actually using because all of them are being calledmain
. Having the ability to stamp the git commit id will clearly state what the image is and it will help on triage for issues like: "docker-compose up stopped working from yesterday to today but nothing changed".After a research I could not find the way to do this with gradle. Any idea @aaron-steinfeld @adriancole @JBAhire @pavolloffay?