Stamp git commit id in the image

jcchavezs commented 4 years ago

As of now we are only relaying on main version of hypertrace-federated-service, however we don't know what image are we actually using because all of them are being called main. Having the ability to stamp the git commit id will clearly state what the image is and it will help on triage for issues like: "docker-compose up stopped working from yesterday to today but nothing changed".

After a research I could not find the way to do this with gradle. Any idea @aaron-steinfeld @adriancole @JBAhire @pavolloffay?

aaron-steinfeld commented 4 years ago

Agreed, this is a good idea. I can work out the specific mechanics of doing this, but how exactly do we want the SHA to present? As an env var like COMMIT so it can be echo'd on startup?

jcchavezs commented 4 years ago

I think a build arg is better as it will show up in docker inspect right and it is part of the image.

aaron-steinfeld commented 4 years ago

I can use a build arg to pass it in, but I guess I'm asking what to do with it next. I could see it as a label, env var, or both. Not sure if there's an expected standard here, a little googling seemed to lead to lots of people all doing it their own way.

ARG COMMIT="unknown"
LABEL commit=${COMMIT}
ENV COMMIT=${COMMIT}

jcchavezs commented 4 years ago

I see @aaron-steinfeld, I think I like the hypertrace-service.commit_sha=${COMMIT} and hypertrace-service.version=${VERSION}. It would be great to have the same for all the other services (like in https://github.com/hypertrace/hypertrace-service/blob/main/settings.gradle.kts#L15) running in this container. Any ideas @kotharironak @JBAhire ?

kotharironak commented 4 years ago

@jcchavezs This is a submodule include - (https://github.com/hypertrace/hypertrace-service/blob/main/settings.gradle.kts#L15) - commits show up in repo which include this submodule.

jcchavezs commented 4 years ago

Yeah so I think we need to include this repo version and all submodule versions.

aaron-steinfeld commented 4 years ago

Hmm - so to support submodules (in the generic sense), we'd ultimately have a Dockerfile generated like:

ARG COMMIT_SHA="unknown"
<for each submodule>
ARG <submodule>.COMMIT_SHA="unknown"
<end>

LABEL commit_sha=${COMMIT_SHA}
ENV COMMIT_SHA=${COMMIT_SHA}
<for each submodule>
LABEL <submodule>.commit_sha=${<submodule>.COMMIT_SHA}
ENV <submodule>.COMMIT_SHA=${<submodule>.COMMIT_SHA}
<end>

My votes would be:

Hold off on the version for now, as they may not exist if we're building off latest main, and our versions don't really have meaning yet
Do just the top level commit as a first cut. We don't need git access to do that since the SHA will be in the env, while the submodule info will involve working with git. If this is purely for debugging, the other SHAs (and any versions) are directly discoverable from that with a one liner: git checkout <SHA> && git submodule status. If we then find that it makes things clearer to add the submodule info in directly, we can go down that path.

jcchavezs commented 4 years ago

@aaron-steinfeld sorry for the confusion I did not mean to include version but just to show how I would stablish the naming convention. If we can have the sha for https://github.com/hypertrace/hypertrace-service/blob/main/hypertrace-service/Dockerfile that is already a win.

kotharironak commented 4 years ago

I think we can go with top-level commit sha (not for submodules) as of now as we have another open question on macro rep vs submodules.

Will be modified image tag too? If so, considering, we already have the main branch tag. The final tag will be as below? docker pull hypertrace/hypertrace:main-<commit-sha>

jcchavezs commented 4 years ago

I am not 100% sure about adding the hash in the tag, at least not in the main registry. We don't want to encourage people using non release versions for now. @pavolloffay proposed to have a second image registry (e.g. hypertrace/hypertrace-service-dev) where we could push all master merges. That one makes sense.

In the other hand, when it comes to hypertrace/hypertrace-service-dev I would propose to use build number (provided by CI) instead of hash as it is explicit what is older/newer than what.

What do you think @kotharironak ?

kotharironak commented 4 years ago

@jcchavezs sounds good to me.

pavolloffay commented 4 years ago

Do just the top level commit as a first cut.

With git submodules this should be enough to find out the version/SHA of the submodule.

I would propose to use build number (provided by CI) instead of hash as it is explicit what is older/newer than what.

We should keep in mind the retention of the CI jobs/logs. Once the job history is lost we won't be able to correlate it with the git history, but I like the predictability in the numbering.

jcchavezs commented 4 years ago

@pavolloffay yeah, that concern can be addressed by doing hypertrace-service-dev-{ci_version}-{commit_sha}

pavolloffay commented 4 years ago

After expired CI history it will be hard to get a tag for a give SHA. There could be two tags :name-{ci_version} and :name-{commit_sha}.

aaron-steinfeld commented 4 years ago

I'm hesitant to introduce a third parallel concept of versioning (beyond the release version and the SHA) - especially because I don't believe build numbers have any guarantee of monotonicity. They'd be based on execution order, so a rerun of an older SHA would be considered newer than a run of a newer SHA.

There's a few concrete things here though we can get started on while we figure out if that's needed and what form it would take:

SHA in image
Move non-release builds to their own repository
Stamp SHAs into non release builds, pushing a <branch>-<SHA> tag in addition to the existing <branch> tag

jcchavezs commented 4 years ago

From my POV it is important to have 1 as soon as possible. Number 2 can be implemented once we actually need it.

Stamp SHAs into non release builds, pushing a - tag in addition to the existing tag

Not sure about this. Is there any interest on the branch once we have the sha?

aaron-steinfeld commented 4 years ago

From my POV it is important to have 1 as soon as possible.

You got it - will find some time to work on this today or tomorrow.

Number 2 can be implemented once we actually need it.

Agreed

Not sure about this. Is there any interest on the branch once we have the sha?

No strong feelings of <brach>-<SHA> vs just <SHA>, as long as we also push <branch> (which is our current behavior).

codefromthecrypt commented 4 years ago

I'm not a fan of this as there are many ways to address the problem of which git sha is in use by an image. I recommend fixing logging instead and possibly adding an /info endpoint which has the git sha in it.

I assume the only purpose of this is to link a docker image digest with a git sha. For example, we have the user tell us which digest their main image is using, and lookup the sha-based tag with the same docker digest.

Screen Shot 2020-09-21 at 9 04 52 AM

To get the above is tricky to ask a user to do, as they would need to run a docker command to figure out the digest. I assume we are not encouraging people to pin to git sha, as that would intentionally pin people to code that may have already disappeared by the time they ask why.

Adding more specific tags also re-clutters our dockerhub tags, with another form of non-release images. sha-based docker tags will pin in some ways digests only referred to by "main" now. For example, when you replace "main" it orphans the digest. Whether a registry cleans up eagerly or not is an implementation detail. Docker hub mentions it will clean up things not updated in 6 months. While I don't know the implementations of registries, it is likely that tagging every digest with a SHA will lead to higher costs even if they aren't born to us directly.

Moreover, we are still restructuring the project. A good way to notice this is the image discussed (hypertrace-federated-service) has already been renamed! Adding another way for people to pin is another support dimension, one that will interfere more directly when we actually make releases, and certainly not reduce confusion. This topic only came up about knowing a git hash.. it doesn't need to be solved with a new layer of release process.

An alternative is to use logging instead. For example, in zipkin we use the git hash plugin to update a resources file. When the server starts up, regardless of it being via java or docker, the hash is in the console output. It is also in the /info admin endpoint.

ex.

$ java -jar zipkin.jar

                  oo
                 oooo
                oooooo
               oooooooo
              oooooooooo
             oooooooooooo
           ooooooo  ooooooo
          oooooo     ooooooo
         oooooo       ooooooo
        oooooo   o  o   oooooo
       oooooo   oo  oo   oooooo
     ooooooo  oooo  oooo  ooooooo
    oooooo   ooooo  ooooo  ooooooo
   oooooo   oooooo  oooooo  ooooooo
  oooooooo      oo  oo      oooooooo
  ooooooooooooo oo  oo ooooooooooooo
      oooooooooooo  oooooooooooo
          oooooooo  oooooooo
              oooo  oooo

     ________ ____  _  _____ _   _
    |__  /_ _|  _ \| |/ /_ _| \ | |
      / / | || |_) | ' / | ||  \| |
     / /_ | ||  __/| . \ | || |\  |
    |____|___|_|   |_|\_\___|_| \_|

:: version 2.21.7 :: commit 7c8e569 ::

looking at console output is a lot easier than having someone correlate the docker image hash of main with a git hash tag that matches it. It is easier for them, easier for us, and doesn't add a new process that complicates what a release is.

TL;DR; I would recommend undoing this, but understand those managing the project may have other optimizations in mind, so up to you!

aaron-steinfeld commented 3 years ago

I can certainly buy the argument against SHAs in the tags, and I think where we settled was kind of a two phase approach:

Add the SHA information at container build time as a label and env variable. This is all that is enabled by the change linked above - no tagging, no change in release process. This enables things like logging or an INFO endpoint, which I understood to be the main goal of this issue. We could do this as something like a resource file too, but this way its language agnostic - would be interested if there's a compelling reason that it should be at the code level instead.
Add the SHA to the tag. As I understand the comment, this is what's being disagreed with. This part was deferred to see if it was really needed. If we do do this, they'd go to a separate repository - but the arguments both about cost and not supporting user pinning have merit.

jcchavezs commented 3 years ago

Thanks for the thoughts @adriancole. First let me try to clarify the use case: I want to know what commit SHA generated an image I use as main or master. As you pointed out, the digest is useless but instead, having the SHA as docker build argument I can retrieve it directly from the container by simply transforming the docker inspect output. This is something fairly common in artifacts (binaries or docker images):

Add SHA to the tag: This is probably something less interesting, I used to work with CI job number versions to debug issues but that is not our case, that is why I proposed to do this when we actually need it.

codefromthecrypt commented 3 years ago

@jcchavezs also having the sha as a docker env makes sense when you are in a position to run docker inspect but not able to view the console. These aren't mutually exclusive, rather complimentary. Frequently, people are in a position to read console but not knowing how or too late to run docker inspect. OTOH there could be cases of the visa versa and there's little cost in adding that sha env..

kotharironak commented 3 years ago

shall we close this?

hypertrace / hypertrace-service

Stamp git commit id in the image #30