safe-waters / docker-lock

Automatically manage image digests in Dockerfiles, docker-compose files, and Kubernetes manifests by tracking them in a separate Lockfile
Apache License 2.0
429 stars 18 forks source link

Rewrite without lockfiles #65

Open kokes opened 3 years ago

kokes commented 3 years ago

The way we'd like to use a tool like docker lock is that we'd leave the digests in our Dockerfiles and upon each rewrite, we'd lookup the current digests and replace the outdated ones. We don't have a particularly good use of the lock file.

So we'd essentially do docker lock generate && docker lock rewrite && rm docker-lock.json - are we alone in this? If not, wouldn't something like docker lock rewrite --no-lock-file or something along those lines be of use? It'd do the generate step (in memory) and rewrite the Dockerfile without saving these into a lock file.

This is just a tentative proposal, maybe we're misunderstanding the intended workflow.

michaelperel commented 3 years ago

The way that I use the tool is: (1) Do not specify digests in Dockerfiles, instead run docker lock generate to record them in a Lockfile. (2) Track the Lockfile in git (3) When time to deploy, first run docker lock rewrite to rewrite all the Dockerfiles with the digests from the Lockfile.

Currently, if you specify your digest in your Dockerfile, the tool assumes that you want to keep that digest hardcoded, and it does not look up a new digest from the registry when running docker lock generate. Instead, it records the digest from the Dockerfile into the Lockfile.

If I understand correctly, you are asking basically for: (1) The ability for docker lock to get new digests, even if they are hardcoded in the Dockerfiles (2) The ability to generate & rewrite the files in one shot, without the Lockfile.

These sound like good features to me. Just curious, any particular reason why you would prefer to leave the digests in the Dockerfiles rather than use a Lockfile?

kokes commented 3 years ago

You're understanding it exactly right.

We don't have a particularly strong preference for this, we just don't see enough value in keeping this data in a separate file. Plus we want to make sure our Dockerfiles are always ready to build - keeping the digests separately might cause a drift if we're not careful. While we do use digests for other runtimes (Python, most often), the difference there is that we can install from lockfiles directly, but since docker-lock is a third-party plugin, all installation (builds) is still happening off of a Dockerfile, hence our preference to keep the digests there.

michaelperel commented 3 years ago

The point about drift is a good one, and I can see why someone would prefer to not use a separate file. I imagine the features would look something like:

(1) docker lock generate --use-existing-digests where files with digests would also be updated if this were false. (2) docker lock rewrite --no-lockfile where you could pass the existing flags from the generate command to rewrite.

In terms of priority, I am in the process of refactoring so that the code is more generic/reusable (currently parts of the code base are too heavily tied to the type of file i.e. Dockerfile, Composefile, Kubernetesfile) as there are a variety of other types of files I want to support such as github actions yml.

I would have bandwidth to work on these 2 features after this refactoring is done, which I imagine would start in 2-3 weeks.

Otherwise, if you have interest in trying your hand at either of these features or have any suggestions, please feel welcome and I will backport them once I have finished refactoring.

Thanks!

kokes commented 3 years ago

Cool cool cool. I'm in no rush, we're still evaluating what our options are and we can replicate our desired workflow even with lock files, so it's not a blocker. Also, I think this feature is not so much about writing code but about the UX of the workflow, so I'd rather take the time and test out various options before settling on something.

What's more pressing is ECR support - been meaning to try out the AWS Go SDK, so might give that a shot in the coming weeks, if I have the time. Oh, it's not about the SDK, it's just the registry API, so it might be a lot easier then.

michaelperel commented 3 years ago

Sounds great, thanks for the background. ECR support is definitely easy to add -- I personally use ACR and left that as an open task since I do not have an AWS account (and didn't look into seeing if free ones exist, so tests could run in the CI).

Since all the registries implement the same API, it would be easy to add ECR, and I am happy to do so next week. If there are free accounts, I will make sure it runs in the CI as well.

michaelperel commented 3 years ago

@kokes PR #71 and release v0.7.3 allows you to now say:

docker lock generate --update-existing-digests

which will record the new digests in the Lockfile, even if they are hardcoded in your Dockerfiles.

So for your usecase, where you don't care about the lockfile, you can say:

docker lock generate --update-existing-digests
docker lock rewrite
rm docker-lock.json

And your digests will always be tracked in the Dockerfiles.

As to being able to do this in one line, I think it is a good idea, but I am going to focus on a few other issues first.

kokes commented 3 years ago

@michaelperel Cool, this works as expected. I only ran into one weird edge case.

Not sure if that's a good practice, but some of my Docker FROM clauses look like this (no tag, digest included; it builds):

FROM python@sha256:b013f727bd37ddf8f9267dfeda6e2535c7df89961259dd465a1d4d274c96de94

And these are completely ignored by docker-lock, or at least by --update-existing-digests. I found out about this when I forgot to login to ECR, deleted some characters from my digest to force an update... and nothing happened, not even a complaint I wasn't logged in.

Note that if I delete the digest altogether, docker-lock will add the tag latest and its corresponding digest.

michaelperel commented 3 years ago

@kokes Currently docker-lock does not try to update images that do not have a tag, but I would be open to a suggestion on how to do it.

This is because multiple tags can have the same digest. For instance, currently ubuntu:focal and ubuntu:20.04 have the same digest. If an update occurs to ubuntu:20.04 but not ubuntu:focal, would docker-lock get the updated digest from ubuntu:20.04 or keep the same one from ubuntu:focal?

I figured keeping the digest the same as written in Dockerfile made the most sense.

Perhaps in these scenarios, a warning log message should be printed?

To the last point, docker-lock adds latest if you delete the digest because normally when you pull from docker with just a name and no digest, docker automatically adds latest, so docker-lock follows this behavior.

michaelperel commented 3 years ago

Additionally, if you pull an image by the digest for an image you do not have locally, such as: docker pull fedora@sha256:aa889c59fc048b597dcfab40898ee3fcaad9ed61caf12bcfef44493ee670e9df

and then run docker images

you will see that the tag is none

Screen Shot 2020-12-08 at 11 07 26 AM

So I am not sure if it would make sense to update an image with a tag of none.

kokes commented 3 years ago

@michaelperel makes sense. I guess a log message about a given image being ignored due to the lack of tags would be useful. This way it just silently ignores it, which I found a bit too... silent.

michaelperel commented 3 years ago

@kokes the log statement is in the most recent release, v0.8.3