hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

support for identifying out of date docker images #13061

Open josh-m-sharpe opened 2 years ago

josh-m-sharpe commented 2 years ago

Proposal

Nomad should tell me that an image I have deployed is not the latest version available. Preferably with a bell icon and red dot.

This may seem like an unlikely thing for Nomad to solve for, but if not Nomad, what other tool would do it? As best as I can tell, the nomad server agent is the only running process that could have knowledge of the running image versions at any given time as well as have access to check the repository for a more current version.

Use-cases

Security.

Attempted Solutions

None within the scope of nomad. I'm thinking of manhandling those job files and parsing out the current versions and then notifying myself somehow.

tgross commented 2 years ago

I love the idea of this kind of thing, but it has some architectural complications...

> As best as I can tell, the nomad server agent is the only running process that could have knowledge of the running image versions at any given time as well as have access to check the repository for a more current version.

As it turns out, the server has no idea what a Docker image is: anything that falls in the task.config is handled by the task driver on the client (which could be a third-party driver and not inside Nomad at all!). Even the client doesn't really know what the schema of the config is. Also, the client itself runs as root on the host, so we probably don't want to make it in charge of making third-party scanning requests.

That being said, the Job and Allocation APIs are readable by any application with the appropriate Nomad ACL token. So a potentially interesting idea here is to run a job on the cluster that has minimally scoped Nomad privileges (read-only on the Job/Allocation) and have it periodically scan the set of allocations, get their task.config.image (or other artifacts for drivers like qemu!) and then send those results to whatever third-party scanning service we'd want to use.
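A minimal sketch of that scanning job, assuming Python and only the standard library. The function names (`images_from_job`, `fetch_json`, `scan_cluster`) are hypothetical. It reads the image string from each job's spec via `/v1/jobs` and `/v1/job/:job_id`, so it reports what was submitted (possibly a mutable tag, possibly an uninterpolated `${...}` expression), not the digest that is actually running. It also assumes the default namespace and a token with read access to jobs.

```python
import json
import urllib.parse
import urllib.request


def images_from_job(job):
    """Return (group, task, driver, image) for every task that sets config.image."""
    found = []
    for group in job.get("TaskGroups") or []:
        for task in group.get("Tasks") or []:
            # task.config is driver-specific; only some drivers use an "image" key.
            image = (task.get("Config") or {}).get("image")
            if image:
                found.append((group["Name"], task["Name"], task["Driver"], image))
    return found


def fetch_json(addr, path, token):
    """GET a Nomad HTTP API path, authenticating with an ACL token."""
    req = urllib.request.Request(addr + path, headers={"X-Nomad-Token": token})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def scan_cluster(addr, token):
    """Yield (job_id, group, task, driver, image) for each registered job."""
    for stub in fetch_json(addr, "/v1/jobs", token):
        # Job IDs may contain "/" (periodic or dispatched children), so escape them.
        job_path = "/v1/job/" + urllib.parse.quote(stub["ID"], safe="")
        job = fetch_json(addr, job_path, token)
        for group, task, driver, image in images_from_job(job):
            yield stub["ID"], group, task, driver, image
```

Something like `for row in scan_cluster("http://127.0.0.1:4646", token): print(row)` would then produce the list to hand to whatever scanning service is chosen.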

josh-m-sharpe commented 2 years ago

As an aside, do such "third party scanning" tools exist? The only thing in this realm I'm familiar with is GitHub's Dependabot, but that's not an API that data could be sent to for scanning.

tgross commented 2 years ago

Without recommending any one in particular, I know that Docker has an integration with Snyk for their registry. See https://docs.docker.com/develop/scan-images/#scan-using-docker-hub for example.

dasavick commented 2 years ago

I really like the idea. However, unless the Nomad cluster operator practices hardcore version pinning (to a specific commit/patch), scanning task.config.image would be of limited or no use.

This is not just a problem of having no version pinning at all: using major-version tags like redis:7 can leave us severely behind on updates when a job runs uninterrupted for a long time, while potentially giving a false sense of security and of being up to date.

I feel like scanning for the string in the job files is more a job for tools like the previously mentioned Dependabot. I don't think there is anything that would prevent that, and it does not give a false sense of runtime up-to-dateness.


That said, Docker stores the runtime image version as a digest in Config.Image (see docker inspect mycontainer), and that does not help: I don't think there is a way to reverse the digest, as many tags can have the same one.

I don't see how that would make it easy to obtain a specific version distance, but comparing it to the digest of a current pull of task.config.image would at least allow an equality check, resulting in "there is some new version" information. A bit underwhelming, but at least something.
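A rough sketch of that equality check, assuming Python and a registry that speaks the Registry HTTP API V2, where a `HEAD` on `/v2/<name>/manifests/<reference>` returns the manifest digest in the `Docker-Content-Digest` header. The function names are hypothetical, and authentication is reduced to an optional bearer token (Docker Hub, for one, requires fetching a token first). It also assumes the local side comes from the `RepoDigests` field of `docker image inspect`, since the image ID itself is a config digest and is not comparable to a manifest digest.

```python
import urllib.request

# Manifest media types to accept, so multi-arch indexes resolve to their own digest.
MANIFEST_TYPES = ", ".join([
    "application/vnd.docker.distribution.manifest.list.v2+json",
    "application/vnd.docker.distribution.manifest.v2+json",
    "application/vnd.oci.image.index.v1+json",
    "application/vnd.oci.image.manifest.v1+json",
])


def registry_digest(registry, repository, tag, bearer_token=None):
    """Return the digest the registry currently serves for repository:tag."""
    url = f"https://{registry}/v2/{repository}/manifests/{tag}"
    headers = {"Accept": MANIFEST_TYPES}
    if bearer_token:
        headers["Authorization"] = f"Bearer {bearer_token}"
    req = urllib.request.Request(url, method="HEAD", headers=headers)
    with urllib.request.urlopen(req) as resp:
        return resp.headers.get("Docker-Content-Digest")


def is_stale(repo_digests, current_digest):
    """Compare local RepoDigests entries ("name@sha256:...") with the registry digest.

    Returns True if the tag now points elsewhere, False if it still matches,
    and None when the local image has no repo digest (for example, built locally).
    """
    local = {entry.rsplit("@", 1)[-1] for entry in repo_digests}
    if not local:
        return None
    return current_digest not in local
```

This only answers "has the tag moved?"; it says nothing about how many versions behind the running image is.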


It also came to my attention that Nomad cannot redeploy currently running jobs with an unchanged jobspec (same image tag), which complicates things further (#1576, #2038): an external tool cannot do that without some hacky env/meta changes (#698, #3949). I looked into force_pull, but it does not seem to do anything when re-submitting already running jobs. Updates from the UI, and in general, would be even better if image prefetch (#6380) were available.

tgross commented 2 years ago

Those are all excellent points, @PinkLolicorn: basically it comes down to the fact that Docker image tags aren't immutable (and are frequently mutated!).

> That said, Docker stores the runtime image version as a digest in Config.Image (see docker inspect mycontainer), and that does not help: I don't think there is a way to reverse the digest, as many tags can have the same one.

I think that's ok... the Registry API does support query by manifest reference for the Detail API. That still leaves the matter of an API for the scans themselves, which I don't see published anywhere.

> It also came to my attention that Nomad cannot redeploy currently running jobs with an unchanged jobspec (same image tag)

Yeah, that comes back to the task.config being opaque to the server. Pinning by SHA is probably the way to go if you want to do this kind of thing. It's not super user-friendly if you're deploying via the command line, but if you've got a scenario where you're trying to drive change based on scans, maybe a CI-driven workflow is better anyways?
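For a CI-driven workflow, one small building block is a check that every image reference in a jobspec is pinned by digest. The following is a sketch under the assumption that references follow the usual `name[:tag][@digest]` shape; `split_image_ref` and `is_pinned` are hypothetical helper names, and no validation of the digest format is attempted.

```python
def split_image_ref(ref):
    """Split 'name[:tag][@digest]' into (name, tag, digest); missing parts are None."""
    name, _, digest = ref.partition("@")
    tag = None
    # A ":" only introduces a tag if it comes after the last "/";
    # otherwise it belongs to a registry host:port such as "registry:5000".
    last_slash = name.rfind("/")
    last_colon = name.rfind(":")
    if last_colon > last_slash:
        name, tag = name[:last_colon], name[last_colon + 1:]
    return name, tag, digest or None


def is_pinned(ref):
    """True when the reference carries a digest, so it cannot silently change."""
    return split_image_ref(ref)[2] is not None
```

A CI step could fail the build whenever `is_pinned` returns False for any `task.config.image` it finds.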

nierob commented 2 years ago

Actually, I have solved the problem by attacking it from a different angle. Instead of detecting an old image, I'm very aggressive with updating. Whenever there is a new version of an image, I test it and deploy it automatically (well, almost).

Inside docker config in nomad spec:

  config {
    image = trimprefix(file(join("/", [ [[.basedir]], "Dockerfile.FROM.only" ])), "FROM ")
  }

I'm using a template, where basedir is just the base directory for importing the local file, and Dockerfile.FROM.only looks like:

FROM foo/bar:latest@sha256:abcabcabcabcabcbacbabcabc

Then Dependabot or another tool can update that Dockerfile at will. So one gets not only "detection" but one step further: a pull request. Note that the image digest is used to guarantee immutability. Using latest is generally OK here: because of the digest, it is not an "unknown" version. One can use a non-generic tag too, but it seems that Dependabot sometimes ignores it (bug?). The solution depends on a strong, blocking CI. Depending on the content of the image, some version updates are not straightforward. For example, in the case of a DB there could be a data migration that requires additional steps. YMMV.
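As an illustration of what the `trimprefix(file(...), "FROM ")` expression extracts, here is a small Python sketch that a blocking CI step could use to read the same reference from the file. The name `image_from_dockerfile` is hypothetical. It assumes the file holds a plain `FROM <reference>` line, drops anything after the reference (such as `AS name`), and does not handle flags like `--platform`.

```python
def image_from_dockerfile(text):
    """Return the image reference from the first FROM line of a Dockerfile."""
    for line in text.splitlines():
        line = line.strip()
        if line.upper().startswith("FROM "):
            # Keep only the first token after FROM, i.e. the image reference.
            return line[len("FROM "):].split()[0]
    raise ValueError("no FROM line found")
```

CI could then reject the pull request if the returned reference has no `@sha256:` part.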

ppacher commented 1 year ago

Hi,

We're also trying to figure out a way to get notified when a new container image is available for any task that uses the docker driver. There are already projects like watchtower, but it needs to talk to containerd directly, which is not really suitable in a Nomad environment. I played around with the Nomad API a bit and think a straightforward solution would be to let task drivers append custom metadata to a task allocation (i.e. adding something like DriverInfo to Allocation.TaskStates). This way, the docker/podman driver could append the image hash (which it knows; maybe also the docker container ID) to the task. That would allow external tools that query the Nomad API to immediately know which exact version of an image is used for each allocation and notify/act accordingly.

Right now, one would need a system job that runs on all clients and can talk to the docker daemon, plus one "control" server that queries the Nomad API to detect which allocations are scheduled on which clients and then tries to aggregate the information. While this could work, it requires a lot of care to correctly map the container image to the task allocation, as multiple tasks using the same docker image might be running on a client. (Correctly attributing the Nomad task to each container is still possible by parsing the docker-inspect output and extracting the Nomad allocation ID from the container environment.)
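The client-side half of that workaround could look roughly like the sketch below, assuming Python, the `docker` CLI on the client, and that the container environment carries the `NOMAD_ALLOC_ID` and `NOMAD_TASK_NAME` variables Nomad sets for tasks. The function names and the shape of the returned dict are hypothetical.

```python
import json
import subprocess


def nomad_identity(container):
    """Map one `docker inspect` entry to its Nomad allocation, task, and image ID.

    Returns None for containers that were not started by Nomad.
    """
    env_list = (container.get("Config") or {}).get("Env") or []
    env = dict(item.split("=", 1) for item in env_list if "=" in item)
    alloc_id = env.get("NOMAD_ALLOC_ID")
    if alloc_id is None:
        return None
    return {
        "alloc_id": alloc_id,
        "task": env.get("NOMAD_TASK_NAME"),
        "image_id": container.get("Image"),  # e.g. "sha256:..."
    }


def inspect_running_containers():
    """Return the parsed `docker inspect` output for all running containers."""
    ids = subprocess.run(
        ["docker", "ps", "-q"], check=True, capture_output=True, text=True
    ).stdout.split()
    if not ids:
        return []
    out = subprocess.run(
        ["docker", "inspect", *ids], check=True, capture_output=True, text=True
    ).stdout
    return json.loads(out)
```

The "control" side would then join these records with the Allocation API on `alloc_id`, which avoids guessing when several tasks on one client share the same image.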

What are your thoughts on this? Or is there any other (better) way that I missed?