hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

parallelize docker image downloads on all clients regardless of max_parallel value #11875

Open josh-m-sharpe opened 2 years ago

josh-m-sharpe commented 2 years ago

Proposal

The Docker driver should pre-download images on all instances before rolling them out. This would speed up deployments.

Use-cases

If a job is configured to roll out with a low max_parallel value, Nomad will process each client in full before moving on to the next one. However, it seems that Nomad could start the image download on all clients and run that in parallel without causing harm. This would speed up the overall deployment.

Attempted Solutions

na - unsupported

lgfa29 commented 2 years ago

However, it seems that nomad could start the image download on all clients and run that in parallel without causing harm.

Hmm... maybe I'm missing something, but this doesn't always seem to be true. Docker images take up disk space, so you don't necessarily want to download images on every node.

In situations where you do, you could use some kind of sysbatch job to preload the necessary images.
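
A minimal sketch of what such a sysbatch preload job might look like. This is an illustration, not something taken from the thread: the job name, datacenter, and image tag are placeholders, and it assumes the raw_exec driver is enabled on the clients and that the Nomad client can run the docker CLI.

```hcl
# Hypothetical preload job: runs once on every eligible client and exits.
job "preload-image" {
  type        = "sysbatch"
  datacenters = ["dc1"] # placeholder datacenter

  group "pull" {
    task "docker-pull" {
      # Assumes raw_exec is enabled in the client configuration.
      driver = "raw_exec"

      config {
        command = "docker"
        # Placeholder image reference; substitute the image the real job uses.
        args = ["pull", "example/app:1.2.3"]
      }
    }
  }
}
```

Because a sysbatch job runs on every eligible client, this pulls the image onto nodes that may never run the workload, which is the disk-space concern raised above. The Docker driver's image garbage collection settings may also affect how long a preloaded image is retained.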

josh-m-sharpe commented 2 years ago

Well, yeah, I would think we'd only want to pre-download them on nodes where they will be deployed - not every single node. Presumably that is something that is determined early on in the deployment process?

Seems like creating a separate job for this would be excessive - especially since one would need a job/task like that for every docker-dependent task. I was more thinking this would be something baked into the docker driver.

lgfa29 commented 2 years ago

Ah I see. Yeah, I think this would have to be task driver specific, but it would also need new functionality in Nomad's task driver API, some kind of pre-start signal. There's also a question of what to do if the update process fails.

I placed this into our backlog for further discussion and investigation. Thank you for the idea 🙂

johnnyplaydrums commented 1 year ago

Sharing another use case for this feature - we have a number of "singleton" services, meaning we can only have one instance running of the service at a time. We have count = 1 and max_parallel = 1 to achieve this. Due to that, there is expected downtime during deploys, but most of that downtime is image download. If we could pre-download the image, it would reduce the downtime to a few seconds instead of a few minutes (it's a large image).

AFAIK, given our constraints (please correct me if I'm wrong), there's no way to accomplish that pre-download of an image using existing functionality like a lifecycle event, because even the prestart hook won't start until the existing allocation has been stopped.
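
For reference, the singleton setup described above might look roughly like the following. This is a sketch under assumptions: the job, group, and task names and the image are placeholders, and only count = 1 and max_parallel = 1 come from the comment.

```hcl
# Hypothetical singleton service: at most one instance at a time.
job "singleton-service" {
  datacenters = ["dc1"] # placeholder datacenter

  group "app" {
    count = 1

    update {
      max_parallel = 1
    }

    task "app" {
      driver = "docker"

      config {
        # Placeholder for the large image; per the comment above, the pull
        # begins only after the existing allocation has been stopped.
        image = "example/large-app:2.0.0"
      }
    }
  }
}
```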

theron29 commented 10 months ago

Another use case: Docker on Windows. Windows docker images tend to be pretty big, so image preload would save precious time during deployments.