hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

Add better way to check for canary status on deploy #15156

Open mikenomitch opened 2 years ago

mikenomitch commented 2 years ago

Proposal

When deploying a job with canaries from CI, it is difficult to know when a canary is ready to be promoted. In non-canary deployments you can just run nomad job run and read the exit code, but with canaries the command hangs until the manual promotion is done.

It would be nice if nomad job run, when used with canaries, returned a different exit code depending on whether the canary is ready for promotion.

Use-cases

Writing a CI/CD pipeline in bash that deploys a Nomad job with canaries.

Attempted Solutions

You can detach from nomad job run and then run something that watches the status of the deployment, promotes it for you if it's ready, and fails the CI/CD job if not. This is doable but not nice.
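
For illustration, the detach-then-watch approach could be sketched in bash roughly as follows. This is a minimal sketch, not an official recipe: the function names (canaries_ready, wait_for_canaries, deploy_with_canaries), the default timeout, and the polling interval are hypothetical, and it assumes every task group in the job uses canaries, since the template compares DesiredCanaries with HealthyAllocs for each group.

```shell
#!/usr/bin/env bash
# Sketch only: function names and defaults below are illustrative assumptions.

# Succeeds only if the template output contains at least one "true" and no "false".
canaries_ready() {
  local out="$1"
  case "$out" in
    *false*) return 1 ;;
    *true*)  return 0 ;;
    *)       return 1 ;;  # empty output: no deployment or task groups reported yet
  esac
}

# wait_for_canaries JOB [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Polls the latest deployment of JOB until all canaries are healthy or the timeout expires.
wait_for_canaries() {
  local job="$1" timeout="${2:-300}" interval="${3:-2}"
  # Prints true/false once per task group (assumes each group uses canaries).
  local tmpl='{{ range .TaskGroups }}{{ eq .DesiredCanaries .HealthyAllocs }}{{ end }}'
  local waited=0 out
  while [ "$waited" -lt "$timeout" ]; do
    out="$(nomad job deployments -latest -t "$tmpl" "$job" 2>/dev/null)" || out=""
    if canaries_ready "$out"; then
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "timed out after ${timeout}s waiting for canaries of ${job}" >&2
  return 1
}

# deploy_with_canaries JOB_FILE JOB_NAME [TIMEOUT_SECONDS]
# Submits the job without monitoring, waits for healthy canaries, then promotes them.
deploy_with_canaries() {
  local jobfile="$1" job="$2" timeout="${3:-300}"
  nomad job run -detach "$jobfile" || return 1
  wait_for_canaries "$job" "$timeout" || return 1
  nomad job promote "$job"
}
```

In a pipeline this might be called as deploy_with_canaries my-job.nomad my-job-name 600, with a non-zero return failing the CI step. The poll may briefly see the previous deployment right after submission, so a real pipeline would probably also want to check the deployment ID or status.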

lgfa29 commented 1 year ago

After some internal discussions we agreed that it would be best if:

  1. nomad job run stops monitoring the deployment when a manual promotion is required.
  2. nomad job run outputs a message noting the canaries' health status and the command users can run to promote them.
  3. Command return code will be 0 unless an error happens.

Returning non-zero values when nothing bad happened has been a source of confusion in other commands so we should avoid that here.

c3pmark commented 6 months ago

This would be a handy feature! I'm using this workaround in CI, in case it's helpful for anyone else. It's a bit of a hack but it doesn't require anything beyond shell tools and the nomad binary.

nomad job run -detach my-job.nomad
# Use Go templates to check each task group and compare the number of healthy
# canaries to the number desired. If none of the comparisons output 'false',
# all of the canaries have been successfully deployed.
while nomad job deployments -latest -t "{{ range .TaskGroups }}{{ eq .DesiredCanaries .HealthyAllocs }}{{ end }}" my-job-name | grep -q false ; do sleep 2 ; done

jblachly commented 4 months ago

Also ran into this. Another workaround, if you are able to access the UI, is the "promote canary" button that appears on the task page. Pressing it causes the nomad job run CLI monitor to proceed and complete.