hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

Need option for 0 exit code when nomad plan creates/destroys allocations #6589

Open Omar-Khawaja opened 4 years ago

Omar-Khawaja commented 4 years ago

As mentioned in the docs, nomad plan will return an exit code of 1 when allocations will be created or destroyed. We might want to consider a way to let the user have 0 returned in those situations. This can help where nomad plan is being used through CI/CD (so the pipeline doesn't actually see nomad plan as a failure just because allocations are being changed).
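Until such an option exists, a pipeline can do the translation itself. The following is only a minimal sketch, assuming the exit codes discussed in this issue (0 = no allocations created or destroyed, 1 = allocations created or destroyed, anything else = error). The function names `ci_exit_code` and `run_plan` are made up for illustration.

```python
import subprocess

# Exit codes as described in this issue (assumed, not verified here):
# 0 = no allocations created or destroyed
# 1 = allocations created or destroyed
# anything else = error
PLAN_NO_CHANGES = 0
PLAN_CHANGES = 1


def ci_exit_code(plan_rc: int) -> int:
    """Map a `nomad plan` exit code to one a CI step can return directly."""
    if plan_rc in (PLAN_NO_CHANGES, PLAN_CHANGES):
        # The plan itself succeeded, whether or not allocations would change.
        return 0
    # Propagate genuine errors unchanged so the pipeline still fails on them.
    return plan_rc


def run_plan(job_file: str) -> int:
    """Run `nomad plan` on a job file and return its raw exit code."""
    return subprocess.run(["nomad", "plan", job_file]).returncode
```

A CI step could then end with `sys.exit(ci_exit_code(run_plan(path)))`, where `path` is whatever job file the pipeline is planning.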

jrasell commented 4 years ago

@Omar-Khawaja do you have suggestions or thoughts on the flag name? I can take this issue on, but want to make sure the flag name I use is roughly what is wanted.

tgross commented 4 years ago

I'm not wild about proliferating more flags here when users should be explicitly checking the exit code in the case of things like CI. If you're automating nomad plan, you're going to want to know whether there was a diff or not so that you can follow through with a nomad run.

That being said, there's some precedent for this behavior in other products with terraform plan, where we have --detailed-exitcode as a flag to opt in to the behavior Nomad does by default, albeit with slightly different exit codes.
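To make the "slightly different exit codes" concrete: with the detailed exit code option, terraform plan returns 0 for a successful plan with an empty diff, 1 for an error, and 2 for a successful plan with a non-empty diff. Below is a rough sketch of how Nomad's codes would line up with that convention. The function name `to_detailed_exitcode` is hypothetical and nothing like it exists in Nomad.

```python
def to_detailed_exitcode(nomad_plan_rc: int) -> int:
    """Translate a `nomad plan` exit code into the Terraform-style convention.

    Assumed Nomad meanings (as discussed in this issue):
      0 = no allocations created or destroyed
      1 = allocations created or destroyed
      anything else = error
    Terraform detailed exit codes:
      0 = success, empty diff; 1 = error; 2 = success, non-empty diff
    """
    if nomad_plan_rc == 0:
        return 0  # nothing to do
    if nomad_plan_rc == 1:
        return 2  # plan succeeded and there are changes
    return 1  # any other code is treated as an error
```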

mikenomitch commented 2 years ago

Going to add a help-wanted flag to this in case anybody wants to add this behavior.

I think basing this behavior off the --detailed-exitcode flag from TF is the way to go.

lgfa29 commented 1 year ago

#6867 has additional suggestions for return codes.

next-jesusmanuelnavarro commented 5 months ago

As mentioned in the docs, nomad plan will return an exit code of 1 when allocations will be created or destroyed. We might want to consider a way to let the user have 0 returned in those situations. This can help where nomad plan is being used through CI/CD (so the pipeline doesn't actually see nomad plan as a failure just because allocations are being changed).

Well, I was recently bitten by the opposite problem.

I'm comfortable dealing with different status codes, and I take advantage of nomad plan in pipelines. So (a misreading on my part) I counted on using nomad plan's rc 0 to skip a nomad run, rc 1 to schedule it, and any other rc to raise an error. That means I miss changes that can be applied in place without destroying or creating allocations, such as a change in tags.

Why bring this up here? Because I'm against changing the meaning of the existing rcs (which are easy enough to cope with basically anywhere), and I'd be in favour of adding more granularity to the current rc levels.

I.e., this (a "true" no-changes plan)...

Job: "[REDACTED]"
Task Group: "[REDACTED]" (2 in-place update)
  Task: "[REDACTED]"
  Task: "[REDACTED]"
  Task: "[REDACTED]"

Scheduler dry-run:
- All tasks successfully allocated.

Job Modify Index: 64745709
To submit the job with version verification run:

nomad job run -check-index 64745709 [REDACTED].hcl

...is qualitatively different to this (a plan with significant changes):

+/- Job: "[REDACTED]"
+/- Task Group: "[REDACTED]" (1 in-place update)
  +/- Service {
        Address:           ""
        AddressMode:       "auto"
        EnableTagOverride: "false"
        Kind:              ""
        Name:              "[REDACTED]"
        Namespace:         "default"
        OnUpdate:          "require_healthy"
        PortLabel:         ""
        Provider:          "consul"
        TaskName:          ""
    +/- Tags {
      + Tags: "[REDACTED]"
      - Tags: "[REDACTED, a different one]"
        }
      }
      Task: "[REDACTED]"
      Task: "[REDACTED]"

Scheduler dry-run:
- All tasks successfully allocated.

Job Modify Index: 64746003
To submit the job with version verification run:

nomad job run -check-index 64746003 [REDACTED].hcl

I wouldn't want to redeploy the former, but I certainly would want to redeploy the latter, and yet both plans return the same exit code of 0.
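As a stopgap, the two cases can be told apart by looking at the plan text rather than the rc. This is only a rough sketch, assuming the human-readable output keeps the shape shown above (diff lines prefixed with "+", "-" or "+/-" before the "Scheduler dry-run:" section). That text is not a stable interface, and the helper names `plan_shows_diff` and `job_modify_index` are made up.

```python
import re
from typing import Optional

# Prefixes that mark a changed line in the diff part of the plan output
# (assumption based on the two outputs pasted in this comment).
DIFF_MARKERS = ("+/-", "+", "-")


def plan_shows_diff(plan_output: str) -> bool:
    """Return True if the diff section of the plan output marks any change."""
    for line in plan_output.splitlines():
        stripped = line.strip()
        # Stop at the scheduler section: its bullet lines also start with "-".
        if stripped.startswith("Scheduler dry-run"):
            break
        if stripped.startswith(DIFF_MARKERS):
            return True
    return False


def job_modify_index(plan_output: str) -> Optional[int]:
    """Extract the value printed after 'Job Modify Index:', if present."""
    match = re.search(r"^Job Modify Index:\s*(\d+)", plan_output, re.MULTILINE)
    return int(match.group(1)) if match else None
```

With this, a pipeline could schedule a nomad job run -check-index with the extracted index whenever `plan_shows_diff` is true, even when the rc is 0.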

PS: Now I see #6867, there's an interesting note in it:

Returns codes are indeed useless in a CI pipeline.
0 = I'll do nothing
1 = I'll do something
255 = ???

Even that would be somewhat reasonable, but the truth is that's not even the case; it's more like this:
0 = I'll do nothing. Well, maybe I'll do something after all, who knows.
1 = I'll definitely do something.
Any other RC = An error.
Except that there are other circumstances that will make the deploy fail, like a lack of agents, or storage, or something like that, which may return a 0, or maybe a 1?