Thanks for the detailed description @cyrilgdn!
I think your analysis makes sense. I have placed this issue in our backlog for further triaging.
Hi @cyrilgdn and @lgfa29, I wanted to add some additional feedback on this issue. We just attempted an upgrade from Nomad 1.3.5 to 1.4.8 in production this week and also encountered this behavior in jobs on our cluster with templated Vault secrets, even for jobs with only a single group.

In our case, a change of `count` on the single group resulted in all allocations of the job being triggered for restart, regardless of whether the change was an increase or decrease.
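As an illustration (the group name and count values below are made up, not our actual job), the entire diff between the two job versions was the group's count attribute:

```hcl
group "app" {
  # Changing only this value restarted every allocation in the job,
  # whether it was increased or decreased.
  count = 4 # previously 3
}
```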
We've downgraded our Nomad clients back to Nomad 1.3.5, and job count changes stopped restarting all allocations.
I'd been racking my brain over this for a couple of days until I, mostly by chance, figured out the correlation between restarts and scaling actions. Bit of a shame to have to downgrade to Nomad 1.3.5 again. Hopefully there'll be a solution to this soon 🤞
Hello @lgfa29 @cyrilgdn
I wanted to provide some feedback on this issue. We recently attempted to make a change to the `update` stanza of a Nomad job. However, after applying the change, we noticed that all allocations of the job were triggered for restart, which was unexpected.
Here is the relevant plan:
```
+/- Job: "test"
+/- Task Group: "nginx" (2 in-place update)
  +/- Update {
        AutoPromote:      "false"
        AutoRevert:       "true"
        Canary:           "0"
        HealthCheck:      "checks"
        HealthyDeadline:  "300000000000"
        MaxParallel:      "1"
    +/- MinHealthyTime:   "6000000000" => "5000000000"
        ProgressDeadline: "600000000000"
      }
      Task: "nginx"

Scheduler dry-run:
- All tasks successfully allocated.
```
This change alone resulted in all allocations of the job being restarted.
Nomad version: 1.5.3
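For reference, the in-place diff above maps back to a jobspec `update` stanza roughly like the following. Only `min_healthy_time` changed; the remaining values are converted from the nanosecond figures in the plan, and the exact formatting of our real stanza may differ:

```hcl
update {
  max_parallel      = 1
  min_healthy_time  = "5s"  # was "6s" -- the only field we changed
  healthy_deadline  = "5m"
  progress_deadline = "10m"
  health_check      = "checks"
  auto_revert       = true
  auto_promote      = false
  canary            = 0
}
```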
Another example: https://github.com/hashicorp/nomad/issues/17398
We have hit this problem after upgrading to 1.4.11, and all it takes to trigger templates with Vault dynamic credentials to re-render is changing the `count` of a job.
Hi folks, and thanks again @cyrilgdn for the detailed write-up!
I was able to reproduce this on our `release/1.6.x` branch with this minimal job specification:
job "multi" {
group "one" {
count = 1
task "t-1" {
driver = "docker"
config {
image = "nginx:alpine"
}
# identity{} block not actually required,
# because every task gets an identity.
# this just exposes it to the task.
#identity {
# file = true
#}
template {
destination = "local/stamp"
data = "{{ timestamp }}" # anything that is dynamic
}
}
}
group "two" {
count = 1
task "t-2" {
driver = "docker"
config {
image = "nginx:alpine"
}
}
}
}
Scaling group `two` (with `nomad job scale multi two 1`) results in task `t-1` in group `one` being restarted, for precisely the reason described in the OP -- the task gets a new `NomadToken`, so the template manager gets recreated, which calls some dynamic function, which results in a changed template and therefore a restart (the default `change_mode`).
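For reference, the `template` block in that minimal job relies on the default `change_mode`; written out explicitly, it is equivalent to:

```hcl
template {
  destination = "local/stamp"
  data        = "{{ timestamp }}"
  change_mode = "restart" # the default: any re-render restarts the task
}
```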
There are a couple of ways to address this:

1. The planner on the server where a new token is initially set could reuse the same token, if possible, instead of making a new one every time.
2. The `identity_hook` on the client that sets the new token on the task runner (which is later picked up by the `template_hook`) could, well, not do that.

Personally I'm leaning towards the first option, so that the server and the client would agree on which token is in use, but I'll confer with the team to see if there are implications I'm not considering, or other architectural changes that may be preferable.
Nomad version
We have been encountering this problem since Nomad 1.4.
Issue
On a Nomad job with 2 groups, if a task of the first group has a template with Vault dynamic credentials, the associated allocation will be unexpectedly restarted if we scale the second group (more precisely, the template will be re-rendered; what happens next depends on the template's `change_mode`).

This is a big issue for us because we use the Nomad autoscaler with the `scaling` stanza, so every time it automatically scales a job with multiple groups, allocations that are not linked to the scaling policy are restarted.
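For context, the scaled group carries a `scaling` block along these lines, which the autoscaler drives (the bounds and policy contents here are illustrative, not our production values):

```hcl
group "nginx2" {
  count = 1

  scaling {
    enabled = true
    min     = 1
    max     = 5

    policy {
      # Nomad Autoscaler checks/targets omitted
    }
  }
}
```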
My 2 cents

After debugging a bit and checking the diff between Nomad 1.3.8 and Nomad 1.4 or above, I believe it could be linked to workload identity and the new `NomadToken` added in https://github.com/hashicorp/nomad/pull/13000/ (cc @tgross).

In `template_hook.go`:
https://github.com/hashicorp/nomad/blob/fc75e9d117f38ad2729fb55a865a296b8517f9ae/client/allocrunner/taskrunner/template_hook.go#L200-L205

The update hook now checks whether this new token has changed, in addition to the Vault one, and it seems that this token changes at every scaling event (every deployment, I guess). This restarts the `consul-template` manager, and since we have dynamic credentials, the rendered template changes and the task is restarted.

(I've also tested with static secrets from Vault; in that case the template manager is restarted, but since the rendered template is the same as the previous one, the task is not restarted.)
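To make that last point concrete, a static secret template (the KV path below is a placeholder) renders identical content on every pass, which is why only dynamic secrets engines end up triggering the restart:

```hcl
# Static KV secret: a template-manager restart re-renders the same content,
# so change_mode never fires and the task keeps running.
template {
  destination = "secrets/static.env"
  data        = <<-EOT
    {{ with secret "kv/data/my-app" }}
    API_KEY={{ .Data.data.api_key }}
    {{ end }}
  EOT
}
```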
I'm not sure, but I hope it helps.
Reproduction steps
We need a Nomad cluster (version >= 1.4.0) that is already set up and linked to a Vault cluster with at least one dynamic secrets engine.
We deploy the following Nomad job:
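The job specification itself isn't reproduced in this transcript. A minimal sketch of the kind of job described (the group and task names follow the rest of this issue; the job name, Vault policy, and secret path are assumptions) would look roughly like this:

```hcl
job "example" {

  group "nginx" {
    count = 1

    task "nginx" {
      driver = "docker"

      config {
        image = "nginx:alpine"
      }

      vault {
        policies = ["my-policy"] # assumption
      }

      # Writes the workload identity token to secrets/nomad_token
      identity {
        file = true
      }

      # Template backed by a Vault dynamic secrets engine (path is an assumption)
      template {
        destination = "secrets/db.env"
        data        = <<-EOT
          {{ with secret "database/creds/my-role" }}
          DB_USER={{ .Data.username }}
          DB_PASS={{ .Data.password }}
          {{ end }}
        EOT
      }
    }
  }

  # Second group: no Vault template; scaling this group should not
  # touch the first group's allocation.
  group "nginx2" {
    count = 1

    task "nginx2" {
      driver = "docker"

      config {
        image = "nginx:alpine"
      }
    }
  }
}
```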
Once the deployment is successful, we have the following allocations:
Then we scale the second group `nginx2`:

The deployment is successful and we have our new allocations.
But depending on the Nomad client version, we see different behavior for the allocation of the group `nginx` (not the one that we scaled).

Expected Result
This is the behavior we have if the Nomad client version is <= 1.3.8.

The allocation of the first group `nginx` shouldn't have been impacted. This is an example of another deployment of the same job on a Nomad 1.3.8 client:
Actual Result
This is the behavior we have with a Nomad client >= 1.4.0.

The allocation of the group `nginx` has been restarted because the template has been re-rendered:

Identity token
This is the (decoded) token I got from `secrets/nomad_token`, thanks to the `identity` stanza configured with:

Before job scaling:

After job scaling:

We can see that `nbf` and `iat` changed.

Nomad Client logs (if appropriate)