Nomad version
Nomad v0.12.3 (2db8abd9620dd41cb7bfe399551ba0f7824b3f61)
Consul v1.0.1
Operating system and Environment details
CentOS Linux release 7.3.1611 (Core)
Issue
The issue is a relatively long delay between the Nomad client registering a new service during a rolling update and that service being synced to the local Consul agent. As a result, there are cases where the rollout finishes with no live services registered in Consul at all. Judging from the nomad monitor -log-level=TRACE logs (core excerpts attached below), _alloc_healthwatcher fires twice for the new allocations on this client and even marks them as healthy, while the actual consul.sync entries appear long after this process has finished.
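To check the symptom described above (a finished rollout with no live instances in Consul), one option is to ask the local Consul agent which instances of the service currently have all checks passing, using its /v1/health/service/<name> HTTP endpoint. The Go program below is a minimal sketch, not something taken from our setup: it assumes the agent listens on the default 127.0.0.1:8500, and "my-service" is a hypothetical placeholder for the real service name. The JSON parsing is kept in a separate function so it can be checked against a sample response.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
)

// healthEntry models only the fields we need from one element of the
// /v1/health/service/<name> response array.
type healthEntry struct {
	Service struct {
		ID string `json:"ID"`
	} `json:"Service"`
	Checks []struct {
		CheckID string `json:"CheckID"`
		Status  string `json:"Status"`
	} `json:"Checks"`
}

// countPassing returns how many service instances have every check in "passing" state.
func countPassing(body []byte) (int, error) {
	var entries []healthEntry
	if err := json.Unmarshal(body, &entries); err != nil {
		return 0, err
	}
	n := 0
	for _, e := range entries {
		allPassing := true
		for _, c := range e.Checks {
			if c.Status != "passing" {
				allPassing = false
				break
			}
		}
		if allPassing {
			n++
		}
	}
	return n, nil
}

func main() {
	// Assumption: default local agent address; "my-service" is a hypothetical name.
	service := "my-service"
	if len(os.Args) > 1 {
		service = os.Args[1]
	}
	resp, err := http.Get("http://127.0.0.1:8500/v1/health/service/" + service)
	if err != nil {
		fmt.Fprintln(os.Stderr, "request failed:", err)
		os.Exit(1)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		fmt.Fprintln(os.Stderr, "unexpected status:", resp.Status)
		return
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Fprintln(os.Stderr, "read failed:", err)
		return
	}
	n, err := countPassing(body)
	if err != nil {
		fmt.Fprintln(os.Stderr, "decode failed:", err)
		return
	}
	fmt.Printf("%s: %d passing instance(s)\n", service, n)
}
```

Running this in a loop on the client host during a rollout would show whether the count drops to zero after Nomad has already marked the new allocations healthy.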
We discussed this issue on the hangops Slack with @nickethier and agreed to continue the discussion in an issue thread here: https://hangops.slack.com/archives/C1CAYV38T/p1603271878496000
We also sent an email to nomad-oss-debug@hashicorp.com with the subject "nomad/consul, service registration delay, slack thread", containing a full trace log of a rollout.
Reproduction steps
Simple rolling update.
Job file (before env substitution)
Nomad Client logs