Closed: chenjpu closed this issue 2 months ago
The service registration provider defaults to consul, but our cluster does not rely on a Consul environment. When the provider is configured as nomad, the problem does not appear.
The following is the service template configuration. Restarting the daprd task works fine; only when provider is not set does restarting the app task get no response.
job "xxxxx" {
  datacenters = ["dc1"]
  type        = "service"
  group "service" {
    task "app" {
      driver = "docker"
      config {
        image   = "alpine:3.19"
        command = "local/app"
      }
      service {
        name         = "${NOMAD_JOB_NAME}"
        port         = "app"
        address_mode = "host"
        provider     = "nomad" // Correct configuration
        check {
          name     = "health check"
          type     = "tcp"
          port     = "app"
          interval = "12s"
          timeout  = "6s"
          check_restart {
            limit = 3
            grace = "10s"
          }
        }
        check {
          name      = "ready check"
          type      = "http"
          port      = "http"
          path      = "/v1.0/healthz"
          interval  = "12s"
          timeout   = "6s"
          on_update = "ignore"
        }
      }
      artifact {
        source = "..../app.tar.gz"
      }
    }
    task "daprd" {
      lifecycle {
        hook    = "poststart"
        sidecar = true
      }
      driver = "docker"
      config {
        image   = "alpine:3.19"
        command = "local/daprd"
      }
      artifact {
        source = ".../daprd_min_linux_${attr.cpu.arch}.tar.gz"
      }
    }
  }
}
Hi @chenjpu! Apologies for the delay in responding to this. Let me verify I understand what you're saying here:
- When the service.provider field is unset, the workload runs but the app task will not restart as expected.
- When service.provider = "nomad", the workload runs and the app task will restart as expected.
Is that right?
I would not have expected the workload to run at all with service.provider unset (defaulting to Consul) if there's no Consul in your environment. Nomad adds a constraint that requires Consul if you've got a Consul service in the jobspec.
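For illustration, a minimal service block like this one (the name and port here are hypothetical, not from the job above) is what pulls in that implicit Consul constraint, because provider defaults to "consul" when unset:

```hcl
service {
  name = "example"
  port = "http"
  # provider is unset, so it defaults to "consul"; the scheduler then
  # adds an implicit ${attr.consul.version} constraint to the job, and
  # placement fails on nodes without a Consul agent.
}
```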
It's been a while, but I remember that when this configuration was an empty string (provider = "") it caused an exception.
Hi @chenjpu!
I think what I'm not making clear is that if you had provider = ""
in your jobspec without Consul available, the job would not start at all. See this example jobspec:
I get a scheduling error like the following:
$ nomad job plan example.nomad.hcl
+ Job: "example"
+ Task Group: "group" (1 create)
+ Task: "task" (forces create)
Scheduler dry-run:
- WARNING: Failed to place all allocations.
Task Group "group" (failed to place 1 allocation):
* Constraint "${attr.consul.version} semver >= 1.8.0": 1 nodes excluded by filter
Job Modify Index: 0
To submit the job with version verification run:
nomad job run -check-index 0 example.nomad.hcl
When running the job with the check-index flag, the job will only be run if the
job modify index given matches the server-side version. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.
However, I suspect the provider
is irrelevant here and that there's something else going on.
We emit the "User requested task to restart" event just before we actually try to restart the task (ref lifecycle.go#L81-L82
), because it can take a while for the task to actually shut down. We wait for "prekill" behaviors and the task itself before we return any errors to the caller.
So there might be something that's blocking the shutdown here.
Next steps to debug:
Hi @tgross
I just reset provider = "" and it does show the "1 unplaced" service error. Since the environment where this appeared is production, I'm sorry I can't reproduce the failure scenario.
@chenjpu ok I understand.
If this happens again, you can capture the logs of the running client with nomad monitor -log-level=DEBUG -node-id=$node_id. It might also be helpful to capture the goroutine stack by making a request to the client agent's HTTP endpoint at /debug/pprof/goroutine?debug=2.
OK, I'm happy to help with this problem.
Doing a little issue cleanup. Going to close this out as unable to reproduce.
Nomad version
1.7.6 or main branch
Operating system and Environment details
CentOS Linux release 7.9.2009 (Core) Docker Engine - 24.0.1
Issue
Multiple attempts to restart the task showed no response