Open chenjpu opened 1 year ago
Thanks for reporting @chenjpu! Indeed I was able to reproduce this, but with the service
belonging to a task rather than the group - any chance your job was actually configured that way too?
Fix in #16240
Yes, my service is configured on the group.
@shoenig
After upgrading to 1.5.0-rc1, the problems above still remain.
Whelp I guess there was more than one bug then :grimacing:
I'll try and reproduce again with a job structured exactly like yours @chenjpu
I am experiencing a tsunami of these logs with Nomad 1.6.1. It's also generating a large amount of request error logs in Consul.
Example consul log:
/v1/agent/check/register ServiceID "_nomad-task-9ec1786d-e9b1-db33-a381-c748bf033163-group-xxx-xxx-xxx-xxx-xxx-xxx-80" does not exist
This seems to have started with a task that failed to start due to a missing vault secret. It's now generating hundreds of thousands of logs between nomad and consul.
The job itself has two groups, each with different Vault policies. Each group contains one service and one task. Only the second group has a service check, and it is this check that is causing all the errors.
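For anyone trying to reproduce this, a job with that layout might look roughly like the sketch below. It is not the actual job file: the job, group, task, service, and policy names, the Docker image, and the check settings are all placeholders, and placing the services at the group level is an assumption based on the "-group-" segment of the service ID in the Consul log.

```hcl
# Hypothetical reproduction sketch; every name and value here is a placeholder.
job "example" {
  datacenters = ["dc1"]

  # First group: one service, one task, no check.
  group "first" {
    network {
      port "http" {
        to = 80
      }
    }

    service {
      name = "first-svc"
      port = "http"
    }

    task "app" {
      driver = "docker"

      # Each group uses a different Vault policy.
      vault {
        policies = ["policy-a"]
      }

      config {
        image = "nginx:stable"
        ports = ["http"]
      }
    }
  }

  # Second group: one service with a check, one task.
  group "second" {
    network {
      port "http" {
        to = 80
      }
    }

    service {
      name = "second-svc"
      port = "http"

      # The only service check in the job; the check registration
      # errors in Consul refer to this group's service.
      check {
        type     = "http"
        path     = "/"
        interval = "10s"
        timeout  = "2s"
      }
    }

    task "app" {
      driver = "docker"

      vault {
        policies = ["policy-b"]
      }

      config {
        image = "nginx:stable"
        ports = ["http"]
      }
    }
  }
}
```

In this sketch, the failure described above would correspond to a task that cannot start because a secret its Vault policy should grant is missing.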
The issue causing the task to fail has been resolved, however, the log tsunami persists.
In the short term, is there a workaround to get the logs to stop, short of restoring to an older snapshot?
For others who run into this and need a quick way to stop the influx of logs: restart the Nomad client.
For us, we ran an instance refresh across the affected autoscaling group.
We're also running into this.
Nomad version
1.4.4
Issue
Logs (journalctl -fu nomad)
job.hcl