Open sheldonkwok opened 7 years ago
Hi @sheldonkwok do you have ACLs enabled? This looks like it might be a bug where it's trying to verify ACL rights on a check that's already deleted (https://github.com/hashicorp/consul/blob/v0.8.1/consul/acl.go#L774-L776). That should probably allow the deregister since the check is already missing. I'm not sure how you'd get into this state though. While you are in the "stuck" state can you post a gist with the /v1/agent/checks output from the agent and the /v1/catalog/health/service/
As a workaround you could set acl_enforce_version_8 (https://www.consul.io/docs/agent/options.html#acl_enforce_version_8) to false to bypass this particular check while we diagnose and fix.
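For anyone applying the workaround, here is a minimal sketch of the config fragment it implies, generated from Python so it can be checked before dropping it into an agent's config directory. The only setting taken from this thread is `acl_enforce_version_8`; the helper name `workaround_config` is made up for illustration, and where the fragment goes depends on your own deployment.

```python
import json


def workaround_config() -> str:
    """Return a JSON config fragment that turns off version 8 ACL enforcement.

    Hypothetical helper: only the key name comes from the thread above.
    """
    return json.dumps({"acl_enforce_version_8": False}, indent=2)


if __name__ == "__main__":
    # Print the fragment; redirect it into a file in the agent's config
    # directory (location is deployment-specific) and restart the agent.
    print(workaround_config())
```

This is meant as a temporary measure while the bug is diagnosed, so the setting should be revisited once a fix is available.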
Hi @slackpad we do have ACLs enabled but it was set to allow as we migrated. We will add the workaround for the old servers before we start the migration again. Will post more info as the bug comes up again. Thanks!
consul version for both Client and Server
Client: 0.7.5
Server: 0.8.1
consul info for both Client and Server
Client:
Server:
Operating system and Environment details
Linux 4.4.0-75-generic #96-Ubuntu SMP Thu Apr 20 09:56:33 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Description of the Issue (and unexpected/desired result)
We are currently migrating to the new Consul version (0.7.X to 0.8.X) and are experiencing issues registering health checks with Nomad. The jobs only have a serf health check and not the HTTP one that is specified. About five minutes later, the HTTP health checks finally register. Normally the additional health checks associated with the service register immediately.
The issue on nomad repo: https://github.com/hashicorp/nomad/issues/2595#issuecomment-297826414
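To see which checks the local agent actually holds while the HTTP checks are missing, something like the following sketch could help. It reads `/v1/agent/checks` and groups check IDs by service. The address `http://127.0.0.1:8500` assumes the agent's default HTTP port, the function names are hypothetical, and with ACLs enabled the request may also need a token, which this sketch does not send.

```python
import json
from urllib.request import urlopen

# Assumption: the agent listens on the default local HTTP address.
AGENT_ADDR = "http://127.0.0.1:8500"


def fetch_agent_checks(addr: str = AGENT_ADDR) -> dict:
    """Fetch the checks registered with the local agent (check ID -> check)."""
    with urlopen(f"{addr}/v1/agent/checks", timeout=5) as resp:
        return json.load(resp)


def checks_by_service(checks: dict) -> dict:
    """Group check IDs by the ServiceID they belong to.

    Checks with an empty ServiceID are listed under "(node-level)".
    """
    grouped = {}
    for check_id, check in checks.items():
        service = check.get("ServiceID") or "(node-level)"
        grouped.setdefault(service, []).append(check_id)
    return {svc: sorted(ids) for svc, ids in grouped.items()}


if __name__ == "__main__":
    for service, ids in sorted(checks_by_service(fetch_agent_checks()).items()):
        print(service, ids)
```

Running it right after a job starts and again a few minutes later should show whether the HTTP check is absent from the agent itself or only from the catalog.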
Log Fragments or Link to gist
Client logs are filled with.