Open phanidileep opened 7 years ago
+1
We see this issue in 0.8.5 as well.
I think it goes deeper than the DNS query. For instance:
curl -v -XPUT http://localhost:8500/v1/agent/service/maintenance/consul-agent-http?enable=true
returns 200 OK
curl -v http://localhost:8500/v1/agent/checks
returns correctly the health check related to maintenance:
{"_service_maintenance:consul-agent-http":
{"Node":"consul01-par.central.criteo.preprod","CheckID":"_service_maintenance:consul-agent-http","Name":"Service Maintenance Mode","Status":"critical","Notes":"Maintenance mode is enabled for this service, but no reason was provided. This is a default message.","Output":"","ServiceID":"consul-agent-http","ServiceName":"consul-agent-http","ServiceTags":[],"CreateIndex":0,"ModifyIndex":0}}
but curl -v http://localhost:8500/v1/health/checks/consul-agent-http
200 OK with body []
(in this scenario consul-agent-http has no healthcheck defined)
I would have expected to see the same healthcheck called _service_maintenance:consul-agent-http.
The same behavior can be reproduced with any service with healthcheck defined.
Hi @phanidileep in your example you are doing a node query for toolkit-d08wh.node.dc1.com, which isn't affected by the maintenance mode. We should make the documentation more clear, but the maintenance mode prevents that node from coming back in any service queries, since that's where the health check filtering is applied. If toolkit-d08wh.node.dc1.com was running an instance of the foo service then toolkit-d08wh.node.dc1.com would never show up in a query for foo.service.dc1.com. If you just ask for a node directly then it will be returned, regardless of its health status.
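To make the distinction above concrete, here is a minimal toy model in Python. It is not Consul's implementation: the `Node` class, `node_lookup`, `service_lookup`, and the second node `toolkit-example` (10.0.33.24) are hypothetical names invented for illustration, and checks are simplified to a flat list of statuses per node. It only sketches the idea that a node query ignores health while a service query drops nodes with a critical check.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    name: str
    address: str
    services: Set[str] = field(default_factory=set)
    # Simplification: statuses of all checks on the node, e.g. "passing" or "critical".
    check_statuses: List[str] = field(default_factory=list)


def node_lookup(nodes: List[Node], name: str) -> List[str]:
    """Toy model of a node query: health is not consulted at all."""
    return [n.address for n in nodes if n.name == name]


def service_lookup(nodes: List[Node], service: str) -> List[str]:
    """Toy model of a service query: nodes with a critical check are dropped."""
    return [
        n.address
        for n in nodes
        if service in n.services and "critical" not in n.check_statuses
    ]


nodes = [
    # Node in maintenance mode: its maintenance check is critical.
    Node("toolkit-d08wh", "10.0.33.23", {"foo"}, ["passing", "critical"]),
    # Hypothetical second node, added only for contrast.
    Node("toolkit-example", "10.0.33.24", {"foo"}, ["passing"]),
]

print(node_lookup(nodes, "toolkit-d08wh"))  # ['10.0.33.23']
print(service_lookup(nodes, "foo"))         # ['10.0.33.24']
```

In this model the node in maintenance still resolves by name, which matches the ping result in the report, but it is absent from the service result.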
@slackpad Thanks for the clarification. Can you share the list of statuses in Consul? I will have to check how the maintenance mode status is handled in telemetry, e.g. https://github.com/influxdata/telegraf/tree/master/plugins/inputs/consul
@slackpad I can still see this issue on consul 0.9.3. Is this expected?
@kamaradclimber I think this is a documentation issue, but not an actual code issue. If you ask for a node directly it doesn't factor in the health (or maintenance status) of the node. That is only considered when you are looking up a service over DNS.
i see this same behavior via the catalog endpoint, too.
enable maintenance for a service or node, then searching the catalog for the service includes the node in results. is this intended? is the dns endpoint the only one that doesn't include nodes or services in maintenance mode? i'm seeing this with consul 1.0.2 agents and accessing the various APIs via the diplomat gem and curl.
EDIT: this could probably use some clarification on the docs for the catalog and other endpoints. maybe just explicitly state in docs for each endpoint whether they respect health status. i didn't realize just the dns and health endpoints reflect health state.
though, just a quick test of two lookups for a service that isn't failing shows this, which also feels wrong:
curl -s "$CONSUL_HTTP_ADDR/v1/health/service/foo?passing=true"| jq '.[] | .Checks[0].Status'
"passing"
"passing"
curl -s "$CONSUL_HTTP_ADDR/v1/health/service/foo?passing=false"| jq '.[] | .Checks[0].Status'
"passing"
"passing"
this message seems to indicate that including the "passing" parameter implies either results for nodes/services with non-critical statuses or no defined check in any state. that is also a little confusing.
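One thing worth ruling out when reading the output above: the jq expression prints only `.Checks[0].Status`, so a critical check later in the list (such as a maintenance check) would not be visible. The Python sketch below works on hypothetical sample data shaped like a `/v1/health/service/<name>` response (`node-a`, `node-b`, and both helper functions are made up for illustration). `only_passing` is a client-side approximation of "keep only entries whose checks are all passing", an assumption about the intended semantics of the `passing` filter and not Consul's code.

```python
from typing import Any, Dict, List, Tuple

Entry = Dict[str, Any]


def check_statuses(entries: List[Entry]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each node name to every (CheckID, Status) pair, not just the first."""
    return {
        e["Node"]["Node"]: [(c["CheckID"], c["Status"]) for c in e["Checks"]]
        for e in entries
    }


def only_passing(entries: List[Entry]) -> List[Entry]:
    """Keep entries where every check is passing (client-side filter)."""
    return [
        e for e in entries
        if all(c["Status"] == "passing" for c in e["Checks"])
    ]


# Hypothetical sample data; node-a has node maintenance enabled.
sample = [
    {
        "Node": {"Node": "node-a"},
        "Checks": [
            {"CheckID": "serfHealth", "Status": "passing"},
            {"CheckID": "_node_maintenance", "Status": "critical"},
        ],
    },
    {
        "Node": {"Node": "node-b"},
        "Checks": [{"CheckID": "serfHealth", "Status": "passing"}],
    },
]

# Looking only at the first check reports "passing" for both nodes.
print([e["Checks"][0]["Status"] for e in sample])
# Looking at all checks reveals the critical maintenance check on node-a.
print(check_statuses(sample))
print([e["Node"]["Node"] for e in only_passing(sample)])
```

Comparing the full check list against the filtered result makes it easier to tell whether the endpoint ignored the parameter or whether the first check simply happened to be passing.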
consul version for both Client and Server
Client: 0.7.4
Server: 0.7.4
consul info for both Client and Server
Client:
Server:
Operating system and Environment details
Linux 3.10.0-514.10.2.el7.x86_64
Description of the Issue (and unexpected/desired result)
Based on the documentation at https://www.consul.io/docs/commands/maint.html, nodes that are set in maintenance mode should not appear in DNS queries, but this does not seem to be working as expected.
I am able to ping the Node after setting it in the maintenance mode.
Reproduction steps
sh-4.2# consul maint -enable
sh-4.2# consul maint
Node:
Name: toolkit-d08wh
Reason: Maintenance mode is enabled for this node, but no reason was provided. This is a default message.
sh-4.2# ping toolkit-d08wh.node.dc1.com
PING toolkit-d08wh.node.dc1.com (10.0.33.23) 56(84) bytes of data.
64 bytes from toolkit-d08wh (10.0.33.23): icmp_seq=1 ttl=64 time=0.017 ms
64 bytes from toolkit-d08wh (10.0.33.23): icmp_seq=2 ttl=64 time=0.021 ms
64 bytes from toolkit-d08wh (10.0.33.23): icmp_seq=3 ttl=64 time=0.019 ms
64 bytes from toolkit-d08wh (10.0.33.23): icmp_seq=4 ttl=64 time=0.027 ms
Appreciate your time and suggestions.