hashicorp / consul-template

Template rendering, notifier, and supervisor for @HashiCorp Consul and Vault data.
https://www.hashicorp.com/
Mozilla Public License 2.0

/v1/catalog/nodes sometimes returns 0 nodes if there's a leadership change in Consul #1098

Closed by sgirones 6 years ago

sgirones commented 6 years ago

Consul Template version

consul-template v0.19.4 (68b1da2)

Configuration

consul {
  # the local consul agent
  address = "127.0.0.1:8500"
}

log_level = "trace"
max_stale = "0s"
wait {
  min = "1m"
  max = "3m"
}
kill_signal = "SIGINT"

template {
  source = "/src.ctmpl"
  destination = "/dest"

  error_on_missing_key = true
  backup = true
}

Template (/src.ctmpl)

{{ range nodes }}
# node
{{ end }}

Command

/usr/local/bin/consul-template -config=/etc/consul-template/config.d

Debug output

https://gist.github.com/sgirones/74eb8088e7a46833a9a3ed75295754f3

Expected behavior

GET /v1/catalog/nodes should return 132 nodes

Actual behavior

GET /v1/catalog/nodes returns 0 nodes

Steps to reproduce

It happened once during a leadership change and broke some of our templates. We then managed to reproduce the behaviour, although not consistently, by repeatedly killing all the Consul servers at random intervals, leaving a few seconds between kills so that a leader can be elected before the cluster is killed again.

This is the command we run on our 3 consul servers:

while true; do sudo pkill consul$; sleep $(( $RANDOM % 8 + 2 )); done
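
To make the timing in that loop explicit: bash's `$RANDOM` is an integer between 0 and 32767, so the pause between kills always falls between 2 and 9 seconds. A small Python sketch of the same arithmetic (the function name `kill_interval` is hypothetical, used only for illustration):

```python
def kill_interval(random_value: int) -> int:
    """Mirror the shell expression $(( $RANDOM % 8 + 2 ))."""
    return random_value % 8 + 2


# bash's $RANDOM ranges over 0..32767; collect every possible sleep length.
intervals = {kill_interval(r) for r in range(32768)}
print(min(intervals), max(intervals))
```

With three servers each running the loop independently, the cluster rarely gets more than a few seconds of stable leadership, which is the window in which the empty response shows up.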

My guess is that this is related to the GET request not using the API's consistent mode (https://www.consul.io/api/index.html#consistent), but I don't think consul-template supports that mode.
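
For comparison, here is a minimal sketch of what a consistent-mode request to the same endpoint looks like when made directly against the local agent, outside consul-template. The helper names `catalog_nodes_url` and `fetch_nodes` are hypothetical, and the address is the one from the configuration above. Whether the consistent mode actually avoids the empty result during a leadership change is an assumption that has not been verified here.

```python
import json
from urllib.request import urlopen


def catalog_nodes_url(address: str = "127.0.0.1:8500", consistent: bool = True) -> str:
    """Build the URL for GET /v1/catalog/nodes (hypothetical helper)."""
    base = f"http://{address}/v1/catalog/nodes"
    # Consul selects the consistency mode through a bare query parameter.
    return f"{base}?consistent" if consistent else base


def fetch_nodes(address: str = "127.0.0.1:8500", consistent: bool = True) -> list:
    """Query the local agent and return the decoded list of nodes."""
    with urlopen(catalog_nodes_url(address, consistent), timeout=5) as resp:
        return json.load(resp)


print(catalog_nodes_url())
```

Calling `fetch_nodes()` in a loop while the servers are being killed, once with `consistent=True` and once with `consistent=False`, would be one way to check whether the 0-node response is specific to the default mode.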

sgirones commented 6 years ago

Closed in favour of https://github.com/hashicorp/consul/issues/4139