hashicorp / consul-template

Template rendering, notifier, and supervisor for @HashiCorp Consul and Vault data.
https://www.hashicorp.com/
Mozilla Public License 2.0
4.76k stars 781 forks source link

Consul template with dedup enabled no longer updating after restart of consul-template service #1064

Open haandude opened 6 years ago

haandude commented 6 years ago

We are currently using consul-template v0.18.5

Expected behavior

We have a clustered setup running multiple nodes with consul-template updating haproxy. We recently enabled dedup for consul-template, because the load on consul would get too much. We saw very nice results when using dedup. We use ansible to update configuration for consul template. We expect consul-template to be robust enough to keep updating even after updates or restarts.

Actual behavior

After an ansible run a restart of consul-template was triggered on our cluster. This restart made consul-template stop updating. The consul-template process was still running. After deleting the dedup lock and data variables, consul-template started working again as expected.

Steps to reproduce

  1. Setup cluster with multiple consul-template nodes updating haproxy
  2. Restart the consul-template services
  3. Observe consul-template no longer updating.

References

Configuration

#Ansible managed

consul {

    address = "169.254.255.254:8500"
    token = ""

    retry {
      enabled = true
      attempts = 12
      backoff = "250ms"
      max_backoff = "1m"
    }

}

  # This block defines the configuration for de-duplication mode. Please see the
  # de-duplication mode documentation later in the README for more information
  # on how de-duplication mode operates.
  deduplicate {
    # This enables de-duplication mode. Specifying any other options also enables
    # de-duplication mode.
    enabled = true

    # This is the prefix to the path in Consul's KV store where de-duplication
    # templates will be pre-rendered and stored.
    prefix = "consul-template/dedup/"
  }

max_stale = "10m"
pid_file = "/var/run/consul-template.pid"
log_level = "warn"

syslog {
  enabled = true
  facility = "LOCAL5"
}

wait {
  min = "4s"
  max = "10s"
}

# haproxy
template {
  source = "/data/consul-template/config/haproxy.ctmpl"
  destination = "/etc/haproxy/haproxy.cfg"
  command = "service haproxy reload"
  command_timeout = "30s"
  backup = true
}

# domains
template {
  source = "/data/consul-template/config/domains.ctmpl"
  destination = "/etc/haproxy/domains.map"
  command = "service haproxy reload"
  command_timeout = "30s"
  backup = true
}

# token
template {
  source = "/data/consul-template/config/tokens.ctmpl"
  destination = "/etc/haproxy/tokens.map"
  command = "service haproxy reload"
  command_timeout = "30s"
  backup = true
}

Command

/usr/local/bin/consul-template -config /data/consul-template/config/consul-template.hcl

Debug output

January 30th 2018, 06:00:42.039 <12>1 2018-01-30T06:00:39.101149+01:00 ip-10-253-121-41 kernel - - - [66645.312807] init: consul-template main process (12009) killed by KILL signal
    January 30th 2018, 06:00:41.200 <12>1 2018-01-30T06:00:39.101149+01:00 ip-10-253-121-41 kernel - - - [66645.312807] init: consul-template main process (12009) killed by KILL signal
    January 30th 2018, 06:00:35.039 <14>1 2018-01-30T06:00:34.069387+01:00 ip-10-253-121-41 ansible-service - - -  Invoked with name=consul-template pattern=None enabled=None state=restarted sleep=None arguments= runlevel=default
    January 30th 2018, 06:00:34.200 <14>1 2018-01-30T06:00:34.069387+01:00 ip-10-253-121-41 ansible-service - - -  Invoked with name=consul-template pattern=None enabled=None state=restarted sleep=None arguments= runlevel=default
    January 30th 2018, 06:00:29.037 <14>1 2018-01-30T06:00:28.058325+01:00 ip-10-253-121-41 ansible-service - - -  Invoked with name=consul-template pattern=None enabled=True state=started sleep=None arguments= runlevel=default
    January 30th 2018, 06:00:28.198 <14>1 2018-01-30T06:00:28.058325+01:00 ip-10-253-121-41 ansible-service - - -  Invoked with name=consul-template pattern=None enabled=True state=started sleep=None arguments= runlevel=default
    January 30th 2018, 06:00:28.198 <14>1 2018-01-30T06:00:27.239368+01:00 ip-10-253-121-41 ansible-file - - -  Invoked with directory_mode=None force=False remote_src=None path=/usr/local/bin/consul-template owner=root follow=False group=root unsafe_writes=None state=None content=NOT_LOGGING_PARAMETER serole=None diff_peek=None setype=None dest=/usr/local/bin/consul-template selevel=None original_basename=consul-template regexp=None validate=None src=consul-template seuser=None recurse=False delimiter=None mode=0555 backup=None
    January 30th 2018, 06:00:28.036 <14>1 2018-01-30T06:00:27.239368+01:00 ip-10-253-121-41 ansible-file - - -  Invoked with directory_mode=None force=False remote_src=None path=/usr/local/bin/consul-template owner=root follow=False group=root unsafe_writes=None state=None content=NOT_LOGGING_PARAMETER serole=None diff_peek=None setype=None dest=/usr/local/bin/consul-template selevel=None original_basename=consul-template regexp=None validate=None src=consul-template seuser=None recurse=False delimiter=None mode=0555 backup=None
    January 30th 2018, 06:00:27.198 <14>1 2018-01-30T06:00:26.350031+01:00 ip-10-253-121-41 ansible-stat - - -  Invoked with checksum_algorithm=sha1 mime=False get_checksum=True path=/data/consul-template/config/tokens.ctmpl checksum_algo=sha1 follow=False get_md5=False
    January 30th 2018, 06:00:27.198 <14>1 2018-01-30T06:00:26.587335+01:00 ip-10-253-121-41 ansible-file - - -  Invoked with directory_mode=None force=False remote_src=None path=/data/consul-template/config/tokens.ctmpl owner=root follow=True group=root unsafe_writes=None serole=None content=NOT_LOGGING_PARAMETER state=None diff_peek=None setype=None dest=/data/consul-template/config/tokens.ctmpl selevel=None original_basename=tokens.ctmpl.j2 regexp=None validate=None src=None seuser=None recurse=False delimiter=None mode=420 backup=None
    January 30th 2018, 06:00:27.198 <14>1 2018-01-30T06:00:26.960504+01:00 ip-10-253-121-41 ansible-stat - - -  Invoked with checksum_algorithm=sha1 mime=False get_checksum=True path=/usr/local/bin/consul-template checksum_algo=sha1 follow=False get_md5=False
    January 30th 2018, 06:00:27.036 <14>1 2018-01-30T06:00:26.350031+01:00 ip-10-253-121-41 ansible-stat - - -  Invoked with checksum_algorithm=sha1 mime=False get_checksum=True path=/data/consul-template/config/tokens.ctmpl checksum_algo=sha1 follow=False get_md5=False
    January 30th 2018, 06:00:27.036 <14>1 2018-01-30T06:00:26.960504+01:00 ip-10-253-121-41 ansible-stat - - -  Invoked with checksum_algorithm=sha1 mime=False get_checksum=True path=/usr/local/bin/consul-template checksum_algo=sha1 follow=False get_md5=False
    January 30th 2018, 06:00:27.036 <14>1 2018-01-30T06:00:26.587335+01:00 ip-10-253-121-41 ansible-file - - -  Invoked with directory_mode=None force=False remote_src=None path=/data/consul-template/config/tokens.ctmpl owner=root follow=True group=root unsafe_writes=None serole=None content=NOT_LOGGING_PARAMETER state=None diff_peek=None setype=None dest=/data/consul-template/config/tokens.ctmpl selevel=None original_basename=tokens.ctmpl.j2 regexp=None validate=None src=None seuser=None recurse=False delimiter=None mode=420 backup=None
    January 30th 2018, 06:00:27.036 <14>1 2018-01-30T06:00:26.067785+01:00 ip-10-253-121-41 ansible-file - - -  Invoked with directory_mode=None force=False remote_src=None path=/data/consul-template/config/domains.ctmpl owner=root follow=True group=root unsafe_writes=None serole=None content=NOT_LOGGING_PARAMETER state=None diff_peek=None setype=None dest=/data/consul-template/config/domains.ctmpl selevel=None original_basename=domains.ctmpl.j2 regexp=None validate=None src=None seuser=None recurse=False delimiter=None mode=420 backup=None
    January 30th 2018, 06:00:26.197 <14>1 2018-01-30T06:00:25.383234+01:00 ip-10-253-121-41 ansible-stat - - -  Invoked with checksum_algorithm=sha1 mime=False get_checksum=True path=/data/consul-template/config/haproxy.ctmpl checksum_algo=sha1 follow=False get_md5=False
    January 30th 2018, 06:00:26.197 <14>1 2018-01-30T06:00:26.067785+01:00 ip-10-253-121-41 ansible-file - - -  Invoked with directory_mode=None force=False remote_src=None path=/data/consul-template/config/domains.ctmpl owner=root follow=True group=root unsafe_writes=None serole=None content=NOT_LOGGING_PARAMETER state=None diff_peek=None setype=None dest=/data/consul-template/config/domains.ctmpl selevel=None original_basename=domains.ctmpl.j2 regexp=None validate=None src=None seuser=None recurse=False delimiter=None mode=420 backup=None
    January 30th 2018, 06:00:26.197 <14>1 2018-01-30T06:00:25.863213+01:00 ip-10-253-121-41 ansible-stat - - -  Invoked with checksum_algorithm=sha1 mime=False get_checksum=True path=/data/consul-template/config/domains.ctmpl checksum_algo=sha1 follow=False get_md5=False
    January 30th 2018, 06:00:26.197 <14>1 2018-01-30T06:00:25.587593+01:00 ip-10-253-121-41 ansible-file - - -  Invoked with directory_mode=None force=False remote_src=None path=/data/consul-template/config/haproxy.ctmpl owner=root follow=True group=root unsafe_writes=None serole=None content=NOT_LOGGING_PARAMETER state=None diff_peek=None setype=None dest=/data/consul-template/config/haproxy.ctmpl selevel=None original_basename=haproxy.ctmpl.j2 regexp=None validate=None src=None seuser=None recurse=False delimiter=None mode=420 backup=None
    January 30th 2018, 06:00:26.036 <14>1 2018-01-30T06:00:25.383234+01:00 ip-10-253-121-41 ansible-stat - - -  Invoked with checksum_algorithm=sha1 mime=False get_checksum=True path=/data/consul-template/config/haproxy.ctmpl checksum_algo=sha1 follow=False get_md5=False
    January 30th 2018, 06:00:26.036 <14>1 2018-01-30T06:00:25.863213+01:00 ip-10-253-121-41 ansible-stat - - -  Invoked with checksum_algorithm=sha1 mime=False get_checksum=True path=/data/consul-template/config/domains.ctmpl checksum_algo=sha1 follow=False get_md5=False
    January 30th 2018, 06:00:26.036 <14>1 2018-01-30T06:00:25.587593+01:00 ip-10-253-121-41 ansible-file - - -  Invoked with directory_mode=None force=False remote_src=None path=/data/consul-template/config/haproxy.ctmpl owner=root follow=True group=root unsafe_writes=None serole=None content=NOT_LOGGING_PARAMETER state=None diff_peek=None setype=None dest=/data/consul-template/config/haproxy.ctmpl selevel=None original_basename=haproxy.ctmpl.j2 regexp=None validate=None src=None seuser=None recurse=False delimiter=None mode=420 backup=None
    January 30th 2018, 06:00:25.197 <14>1 2018-01-30T06:00:24.464352+01:00 ip-10-253-121-41 ansible-file - - -  Invoked with directory_mode=None force=False remote_src=None path=/data/consul-template/config owner=root follow=False group=root unsafe_writes=None state=directory content=NOT_LOGGING_PARAMETER serole=None diff_peek=None setype=None selevel=None original_basename=None regexp=None validate=None src=None seuser=None recurse=True delimiter=None mode=488 backup=None
    January 30th 2018, 06:00:25.197 <14>1 2018-01-30T06:00:24.987560+01:00 ip-10-253-121-41 ansible-file - - -  Invoked with directory_mode=None force=False remote_src=None path=/data/consul-template/config/consul-template.hcl owner=root follow=True group=root unsafe_writes=None serole=None content=NOT_LOGGING_PARAMETER state=None diff_peek=None setype=None dest=/data/consul-template/config/consul-template.hcl selevel=None original_basename=consul-template.hcl.j2 regexp=None validate=None src=None seuser=None recurse=False delimiter=None mode=488 backup=None
    January 30th 2018, 06:00:25.197 <14>1 2018-01-30T06:00:24.819840+01:00 ip-10-253-121-41 ansible-stat - - -  Invoked with checksum_algorithm=sha1 mime=False get_checksum=True path=/data/consul-template/config/consul-template.hcl checksum_algo=sha1 follow=False get_md5=False
    January 30th 2018, 06:00:25.036 <14>1 2018-01-30T06:00:24.819840+01:00 ip-10-253-121-41 ansible-stat - - -  Invoked with checksum_algorithm=sha1 mime=False get_checksum=True path=/data/consul-template/config/consul-template.hcl checksum_algo=sha1 follow=False get_md5=False
    January 30th 2018, 06:00:25.036 <14>1 2018-01-30T06:00:24.987560+01:00 ip-10-253-121-41 ansible-file - - -  Invoked with directory_mode=None force=False remote_src=None path=/data/consul-template/config/consul-template.hcl owner=root follow=True group=root unsafe_writes=None serole=None content=NOT_LOGGING_PARAMETER state=None diff_peek=None setype=None dest=/data/consul-template/config/consul-template.hcl selevel=None original_basename=consul-template.hcl.j2 regexp=None validate=None src=None seuser=None recurse=False delimiter=None mode=488 backup=None
    January 30th 2018, 06:00:25.036 <14>1 2018-01-30T06:00:24.464352+01:00 ip-10-253-121-41 ansible-file - - -  Invoked with directory_mode=None force=False remote_src=None path=/data/consul-template/config owner=root follow=False group=root unsafe_writes=None state=directory content=NOT_LOGGING_PARAMETER serole=None diff_peek=None setype=None selevel=None original_basename=None regexp=None validate=None src=None seuser=None recurse=True delimiter=None mode=488 backup=None
    January 30th 2018, 05:59:18.916 <12>1 2018-01-30T05:59:16.408285+01:00 ip-10-253-121-5 kernel - - - [66271.535646] init: consul-template main process (14806) killed by KILL signal
    January 30th 2018, 05:59:18.916 <171>1 2018-01-30T05:59:15.404428+01:00 ip-10-253-121-5 consul-template 14806 - -  (runner) failed to update dependency data for de-duplication: failed to write 'consul-template/dedup/705942942c51358668f17f239830bb9f/data': Unexpected response code: 500 (rpc error: timed out enqueuing operation)
    January 30th 2018, 05:59:18.916 <171>1 2018-01-30T05:59:15.404428+01:00 ip-10-253-121-5 consul-template 14806 - -  (runner) failed to update dependency data for de-duplication: failed to write 'consul-template/dedup/705942942c51358668f17f239830bb9f/data': Unexpected response code: 500 (rpc error: timed out enqueuing operation)
    January 30th 2018, 05:59:18.916 <12>1 2018-01-30T05:59:16.408285+01:00 ip-10-253-121-5 kernel - - - [66271.535646] init: consul-template main process (14806) killed by KILL signal
    January 30th 2018, 05:59:11.915 <14>1 2018-01-30T05:59:11.420910+01:00 ip-10-253-121-5 ansible-service - - -  Invoked with name=consul-template pattern=None enabled=None state=restarted sleep=None arguments= runlevel=default
    January 30th 2018, 05:59:11.915 <14>1 2018-01-30T05:59:11.420910+01:00 ip-10-253-121-5 ansible-service - - -  Invoked with name=consul-template pattern=None enabled=None state=restarted sleep=None arguments= runlevel=default
    January 30th 2018, 05:59:05.913 <14>1 2018-01-30T05:59:04.967596+01:00 ip-10-253-121-5 ansible-service - - -  Invoked with name=consul-template pattern=None enabled=True state=started sleep=None arguments= runlevel=default
    January 30th 2018, 05:59:05.913 <14>1 2018-01-30T05:59:04.967596+01:00 ip-10-253-121-5 ansible-service - - -  Invoked with name=consul-template pattern=None enabled=True state=started sleep=None arguments= runlevel=default
    January 30th 2018, 05:59:04.913 <14>1 2018-01-30T05:59:04.111357+01:00 ip-10-253-121-5 ansible-file - - -  Invoked with directory_mode=None force=False remote_src=None path=/usr/local/bin/consul-template owner=root follow=False group=root unsafe_writes=None state=None content=NOT_LOGGING_PARAMETER serole=None diff_peek=None setype=None dest=/usr/local/bin/consul-template selevel=None original_basename=consul-template regexp=None validate=None src=consul-template seuser=None recurse=False delimiter=None mode=0555 backup=None
    January 30th 2018, 05:59:04.913 <14>1 2018-01-30T05:59:04.111357+01:00 ip-10-253-121-5 ansible-file - - -  Invoked with directory_mode=None force=False remote_src=None path=/usr/local/bin/consul-template owner=root follow=False group=root unsafe_writes=None state=None content=NOT_LOGGING_PARAMETER serole=None diff_peek=None setype=None dest=/usr/local/bin/consul-template selevel=None original_basename=consul-template regexp=None validate=None src=consul-template seuser=None recurse=False delimiter=None mode=0555 backup=None
    January 30th 2018, 05:59:03.913 <14>1 2018-01-30T05:59:02.973727+01:00 ip-10-253-121-5 ansible-file - - -  Invoked with directory_mode=None force=False remote_src=None path=/data/consul-template/config/domains.ctmpl owner=root follow=True group=root unsafe_writes=None serole=None content=NOT_LOGGING_PARAMETER state=None diff_peek=None setype=None dest=/data/consul-template/config/domains.ctmpl selevel=None original_basename=domains.ctmpl.j2 regexp=None validate=None src=None seuser=None recurse=False delimiter=None mode=420 backup=None
    January 30th 2018, 05:59:03.913 <14>1 2018-01-30T05:59:03.840435+01:00 ip-10-253-121-5 ansible-stat - - -  Invoked with checksum_algorithm=sha1 mime=False get_checksum=True path=/usr/local/bin/consul-template checksum_algo=sha1 follow=False get_md5=False
    January 30th 2018, 05:59:03.913 <14>1 2018-01-30T05:59:03.840435+01:00 ip-10-253-121-5 ansible-stat - - -  Invoked with checksum_algorithm=sha1 mime=False get_checksum=True path=/usr/local/bin/consul-template checksum_algo=sha1 follow=False get_md5=False
    January 30th 2018, 05:59:03.913 <14>1 2018-01-30T05:59:03.463517+01:00 ip-10-253-121-5 ansible-file - - -  Invoked with directory_mode=None force=False remote_src=None path=/data/consul-template/config/tokens.ctmpl owner=root follow=True group=root unsafe_writes=None serole=None content=NOT_LOGGING_PARAMETER state=None diff_peek=None setype=None dest=/data/consul-template/config/tokens.ctmpl selevel=None original_basename=tokens.ctmpl.j2 regexp=None validate=None src=None seuser=None recurse=False delimiter=None mode=420 backup=None
    January 30th 2018, 05:59:03.913 <14>1 2018-01-30T05:59:02.973727+01:00 ip-10-253-121-5 ansible-file - - -  Invoked with directory_mode=None force=False remote_src=None path=/data/consul-template/config/domains.ctmpl owner=root follow=True group=root unsafe_writes=None serole=None content=NOT_LOGGING_PARAMETER state=None diff_peek=None setype=None dest=/data/consul-template/config/domains.ctmpl selevel=None original_basename=domains.ctmpl.j2 regexp=None validate=None src=None seuser=None recurse=False delimiter=None mode=420 backup=None
    January 30th 2018, 05:59:03.913 <14>1 2018-01-30T05:59:03.257019+01:00 ip-10-253-121-5 ansible-stat - - -  Invoked with checksum_algorithm=sha1 mime=False get_checksum=True path=/data/consul-template/config/tokens.ctmpl checksum_algo=sha1 follow=False get_md5=False
    January 30th 2018, 05:59:03.913 <14>1 2018-01-30T05:59:03.257019+01:00 ip-10-253-121-5 ansible-stat - - -  Invoked with checksum_algorithm=sha1 mime=False get_checksum=True path=/data/consul-template/config/tokens.ctmpl checksum_algo=sha1 follow=False get_md5=False
    January 30th 2018, 05:59:03.913 <14>1 2018-01-30T05:59:03.463517+01:00 ip-10-253-121-5 ansible-file - - -  Invoked with directory_mode=None force=False remote_src=None path=/data/consul-template/config/tokens.ctmpl owner=root follow=True group=root unsafe_writes=None serole=None content=NOT_LOGGING_PARAMETER state=None diff_peek=None setype=None dest=/data/consul-template/config/tokens.ctmpl selevel=None original_basename=tokens.ctmpl.j2 regexp=None validate=None src=None seuser=None recurse=False delimiter=None mode=420 backup=None
    January 30th 2018, 05:59:02.912 <14>1 2018-01-30T05:59:02.466162+01:00 ip-10-253-121-5 ansible-file - - -  Invoked with directory_mode=None force=False remote_src=None path=/data/consul-template/config/haproxy.ctmpl owner=root follow=True group=root unsafe_writes=None serole=None content=NOT_LOGGING_PARAMETER state=None diff_peek=None setype=None dest=/data/consul-template/config/haproxy.ctmpl selevel=None original_basename=haproxy.ctmpl.j2 regexp=None validate=None src=None seuser=None recurse=False delimiter=None mode=420 backup=None
    January 30th 2018, 05:59:02.912 <14>1 2018-01-30T05:59:02.209470+01:00 ip-10-253-121-5 ansible-stat - - -  Invoked with checksum_algorithm=sha1 mime=False get_checksum=True path=/data/consul-template/config/haproxy.ctmpl checksum_algo=sha1 follow=False get_md5=False
    January 30th 2018, 05:59:02.912 <14>1 2018-01-30T05:59:02.766281+01:00 ip-10-253-121-5 ansible-stat - - -  Invoked with checksum_algorithm=sha1 mime=False get_checksum=True path=/data/consul-template/config/domains.ctmpl checksum_algo=sha1 follow=False get_md5=False
    January 30th 2018, 05:59:02.912 <14>1 2018-01-30T05:59:02.209470+01:00 ip-10-253-121-5 ansible-stat - - -  Invoked with checksum_algorithm=sha1 mime=False get_checksum=True path=/data/consul-template/config/haproxy.ctmpl checksum_algo=sha1 follow=False get_md5=False
    January 30th 2018, 05:59:02.912 <14>1 2018-01-30T05:59:02.766281+01:00 ip-10-253-121-5 ansible-stat - - -  Invoked with checksum_algorithm=sha1 mime=False get_checksum=True path=/data/consul-template/config/domains.ctmpl checksum_algo=sha1 follow=False get_md5=False
    January 30th 2018, 05:59:02.912 <14>1 2018-01-30T05:59:02.466162+01:00 ip-10-253-121-5 ansible-file - - -  Invoked with directory_mode=None force=False remote_src=None path=/data/consul-template/config/haproxy.ctmpl owner=root follow=True group=root unsafe_writes=None serole=None content=NOT_LOGGING_PARAMETER state=None diff_peek=None setype=None dest=/data/consul-template/config/haproxy.ctmpl selevel=None original_basename=haproxy.ctmpl.j2 regexp=None validate=None src=None seuser=None recurse=False delimiter=None mode=420 backup=None
    January 30th 2018, 05:59:01.912 <14>1 2018-01-30T05:59:01.060419+01:00 ip-10-253-121-5 ansible-file - - -  Invoked with directory_mode=None force=False remote_src=None path=/data/consul-template/config owner=root follow=False group=root unsafe_writes=None state=directory content=NOT_LOGGING_PARAMETER serole=None diff_peek=None setype=None selevel=None original_basename=None regexp=None validate=None src=None seuser=None recurse=True delimiter=None mode=488 backup=None
    January 30th 2018, 05:59:01.912 <14>1 2018-01-30T05:59:01.407526+01:00 ip-10-253-121-5 ansible-stat - - -  Invoked with checksum_algorithm=sha1 mime=False get_checksum=True path=/data/consul-template/config/consul-template.hcl checksum_algo=sha1 follow=False get_md5=False
    January 30th 2018, 05:59:01.912 <14>1 2018-01-30T05:59:01.591027+01:00 ip-10-253-121-5 ansible-file - - -  Invoked with directory_mode=None force=False remote_src=None path=/data/consul-template/config/consul-template.hcl owner=root follow=True group=root unsafe_writes=None serole=None content=NOT_LOGGING_PARAMETER state=None diff_peek=None setype=None dest=/data/consul-template/config/consul-template.hcl selevel=None original_basename=consul-template.hcl.j2 regexp=None validate=None src=None seuser=None recurse=False delimiter=None mode=488 backup=None
    January 30th 2018, 05:59:01.912 <14>1 2018-01-30T05:59:01.060419+01:00 ip-10-253-121-5 ansible-file - - -  Invoked with directory_mode=None force=False remote_src=None path=/data/consul-template/config owner=root follow=False group=root unsafe_writes=None state=directory content=NOT_LOGGING_PARAMETER serole=None diff_peek=None setype=None selevel=None original_basename=None regexp=None validate=None src=None seuser=None recurse=True delimiter=None mode=488 backup=None
    January 30th 2018, 05:59:01.912 <14>1 2018-01-30T05:59:01.407526+01:00 ip-10-253-121-5 ansible-stat - - -  Invoked with checksum_algorithm=sha1 mime=False get_checksum=True path=/data/consul-template/config/consul-template.hcl checksum_algo=sha1 follow=False get_md5=False
    January 30th 2018, 05:59:01.912 <14>1 2018-01-30T05:59:01.591027+01:00 ip-10-253-121-5 ansible-file - - -  Invoked with directory_mode=None force=False remote_src=None path=/data/consul-template/config/consul-template.hcl owner=root follow=True group=root unsafe_writes=None serole=None content=NOT_LOGGING_PARAMETER state=None diff_peek=None setype=None dest=/data/consul-template/config/consul-template.hcl selevel=None original_basename=consul-template.hcl.j2 regexp=None validate=None src=None seuser=None recurse=False delimiter=None mode=488 backup=None
    January 30th 2018, 05:58:45.907 <171>1 2018-01-30T05:58:45.382306+01:00 ip-10-253-121-5 consul-template 14806 - -  (runner) failed to update dependency data for de-duplication: failed to write 'consul-template/dedup/0f940984a4eaef41f72aa4582017b4d8/data': Unexpected response code: 500 (rpc error: rpc error: timed out enqueuing operation)
    January 30th 2018, 05:58:45.906 <171>1 2018-01-30T05:58:45.382306+01:00 ip-10-253-121-5 consul-template 14806 - -  (runner) failed to update dependency data for de-duplication: failed to write 'consul-template/dedup/0f940984a4eaef41f72aa4582017b4d8/data': Unexpected response code: 500 (rpc error: rpc error: timed out enqueuing operation)
    January 30th 2018, 05:57:52.359 <172>1 2018-01-30T05:57:51.510333+01:00 ip-10-253-112-182 consul-template 8082 - -  (runner) watching 177 dependencies - watching this many dependencies could DDoS your consul cluster
    January 30th 2018, 05:57:52.359 <12>1 2018-01-30T05:57:51.517435+01:00 ip-10-253-112-182 kernel - - - [17341036.064540] init: consul-template main process (8082) killed by KILL signal
    January 30th 2018, 05:57:52.358 <12>1 2018-01-30T05:57:51.517435+01:00 ip-10-253-112-182 kernel - - - [17341036.064540] init: consul-template main process (8082) killed by KILL signal
    January 30th 2018, 05:57:52.358 <172>1 2018-01-30T05:57:51.510333+01:00 ip-10-253-112-182 consul-template 8082 - -  (runner) watching 177 dependencies - watching this many dependencies could DDoS your consul cluster
    January 30th 2018, 05:57:51.358 <172>1 2018-01-30T05:57:50.448645+01:00 ip-10-253-112-182 consul-template 8082 - -  (runner) watching 177 dependencies - watching this many dependencies could DDoS your consul cluster
    January 30th 2018, 05:57:51.358 <172>1 2018-01-30T05:57:50.810770+01:00 ip-10-253-112-182 consul-template 8082 - -  (runner) watching 177 dependencies - watching this many dependencies could DDoS your consul cluster
    January 30th 2018, 05:57:51.358 <172>1 2018-01-30T05:57:50.958212+01:00 ip-10-253-112-182 consul-template 8082 - -  (runner) watching 177 dependencies - watching this many dependencies could DDoS your consul cluster
    January 30th 2018, 05:57:51.358 <172>1 2018-01-30T05:57:51.122649+01:00 ip-10-253-112-182 consul-template 8082 - -  (runner) watching 177 dependencies - watching this many dependencies could DDoS your consul cluster
    January 30th 2018, 05:57:51.358 <172>1 2018-01-30T05:57:50.810770+01:00 ip-10-253-112-182 consul-template 8082 - -  (runner) watching 177 dependencies - watching this many dependencies could DDoS your consul cluster
    January 30th 2018, 05:57:51.358 <172>1 2018-01-30T05:57:50.958212+01:00 ip-10-253-112-182 consul-template 8082 - -  (runner) watching 177 dependencies - watching this many dependencies could DDoS your consul cluster
    January 30th 2018, 05:57:51.358 <172>1 2018-01-30T05:57:51.122649+01:00 ip-10-253-112-182 consul-template 8082 - -  (runner) watching 177 dependencies - watching this many dependencies could DDoS your consul cluster
    January 30th 2018, 05:57:51.358 <172>1 2018-01-30T05:57:50.690073+01:00 ip-10-253-112-182 consul-template 8082 - -  (runner) watching 177 dependencies - watching this many dependencies could DDoS your consul cluster
    January 30th 2018, 05:57:51.358 <172>1 2018-01-30T05:57:51.340280+01:00 ip-10-253-112-182 consul-template 8082 - -  (runner) watching 177 dependencies - watching this many dependencies could DDoS your consul cluster
    January 30th 2018, 05:57:51.358 <172>1 2018-01-30T05:57:50.448645+01:00 ip-10-253-112-182 consul-template 8082 - -  (runner) watching 177 dependencies - watching this many dependencies could DDoS your consul cluster

Are there any other GitHub issues (open or closed) that should be linked here? For example:

slackpad commented 6 years ago

Hi @haandude this log message from consul-template seems like there's maybe something going on with the Consul servers at that time:

    January 30th 2018, 05:58:45.907 <171>1 2018-01-30T05:58:45.382306+01:00 ip-10-253-121-5 consul-template 14806 - -  (runner) failed to update dependency data for de-duplication: failed to write 'consul-template/dedup/0f940984a4eaef41f72aa4582017b4d8/data': Unexpected response code: 500 (rpc error: rpc error: timed out enqueuing operation)
    January 30th 2018, 05:58:45.906 <171>1 2018-01-30T05:58:45.382306+01:00 ip-10-253-121-5 consul-template 14806 - -  (runner) failed to update dependency data for de-duplication: failed to write 'consul-template/dedup/0f940984a4eaef41f72aa4582017b4d8/data': Unexpected response code: 500 (rpc error: rpc error: timed out enqueuing operation)

Were they healthy with a leader at that time?

haandude commented 6 years ago

@slackpad Our consul cluster was healthy at the time. We have investigated this further and found the consul servers were constrained in terms of i/o (specificly writes), which is the cause of write failures. I suspect that after to many sequential write failures consul template stops updating. We have resolved the i/o constraints, and are monitoring if this resolves our issue. Thanks for pointing us in that direction.

In our use case we will set the amount of retires to 0 (unlimited) to avoid temporary issues causing consul-template to stop updating.

I also noticed we are currently ddossing the consul agent by monitoring many dependencies, mentioned in issue #1065. This is because with dedup every update is triggering a POST to the local consul client to update the dedup value in the kv store. For every POST a new connection is created and left in the TIME_WAIT state, this is happening so often that consul-template is not able to open a socket sometimes. Would it be possible for the consul template to reuse the connection to the local consul client or batch or bundle the updates to the dedup value?