hashicorp / consul-template

Template rendering, notifier, and supervisor for @HashiCorp Consul and Vault data.
https://www.hashicorp.com/
Mozilla Public License 2.0

After receiving SIGHUP consul-template starts to use more sockets (file descriptors) #1067

Open legopost opened 6 years ago

legopost commented 6 years ago

Consul Template version

consul-template v0.19.4 (68b1da2)

Configuration

consul {
  address = "127.0.0.1:8500"
  retry {
    enabled = true
    attempts = -1
    backoff = "2s"
  }
}

max_stale = "2m"
log_level = "warn"
template {
  source = "/etc/octopus/upstream/ztestweb500-zcu-backend.ctpl"
  destination = "/etc/nginx/sites-enabled/consul/ztestweb500-zcu-backend"
  command = "sudo service nginx reload"
  perms = 0644
  left_delimiter  = "{:"
  right_delimiter = ":}"
}

Command

/usr/local/bin/consul-template -config /etc/octopus/consul-template

Expected behavior

The number of sockets (file descriptors) used by the consul-template must remain the same after receiving SIGHUP.

Actual behavior

The number of sockets (file descriptors) used by consul-template increases irreversibly after each SIGHUP, and sooner or later this exhausts all ephemeral ports on the local node.

Steps to reproduce

  1. We have 1784 consul-template config files with 1919 dependencies in total
  2. while true; do
       date
       netstat -an4 | grep "127.0.0.1:8500" | wc -l
       ls -la /proc/$(pidof consul-template)/fd/ | wc -l
       kill -HUP $(pidof consul-template)
       sleep 240
     done
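
A note on the numbers this loop prints: `ls -la ... | wc -l` also counts the `total` line and the `.` and `..` entries, so the reported value is 3 higher than the real descriptor count. That does not change the trend, but an exact count can be sketched as below. The helper name `count_fds` is hypothetical, and `-mindepth`/`-maxdepth` assume GNU or BSD `find`.

```shell
# Hypothetical helper: count the direct entries of a directory,
# excluding "." and "..". Intended for /proc/<pid>/fd on Linux.
count_fds() {
  dir="$1"
  # One output line per entry; symlinks are listed, not followed.
  find "$dir" -mindepth 1 -maxdepth 1 | wc -l | tr -d ' '
}
```

Usage would look like `count_fds /proc/$(pidof consul-template)/fd` in place of the `ls -la ... | wc -l` line.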

eikenb commented 5 years ago

Hello @legopost, thanks for taking the time to submit.

Unfortunately I am unable to reproduce the issue. While the number of connections does go up briefly when the HUP is received, it goes back down to normal levels over time as the blocking HTTP calls time out or finish. Using your basic config (modified to work with a test template) and your script, the output is repeating instances of...

Thu 13 Jun 2019 03:46:45 PM PDT
3
8
Thu 13 Jun 2019 03:50:45 PM PDT
3
8
...

Are you still seeing this issue? If so, maybe you could try to create a minimal test template that reproduces it, since the problem seems to be related to the template.

Thanks.

ziqianggeoffreychen commented 5 years ago

@eikenb How did you configure your consul-template? The main point is that there must be connections between consul and consul-template, so that the hung connections can be seen accumulating. In @legopost's report there are 1784 config files. You don't need to configure that many, but I guess 10+ config files should be enough to reproduce the issue and see the hung connections clearly.

I hit the same issue on my system, which has 220 config files.

On a freshly started consul-template, the statistics are:

$ echo -n "All connections: "; netstat -pan|grep 8500|wc -l; echo -n "ESTABLISHED: "; netstat -pan|grep 8500|grep -c ESTABLISHED; echo -n "WAIT: "; netstat -pan|grep 8500|grep -c WAIT; echo -n "LISTENING: "; netstat -pan|grep 8500|grep -c LISTEN;
All connections: 249
ESTABLISHED: 240
WAIT: 8
LISTENING: 1
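
The command above runs `netstat` four times, so the four numbers come from slightly different moments. A single-pass variant might look like the sketch below. The function name `count_states` is hypothetical, the column positions assume Linux `netstat -an` output (local address in column 4, foreign address in column 5, state in column 6), and it matches the port exactly rather than any line containing `8500`.

```shell
# Hypothetical helper: summarize TCP states for one port from
# `netstat -an`-style output on stdin, in a single pass.
count_states() {
  port="$1"
  awk -v port="$port" '
    # Keep TCP lines whose local or foreign address ends in :<port>.
    $1 ~ /^tcp/ && ($4 ~ (":" port "$") || $5 ~ (":" port "$")) {
      all++
      if ($6 == "ESTABLISHED") est++
      if ($6 ~ /WAIT/) wait++        # TIME_WAIT, CLOSE_WAIT, FIN_WAIT*
      if ($6 == "LISTEN") listen++
    }
    END {
      printf "all=%d established=%d wait=%d listen=%d\n", all, est, wait, listen
    }'
}
```

Usage would be `netstat -an | count_states 8500`, which prints all four counters from the same snapshot.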

Now reload consul-template (rather than restarting it); the established connections nearly triple:

reload() {
    pid=$(cat /data/consul_template/pid)
    kill -HUP "$pid"
}

$ systemctl reload rda.consul-template
$ echo -n "All connections: "; netstat -pan|grep 8500|wc -l; echo -n "ESTABLISHED: "; netstat -pan|grep 8500|grep -c ESTABLISHED; echo -n "WAIT: "; netstat -pan|grep 8500|grep -c WAIT; echo -n "LISTENING: "; netstat -pan|grep 8500|grep -c LISTEN;
All connections: 804
ESTABLISHED: 640
WAIT: 163
LISTENING: 1

After a while, the WAIT connections drop back to their previous level, but the established connections settle at a nearly doubled level (416 instead of 240):

$ echo -n "All connections: "; netstat -pan|grep 8500|wc -l; echo -n "ESTABLISHED: "; netstat -pan|grep 8500|grep -c ESTABLISHED; echo -n "WAIT: "; netstat -pan|grep 8500|grep -c WAIT; echo -n "LISTENING: "; netstat -pan|grep 8500|grep -c LISTEN;
All connections: 423
ESTABLISHED: 416
WAIT: 6
LISTENING: 1

I can easily run reload one more time to hang another 200+ connections:

$ echo -n "All connections: "; netstat -pan|grep 8500|wc -l; echo -n "ESTABLISHED: "; netstat -pan|grep 8500|grep -c ESTABLISHED; echo -n "WAIT: "; netstat -pan|grep 8500|grep -c WAIT; echo -n "LISTENING: "; netstat -pan|grep 8500|grep -c LISTEN;
All connections: 1011
ESTABLISHED: 820
WAIT: 191
LISTENING: 1

$ echo -n "All connections: "; netstat -pan|grep 8500|wc -l; echo -n "ESTABLISHED: "; netstat -pan|grep 8500|grep -c ESTABLISHED; echo -n "WAIT: "; netstat -pan|grep 8500|grep -c WAIT; echo -n "LISTENING: "; netstat -pan|grep 8500|grep -c LISTEN;
All connections: 661
ESTABLISHED: 648
WAIT: 12
LISTENING: 1

The settled ESTABLISHED count goes 240 -> 416 -> 648 across two reloads. I believe the way connections hang is clear now.
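
To make the growth per reload explicit, the settled ESTABLISHED counts quoted above (240, 416, 648) can be differenced with a small sketch like this one. The function name `deltas` is hypothetical and only illustrates the arithmetic.

```shell
# Hypothetical helper: print the increase between consecutive samples.
deltas() {
  prev=""
  for n in "$@"; do
    if [ -n "$prev" ]; then
      echo "$prev -> $n: +$((n - prev))"
    fi
    prev="$n"
  done
}

deltas 240 416 648
# prints:
# 240 -> 416: +176
# 416 -> 648: +232
```

Each reload leaves roughly 200 extra established connections behind, which is on the order of one connection per config file in this 220-file setup.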