Open haandude opened 6 years ago
Hi @haandude, this log message from consul-template suggests that something may have been going on with the Consul servers at that time:
January 30th 2018, 05:58:45.907 <171>1 2018-01-30T05:58:45.382306+01:00 ip-10-253-121-5 consul-template 14806 - - (runner) failed to update dependency data for de-duplication: failed to write 'consul-template/dedup/0f940984a4eaef41f72aa4582017b4d8/data': Unexpected response code: 500 (rpc error: rpc error: timed out enqueuing operation)
January 30th 2018, 05:58:45.906 <171>1 2018-01-30T05:58:45.382306+01:00 ip-10-253-121-5 consul-template 14806 - - (runner) failed to update dependency data for de-duplication: failed to write 'consul-template/dedup/0f940984a4eaef41f72aa4582017b4d8/data': Unexpected response code: 500 (rpc error: rpc error: timed out enqueuing operation)
Were they healthy with a leader at that time?
@slackpad Our Consul cluster was healthy at the time. We investigated further and found that the Consul servers were I/O constrained (specifically on writes), which caused the write failures. I suspect that after too many sequential write failures consul-template stops updating. We have resolved the I/O constraints and are monitoring whether this fixes our issue. Thanks for pointing us in that direction.
In our use case we will set the number of retries to 0 (unlimited) so that temporary issues do not cause consul-template to stop updating.
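To make the "0 means unlimited" behaviour concrete, here is a minimal Python sketch of a retry loop with those semantics. It is only an illustration of the idea, not consul-template's actual implementation; `call_with_retries`, `flaky_write`, and the backoff values are hypothetical names and numbers chosen for the example.

```python
import itertools
import time


def call_with_retries(operation, attempts=0, backoff=0.25, max_backoff=60.0,
                      sleep=time.sleep):
    """Call operation() until it succeeds.

    attempts == 0 means "retry without limit"; any other value gives up
    after that many tries and re-raises the last error.
    """
    tries = itertools.count(1) if attempts == 0 else range(1, attempts + 1)
    last_error = None
    for n in tries:
        try:
            return operation()
        except OSError as err:  # stand-in for a transient write failure
            last_error = err
            # Exponential backoff; both the exponent and the delay are capped
            # so an unlimited loop never overflows or sleeps too long.
            sleep(min(backoff * 2 ** min(n - 1, 30), max_backoff))
    raise last_error


# Hypothetical write that fails five times before succeeding.
failures = iter([OSError("timed out enqueuing operation")] * 5)


def flaky_write():
    err = next(failures, None)
    if err is not None:
        raise err
    return "written"


# sleep is stubbed out so the example runs instantly.
result = call_with_retries(flaky_write, attempts=0, sleep=lambda s: None)
```

With a finite limit such as `attempts=3`, the same five failures would exhaust the retries and surface the error, which is the situation we want to avoid for short-lived outages.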
I also noticed that we are currently DDoSing the Consul agent by monitoring many dependencies, as mentioned in issue #1065. With dedup enabled, every update triggers a POST to the local Consul client to update the dedup value in the KV store. Each POST creates a new connection that is then left in the TIME_WAIT state, and this happens so often that consul-template is sometimes unable to open a socket. Would it be possible for consul-template to reuse the connection to the local Consul client, or to batch the updates to the dedup value?
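The difference between a new connection per write and a reused keep-alive connection can be shown with a small self-contained Python sketch. It runs against a throwaway local HTTP server, not a real Consul agent, and is not how consul-template (a Go program) is implemented; `FakeAgent`, `put_fresh`, `put_reused`, and the key path are hypothetical names for the example. The sketch uses PUT, the method Consul's KV HTTP API documents for writes.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

KEY_PATH = "/v1/kv/consul-template/dedup/example/data"  # illustrative key
lock = threading.Lock()
stats = {"connections": 0}


class FakeAgent(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # enables keep-alive on the server side

    def setup(self):
        # setup() runs once per accepted TCP connection.
        super().setup()
        with lock:
            stats["connections"] += 1

    def do_PUT(self):
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        body = b"true"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def put_fresh(host, port, n):
    """One new TCP connection per write; each close leaves a TIME_WAIT socket."""
    for _ in range(n):
        conn = http.client.HTTPConnection(host, port)
        conn.request("PUT", KEY_PATH, body=b"x")
        conn.getresponse().read()
        conn.close()


def put_reused(host, port, n):
    """All writes share a single keep-alive connection."""
    conn = http.client.HTTPConnection(host, port)
    for _ in range(n):
        conn.request("PUT", KEY_PATH, body=b"x")
        conn.getresponse().read()
    conn.close()


server = ThreadingHTTPServer(("127.0.0.1", 0), FakeAgent)
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address

put_fresh(host, port, 5)
fresh_connections = stats["connections"]

stats["connections"] = 0
put_reused(host, port, 5)
reused_connections = stats["connections"]

server.shutdown()
server.server_close()
print(fresh_connections, reused_connections)
```

In this sketch the fresh-connection variant opens one socket per write, while the reused variant opens one socket in total, which is the behaviour I am asking about for the dedup updates.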
We are currently using consul-template v0.18.5
Expected behavior
We have a clustered setup running multiple nodes with consul-template updating haproxy. We recently enabled dedup for consul-template because the load on Consul was getting too high, and we saw very nice results with it. We use ansible to update the configuration for consul-template. We expect consul-template to be robust enough to keep updating even after updates or restarts.
Actual behavior
After an ansible run a restart of consul-template was triggered on our cluster. This restart made consul-template stop updating. The consul-template process was still running. After deleting the dedup lock and data variables, consul-template started working again as expected.
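For reference, the workaround of deleting the dedup keys can be sketched in Python as a recursive delete through Consul's KV HTTP API (`DELETE /v1/kv/<key>?recurse=true`). The agent address `http://127.0.0.1:8500` is an assumption (Consul's default HTTP port), the helper name `dedup_delete_request` is hypothetical, and the hash is the one from the log line in this thread. The sketch only builds the request; sending it removes everything under that prefix, so treat it as an illustration, not a recommended procedure.

```python
import urllib.parse
import urllib.request


def dedup_delete_request(agent_url, template_hash):
    """Build a DELETE request for all dedup keys of one template hash."""
    key = f"consul-template/dedup/{template_hash}"
    # quote() leaves "/" intact, so the key path structure is preserved.
    url = f"{agent_url}/v1/kv/{urllib.parse.quote(key)}?recurse=true"
    return urllib.request.Request(url, method="DELETE")


req = dedup_delete_request(
    "http://127.0.0.1:8500",              # assumed local agent address
    "0f940984a4eaef41f72aa4582017b4d8",   # hash taken from the log line
)
print(req.get_method(), req.full_url)

# To actually send it against a running agent:
# urllib.request.urlopen(req)
```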
Steps to reproduce
References
Configuration
Command
Debug output