Open mblakele opened 6 years ago
This also occurs at my site. The issue is that the healthcheck responses from each service are included in the deduplication body. There are other tickets where services that respond with, for example, a timestamp on the healthcheck page, trigger continuous service reloads despite the actual template contents not changing.
What we did was patch out the healthcheck contents before storing the deduplication body.
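For anyone wanting to try the same workaround, here is a rough sketch of the idea in Python. It is not the actual patch, which would have to live in consul-template's Go code. The entry layout follows the `Checks[].Output` shape returned by Consul's health endpoints. The helper names and the JSON serialization are assumptions made for illustration, not consul-template's real encoding.

```python
import copy
import json


def strip_check_output(entries):
    """Return a copy of the health entries with every check's Output blanked."""
    cleaned = copy.deepcopy(entries)
    for entry in cleaned:
        for check in entry.get("Checks", []):
            check["Output"] = ""
    return cleaned


def dedup_payload(entries):
    """Hypothetical stand-in for the dedup body: deterministic JSON bytes."""
    return json.dumps(strip_check_output(entries), sort_keys=True).encode("utf-8")


def sample(output):
    # Made-up service entry; only the check output differs between calls.
    return [{
        "Service": {"ID": "web-1", "Service": "web", "Port": 8080},
        "Checks": [{"CheckID": "service:web-1", "Status": "passing", "Output": output}],
    }]


first = dedup_payload(sample("OK 2019-01-01T00:00:00Z"))
second = dedup_payload(sample("OK 2019-01-01T00:00:10Z"))
print(first == second)
```

With the output blanked, two polls that differ only in the healthcheck text serialize to the same bytes, so the stored value neither grows nor changes on every check.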
Thanks for filing this issue.
If anyone has an idea how to replicate this in an isolated test setup, it would be of great help. Thanks.
This can be replicated in two ways:
Option 1: Very large healthcheck page. For example, point the health check at the / URL of the target service so that it returns the site's default output. Yes, this is bad form and it's better to have a dedicated healthcheck page, but it will trigger the same failure.
Option 2: Dynamic healthcheck page. For example, a page showing the date/time or the number of connections served by the process will cause a copy of each check's output to be stored in the dedup body. This might need a large number of services under check. A few hundred is sufficient, and I would argue that this is not an excessive number of services to watch for a single template.
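A minimal sketch of a test service that covers both options, using only the Python standard library. The port, the padding size, and the function names are arbitrary assumptions. You would still need to register a few hundred instances of it with HTTP checks in Consul.

```python
from datetime import datetime, timezone
from http.server import BaseHTTPRequestHandler, HTTPServer

PADDING_BYTES = 4096  # assumed size; raise it to mimic a very large page


def healthcheck_body(now, padding_bytes=PADDING_BYTES):
    """Build a response that changes on every request and carries padding."""
    stamp = now.isoformat()
    return ("OK " + stamp + "\n" + "x" * padding_bytes).encode("ascii")


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # A fresh timestamp per request makes every check output unique.
        body = healthcheck_body(datetime.now(timezone.utc))
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Run the fake service until interrupted."""
    HTTPServer(("127.0.0.1", port), HealthHandler).serve_forever()
```

Calling `serve()` on a different port per instance gives one fake service each. Setting `PADDING_BYTES` to 0 leaves only the dynamic timestamp (Option 2), while a large value approximates Option 1.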
Consul Template version
Configuration
Template:
Command
Debug output
https://gist.github.com/mblakele/7efefca93afe7b4e77cbbc886abc7b9c
Expected behavior
What should have happened?
Ideally consul-template should run without errors. Maybe if the data is too large it could be broken up into multiple keys.
Another possible solution would be to ignore dedup if the data is too large, turning this error into a warning.
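To make the two suggestions concrete, here is a rough Python sketch of what I have in mind. It is not consul-template code. `plan_write`, the `key/0`, `key/1` chunk naming, and treating the limit as exactly 512 * 1024 bytes are all assumptions.

```python
import logging

log = logging.getLogger("dedup-sketch")

MAX_VALUE_BYTES = 512 * 1024  # assumed byte value of the documented KV limit


def split_into_chunks(payload, limit=MAX_VALUE_BYTES):
    """Cut the payload into consecutive pieces of at most `limit` bytes."""
    return [payload[i:i + limit] for i in range(0, len(payload), limit)]


def plan_write(key, payload, limit=MAX_VALUE_BYTES, allow_chunks=False):
    """Return the (key, value) pairs to write; an empty list means skip dedup."""
    if len(payload) <= limit:
        return [(key, payload)]
    if allow_chunks:
        # Suggestion 1: spread the data over several keys.
        chunks = split_into_chunks(payload, limit)
        return [(f"{key}/{i}", chunk) for i, chunk in enumerate(chunks)]
    # Suggestion 2: warn and carry on without dedup for this template.
    log.warning("dedup data for %s is %d bytes (limit %d); skipping",
                key, len(payload), limit)
    return []
```

Chunking would also need the reader side to reassemble the pieces in order, so the warn-and-skip path is probably the smaller change.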
Failing that, it would be nice if consul-template gave me more details about what's wrong. I'd like to see what's in this chunk of data so I can try to reduce the size.
There's no old value, so I can't look at that. Looking at other values in the kv shows me that it's binary data, and I don't know how to decode or parse it.
Any ideas for resolving this problem?
Incidentally, is there any cleanup process? I seem to have about 250 of these dedup values in consul.
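In the meantime, a small script like the following could at least show how big each stored value is. It is only a sketch. It relies on Consul's KV HTTP endpoint returning base64-encoded values when queried with `recurse`. The agent address and the `consul-template/dedup/` prefix are assumptions to adjust for your setup. It reports sizes only and does not decode the binary contents.

```python
import base64
import json
import urllib.request


def fetch_kv(prefix, consul_addr="http://127.0.0.1:8500"):
    """Fetch all KV entries under a prefix from a Consul agent (assumed address)."""
    url = f"{consul_addr}/v1/kv/{prefix}?recurse=true"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def value_sizes(entries):
    """Return (key, decoded size in bytes) pairs, largest first."""
    sizes = []
    for entry in entries:
        # Value is base64 text, or null when the key holds no data.
        raw = base64.b64decode(entry["Value"] or "")
        sizes.append((entry["Key"], len(raw)))
    return sorted(sizes, key=lambda pair: pair[1], reverse=True)
```

For example, `value_sizes(fetch_kv("consul-template/dedup/"))` would list the dedup keys by size, which should show which templates are closest to the limit.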
Actual behavior
The consul docs explain that there's a 512-kB limit on values in the kv store: https://www.consul.io/docs/faq.html
Steps to reproduce
Reproducing this probably requires a large cluster. We're seeing it in one environment that has 300 instances registered for various services.
References
N/A