Hi @netzkind
Does it always generate all of the appropriate `location` blocks? Or is it just `upstream` that's failing?
Hi @sethvargo
we have diffed 31 files in various states of completeness, and the `location` blocks are always there and complete. So it's apparently only the first part (`upstream`) that's failing.
In the first block, you're using a combination of `services` and `service`. The second block only uses `services`, from what I can tell. Is it possible that some of these services are becoming unhealthy in Consul? `service` will only return the list of healthy services (by default).
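For reference, the service name passed to `service` can carry a health-state filter, which makes it easy to compare the default passing-only view against a broader one. A minimal, untested sketch; "myaccountservice" is only used here as an example name:

# default: only instances whose checks are passing
{{range service "myaccountservice"}}
# {{.Address}}:{{.Port}} passing
{{end}}

# passing and warning instances
{{range service "myaccountservice|passing,warning"}}
# {{.Address}}:{{.Port}} passing or warning
{{end}}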
That was an assumption we had too, but some of the services have 4 or more nodes running per color. While we may see health issues for a select few in the logs of the Consul agent, we see no point in time where all nodes of one color are gone.
To check this, we modified our template and added additional ranges for `passing` and `warning` services (documentation is a little sparse on details here, but `warning` should give us nodes that are marked "failing" in Consul, right?).
{{- $colors := "green;blue;yellow" | split ";"}}
{{$services := services}}
{{- range $services}}
{{- if .Tags | contains "http"}}
{{- $service_name := .Name -}}
{{- $save_tags := .Tags -}}
{{- range $tag, $service := service $service_name | byTag}}
{{if ($colors | contains $tag)}}
upstream {{$service_name}}_{{$tag}} {
{{- range $service}}
server {{.Address}}:{{.Port}} max_fails=3 fail_timeout=60 weight=1;
{{- end}}
}
{{- end}}
{{- end}}
{{- range $tag, $service := service (printf "%s|passing" $service_name) | byTag}}
{{- range $service}}
# service {{ $service_name }}_{{$tag}} / {{.Address}}:{{.Port}} healthy
{{- end}}
{{- end}}
{{- range $tag, $service := service (printf "%s|warning" $service_name) | byTag}}
{{- range $service}}
# service {{ $service_name }}_{{$tag}} / {{.Address}}:{{.Port}} not healthy
{{- end}}
{{- end}}
{{/* non-deployed colors of this service */}}
{{- range $color := $colors}}
{{- if $save_tags | contains $color | not}}
upstream {{$service_name}}_{{$color}} {
server 127.0.0.1:65535; # force a 502
}
{{- end}}
{{- end}}
{{- end}}
{{- end}}
upstream server-error {
server 127.0.0.1:65535; # force a 502
}
server {
listen 80 default_server;
charset utf-8;
access_log /var/log/nginx/access.log upstream_time;
location / {
alias /html/;
index health.html;
}
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
# send response immediately, don't buffer it
proxy_buffering off;
{{range $services}}
{{- if .Tags | contains "http" }}
{{/* see if pools are configured in Consul */}}
{{$pool_color_live := keyOrDefault (printf "/%s/color-live-pool" .Name) "" -}}
{{$pool_color_last := keyOrDefault (printf "/%s/color-last-pool" .Name) "" -}}
{{$pool_color_stage := keyOrDefault (printf "/%s/color-stage-pool" .Name) "" -}}
# -- Live --
location /{{.Name}}/ {
rewrite ^/{{.Name}}/(.*)$ /$1 break;
{{ if $pool_color_live -}}
proxy_pass http://{{.Name}}_{{$pool_color_live}}$uri?$args;
{{else}}
proxy_pass http://server-error$uri?$args;
{{end}}
}
# -- Last --
location /last-{{.Name}}/ {
rewrite ^/last-{{.Name}}/(.*)$ /$1 break;
{{ if $pool_color_last -}}
proxy_pass http://{{.Name}}_{{$pool_color_last}}$uri?$args;
{{else}}
proxy_pass http://server-error$uri?$args;
{{end}}
}
# -- Stage --
location /stage-{{.Name}}/ {
rewrite ^/stage-{{.Name}}/(.*)$ /$1 break;
{{ if $pool_color_stage -}}
proxy_pass http://{{.Name}}_{{$pool_color_stage}}$uri?$args;
{{else}}
proxy_pass http://server-error$uri?$args;
{{end}}
}
{{- end}}
{{- end}}
}
We ran this for some time. The result is interesting: when the output is missing the upstream entries, we are also missing the "healthy" comment lines for those services. But we get no "not healthy" lines instead, so these services seem not to be unhealthy but rather completely absent.
To conclude, we still think it's not a simple health/status issue.
No, "warning" is a special mode in Consul - it's pre-failing and also accounts for maintenance mode. If you use an `else` clause instead of another query, you can at least see why that node isn't being printed. You can also use `"any"` as an option to include all health states:
{{- range $tag, $service := service (printf "%s|any" $service_name) | byTag}}
{{- range $service}}
# service {{ $service_name }}_{{$tag}} / {{.Address}}:{{.Port}} {{.Checks.AggregatedStatus}}
{{- end}}
{{- end}}
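The `else` clause mentioned above is the ordinary range/else construct of Go's template language, so it can be bolted onto the same query to make a completely empty result visible. Roughly (an untested sketch along the same lines as the snippet above):

{{- range $tag, $service := service (printf "%s|any" $service_name) | byTag}}
{{- range $service}}
# service {{ $service_name }}_{{$tag}} / {{.Address}}:{{.Port}} {{.Checks.AggregatedStatus}}
{{- end}}
{{- else}}
# service {{ $service_name }}: the query returned no instances at all
{{- end}}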
If you run this for a bit, it should tell you the status and hopefully help track this down. Since I'm not running on your infrastructure, it's rather challenging for me to reproduce this.
Nonetheless, Consul Template is really just a thin wrapper around Consul. If there's flapping for your services like this, it's possibly a bug in Consul too.
After running a modified version of your snippet for a few minutes, it was obvious that there were more services marked as critical than we thought. We are not running the latest version of Consul at the moment, so that might be part of the flapping sensation.
Thank you for all your fast help, and sorry for sending you on a wild-goose chase :(
I think this issue may be closed now.
Hi,
Here is my consul-template configuration file, where I have 2 webservers running:

cat /tmp/nginx.conf.template

upstream app {
  {{range service "webserver"}}server {{.Address}}:{{.Port}} max_fails=3 fail_timeout=60 weight=1;
  {{else}}server 10.xx.xx.xx:65535; # force a 502{{end}}
}
server {
  listen 80 default_server;
  resolver 10.xxx.xxx.xx;
  set $upstream_endpoint http://app;
  location / {
    proxy_pass $upstream_endpoint;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
  }
}
But I am getting output like this:

cat /tmp/nginx.conf

upstream app {
  server 10.211.203.94:65535; # force a 502
}
server {
  listen 80 default_server;
  resolver 10.211.203.94;
  set $upstream_endpoint http://app;
  location / {
    proxy_pass $upstream_endpoint;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
  }
}
consul-template -template "/tmp/nginx.conf.template:/tmp/nginx.conf:/bin/bash -c 'sudo docker restart nginx-lb || true'" -log-level=info
2018/03/30 06:50:29.242226 [INFO] consul-template v0.19.3 (ebf2d3d)
2018/03/30 06:50:29.242250 [INFO] (runner) creating new runner (dry: false, once: false)
2018/03/30 06:50:29.242698 [INFO] (runner) creating watcher
2018/03/30 06:50:29.242857 [INFO] (runner) starting
2018/03/30 06:50:29.242885 [INFO] (runner) initiating run
2018/03/30 06:50:29.245268 [INFO] (runner) initiating run
Can someone help me with this?
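A possible debugging step in the spirit of the suggestions above (a sketch, not a confirmed fix): temporarily render what the catalog actually returns. "webserver" is the service name from the template above, and `join` is the standard consul-template helper:

# every service name the catalog knows about, with its tags
{{range services}}# {{.Name}} [{{.Tags | join ","}}]
{{end}}

# webserver instances in any health state, with their aggregated status
{{range service "webserver|any"}}# {{.Address}}:{{.Port}} {{.Checks.AggregatedStatus}}
{{end}}

If the first range never prints a "webserver" line, the service isn't registered with the agent that consul-template queries; if the second range shows instances with a non-passing status, the empty upstream is a health-check issue rather than a template issue.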
Consul Template version
consul-template v0.19.3 (ebf2d3d) Also happened with 0.14.0
Configuration
Command
Debug output
https://gist.github.com/netzkind/a763e0542865ce05a956074658d265ff (Output is truncated to include startup and the run that caused incorrect results)
Expected behavior
You should see three "upstream" entries for each service (_blue, _green, _yellow). If a service is not available for one of the three colors, an upstream entry with the comment "force a 502" should be created for that color instead. Good result: https://gist.github.com/netzkind/7c5703f684681365cc329ad829f46b7e (truncated to only include two services)
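Illustratively, a "good" rendering for one service has the shape below (addresses and ports are placeholders; the real output is in the gist):

upstream myaccountservice_green {
  server 10.0.0.1:8080 max_fails=3 fail_timeout=60 weight=1;
}
upstream myaccountservice_blue {
  server 10.0.0.2:8080 max_fails=3 fail_timeout=60 weight=1;
}
upstream myaccountservice_yellow {
  server 127.0.0.1:65535; # force a 502
}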
Actual behavior
Sometimes, consul-template only generates two, one, or no upstream entries at all for a service. Gist: https://gist.github.com/netzkind/24ff8ebd59820ac623a1b6722e3c410e (Output is truncated to only include two services - in this example, "myaccountservice" is the problematic one and "search-service" is correct).
This occurs during normal operation. So e.g. the first and second run are ok, the third is bad and the fourth is back to normal.
Steps to reproduce
Unknown, this occurs occasionally.