Hi @netzkind
Does it always generate all of the appropriate `location` blocks? Or is it just `upstream` that's failing?
Hi @sethvargo
we have diffed 31 files in various states of completeness, and the `location` blocks are always there and complete. So it's apparently only the first part (`upstream`) that's failing.
In the first block, you're using a combination of `services` and `service`. The second block only uses `services`, from what I can tell. Is it possible that some of these services are becoming unhealthy in Consul? `service` will only return the list of healthy services (by default).
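For reference, the service name passed to `service` can carry a health-state filter, which makes it easy to compare the default passing-only view against a broader one. A minimal, untested sketch; "myaccountservice" is only used here as an example name:

# default: only instances whose checks are passing
{{range service "myaccountservice"}}
# {{.Address}}:{{.Port}} passing
{{end}}

# passing and warning instances
{{range service "myaccountservice|passing,warning"}}
# {{.Address}}:{{.Port}} passing or warning
{{end}}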
That was an assumption we had too, but some of the services have 4 or more nodes running per color. While we may see health issues for a select few in the logs of the Consul agent, we see no point in time where all nodes of one color are gone.
To check this, we modified our template and added additional ranges for `passing` and `warning` services (documentation is a little sparse on details here, but `warning` should give us nodes that are marked "failing" in Consul, right?).
{{- $colors := "green;blue;yellow" | split ";"}}
{{$services := services}}
{{- range $services}}
{{- if .Tags | contains "http"}}
{{- $service_name := .Name -}}
{{- $save_tags := .Tags -}}
{{- range $tag, $service := service $service_name | byTag}}
{{if ($colors | contains $tag)}}
upstream {{$service_name}}_{{$tag}} {
{{- range $service}}
server {{.Address}}:{{.Port}} max_fails=3 fail_timeout=60 weight=1;
{{- end}}
}
{{- end}}
{{- end}}
{{- range $tag, $service := service (printf "%s|passing" $service_name) | byTag}}
{{- range $service}}
# service {{ $service_name }}_{{$tag}} / {{.Address}}:{{.Port}} healthy
{{- end}}
{{- end}}
{{- range $tag, $service := service (printf "%s|warning" $service_name) | byTag}}
{{- range $service}}
# service {{ $service_name }}_{{$tag}} / {{.Address}}:{{.Port}} not healthy
{{- end}}
{{- end}}
{{/* non-deployed colors of this service */}}
{{- range $color := $colors}}
{{- if $save_tags | contains $color | not}}
upstream {{$service_name}}_{{$color}} {
server 127.0.0.1:65535; # force a 502
}
{{- end}}
{{- end}}
{{- end}}
{{- end}}
upstream server-error {
server 127.0.0.1:65535; # force a 502
}
server {
listen 80 default_server;
charset utf-8;
access_log /var/log/nginx/access.log upstream_time;
location / {
alias /html/;
index health.html;
}
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
# send response immediately, don't buffer it
proxy_buffering off;
{{range $services}}
{{- if .Tags | contains "http" }}
{{/* see if pools are configured in Consul */}}
{{$pool_color_live := keyOrDefault (printf "/%s/color-live-pool" .Name) "" -}}
{{$pool_color_last := keyOrDefault (printf "/%s/color-last-pool" .Name) "" -}}
{{$pool_color_stage := keyOrDefault (printf "/%s/color-stage-pool" .Name) "" -}}
# -- Live --
location /{{.Name}}/ {
rewrite ^/{{.Name}}/(.*)$ /$1 break;
{{ if $pool_color_live -}}
proxy_pass http://{{.Name}}_{{$pool_color_live}}$uri?$args;
{{else}}
proxy_pass http://server-error$uri?$args;
{{end}}
}
# -- Last --
location /last-{{.Name}}/ {
rewrite ^/last-{{.Name}}/(.*)$ /$1 break;
{{ if $pool_color_last -}}
proxy_pass http://{{.Name}}_{{$pool_color_last}}$uri?$args;
{{else}}
proxy_pass http://server-error$uri?$args;
{{end}}
}
# -- Stage --
location /stage-{{.Name}}/ {
rewrite ^/stage-{{.Name}}/(.*)$ /$1 break;
{{ if $pool_color_stage -}}
proxy_pass http://{{.Name}}_{{$pool_color_stage}}$uri?$args;
{{else}}
proxy_pass http://server-error$uri?$args;
{{end}}
}
{{- end}}
{{- end}}
}
We ran this for some time. The result is interesting: when the output is missing the upstream entries, we are also missing the "healthy" comment lines for those services. But we get no "not healthy" lines instead, so these services seem not to be unhealthy but rather completely absent.
To conclude, we still think it's not a simple health/status issue.
No, "warning" is a special mode in Consul - it's pre-failing and also accounts for maintenance mode. If you use an `else` clause instead of another query, you can at least see why that node isn't being printed. You can also use `"any"` as an option to include all health states:
{{- range $tag, $service := service (printf "%s|any" $service_name) | byTag}}
{{- range $service}}
# service {{ $service_name }}_{{$tag}} / {{.Address}}:{{.Port}} {{.Checks.AggregatedStatus}}
{{- end}}
{{- end}}
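The `else` clause mentioned above is the ordinary range/else construct of Go's template language, so it can be bolted onto the same query to make a completely empty result visible. Roughly (an untested sketch along the same lines as the snippet above):

{{- range $tag, $service := service (printf "%s|any" $service_name) | byTag}}
{{- range $service}}
# service {{ $service_name }}_{{$tag}} / {{.Address}}:{{.Port}} {{.Checks.AggregatedStatus}}
{{- end}}
{{- else}}
# service {{ $service_name }}: the query returned no instances at all
{{- end}}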
If you run this for a bit, it should tell you the status and hopefully help track this down. Since I'm not running on your infrastructure, it's rather challenging for me to reproduce this.
Nonetheless, Consul Template is really just a thin wrapper around Consul. If there's flapping for your services like this, it's possibly a bug in Consul too.
After running a modified version of your snippet for a few minutes, it was obvious that there were more services marked as critical than we thought. We are not running the latest version of Consul at the moment, so that might be part of the flapping sensation.
Thank you for all your fast help, and sorry for sending you on a wild-goose chase :(
I think this issue may be closed now.
Hi,
Here is my consul-template configuration file, where I have 2 webservers running:

cat /tmp/nginx.conf.template

upstream app {
  {{range service "webserver"}}server {{.Address}}:{{.Port}} max_fails=3 fail_timeout=60 weight=1;
  {{else}}server 10.xx.xx.xx:65535; # force a 502{{end}}
}
server {
  listen 80 default_server;
  resolver 10.xxx.xxx.xx;
  set $upstream_endpoint http://app;
  location / {
    proxy_pass $upstream_endpoint;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
  }
}
But I am getting output like this:

cat /tmp/nginx.conf

upstream app {
  server 10.211.203.94:65535; # force a 502
}
server {
  listen 80 default_server;
  resolver 10.211.203.94;
  set $upstream_endpoint http://app;
  location / {
    proxy_pass $upstream_endpoint;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
  }
}
consul-template -template "/tmp/nginx.conf.template:/tmp/nginx.conf:/bin/bash -c 'sudo docker restart nginx-lb || true'" -log-level=info
2018/03/30 06:50:29.242226 [INFO] consul-template v0.19.3 (ebf2d3d)
2018/03/30 06:50:29.242250 [INFO] (runner) creating new runner (dry: false, once: false)
2018/03/30 06:50:29.242698 [INFO] (runner) creating watcher
2018/03/30 06:50:29.242857 [INFO] (runner) starting
2018/03/30 06:50:29.242885 [INFO] (runner) initiating run
2018/03/30 06:50:29.245268 [INFO] (runner) initiating run
Can someone help me with this?
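A possible debugging step in the spirit of the suggestions above (a sketch, not a confirmed fix): temporarily render what the catalog actually returns. "webserver" is the service name from the template above, and `join` is the standard consul-template helper:

# every service name the catalog knows about, with its tags
{{range services}}# {{.Name}} [{{.Tags | join ","}}]
{{end}}

# webserver instances in any health state, with their aggregated status
{{range service "webserver|any"}}# {{.Address}}:{{.Port}} {{.Checks.AggregatedStatus}}
{{end}}

If the first range never prints a "webserver" line, the service isn't registered with the agent that consul-template queries; if the second range shows instances with a non-passing status, the empty upstream is a health-check issue rather than a template issue.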
Consul Template version
consul-template v0.19.3 (ebf2d3d) Also happened with 0.14.0
Configuration
Command
Debug output
https://gist.github.com/netzkind/a763e0542865ce05a956074658d265ff (Output is truncated to include startup and the run that caused incorrect results)
Expected behavior
You should see three "upstream" entries for each service (_blue, _green, _yellow). If a service is not available for one of the three colors, an upstream entry with the comment "force a 502" should be created for that color instead. Good result: https://gist.github.com/netzkind/7c5703f684681365cc329ad829f46b7e (truncated to only include two services)
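Illustratively, a "good" rendering for one service has the shape below (addresses and ports are placeholders; the real output is in the gist):

upstream myaccountservice_green {
  server 10.0.0.1:8080 max_fails=3 fail_timeout=60 weight=1;
}
upstream myaccountservice_blue {
  server 10.0.0.2:8080 max_fails=3 fail_timeout=60 weight=1;
}
upstream myaccountservice_yellow {
  server 127.0.0.1:65535; # force a 502
}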
Actual behavior
Sometimes, consul-template only generates two, one, or no upstream entries at all for a service. Gist: https://gist.github.com/netzkind/24ff8ebd59820ac623a1b6722e3c410e (Output is truncated to only include two services - in this example, "myaccountservice" is the problematic one and "search-service" is correct).
This occurs during normal operation. So e.g. the first and second run are ok, the third is bad and the fourth is back to normal.
Steps to reproduce
Unknown, this occurs occasionally.