hashicorp / consul-template

Template rendering, notifier, and supervisor for @HashiCorp Consul and Vault data.
https://www.hashicorp.com/
Mozilla Public License 2.0

Consul Template dry is producing a different result than not-dry #1204

Open adamrmcclain opened 5 years ago

adamrmcclain commented 5 years ago

Consul Template version

consul-template v0.19.5 (57b6c71)

Configuration

(I have removed the commented-out lines here for readability.)


consul {
  address = "127.0.0.1:8500"
  retry {

    enabled = true
    attempts = 0
    backoff = "250ms"
    max_backoff = "30s"
  }
}

log_level = "info"

template {
  source = "/webApps/nginx/conf/env/stage/canary/canary_demo.tmpl"
  destination = "/webApps/nginx/conf/env/stage/canary/canary_demo.rendered"
  command = "/bin/sh /webApps/nginx/scripts/reloadNginx.sh"
  perms = 0777
  backup = true
  wait {
    min = "2s"
    max = "10s"
  }
}

template {
  source = "/webApps/nginx/conf/env/stage/upstreams/consul-upstream.tmpl"
  destination = "/webApps/nginx/conf/env/stage/upstreams/consul-upstream_conf"
  perms = 0777
  backup = true
  wait {
    min = "2s"
    max = "10s"
  }
}

template {
  source = "/webApps/nginx/conf/env/stage/locations/consul-locations.tmpl"
  destination = "/webApps/nginx/conf/env/stage/locations/consul-locations_conf"
  command = "/bin/sh /webApps/nginx/scripts/reloadNginx.sh"
  perms = 0777
  backup = true
  wait {
    min = "2s"
    max = "10s"
  }
}

(I'm only pasting the template that we are having issues with, which is canary_demo.tmpl)

{{ define "canaryHostMap" }}
  {{- $appName := . }}
  {{- $varSafeAppName := $appName | regexReplaceAll "[^a-zA-Z0-9_]" "_" }}
  {{- if keyExists (printf "canary/apps/%s/groupingStrategy" $appName) }}
    {{- with $appDetails := tree (printf "canary/apps/%s/" $appName) | explode }}
      # appName: {{ $appName }}
      # groupingStrategy: {{ $groupingStrategy := index $appDetails "groupingStrategy" }}{{ $groupingStrategy }}
      # scaleMode: {{ index $appDetails "scaleMode" }}
      map {{ if eq $groupingStrategy "random_user" }}$cookie_aid_3 {{ else }} {{ if eq $groupingStrategy "random_pid" }}$cookie_pid {{ else }}$http_x_correlation_id {{ end }}{{ end }} ${{ $varSafeAppName }}_host {
        {{ range service $appName }}
          {{- $id := .ID }}
          {{- $host := .Address }}
          {{- with $versions := index $appDetails "versions" }}
            {{- with $version := index $versions $id }}
              {{- $groups := index $version "groups" }}
              {{- $weight := index $version "weight" }}
              {{- $isFeatureCandidate := index $version "feature_candidate" }}
              # id:                {{ $id }}
              # traffic %:         {{ $weight }}
              # feature_candidate: {{ $isFeatureCandidate }}
              {{ range ($groups | split ",") }}{{ . }} {{ $host }};
              {{ end }}
            {{ end }}
          {{ end -}}
        {{ else }}# no versions for app{{ end }}
      } {{/* end nginx map directive */}}
    {{ end }}
  {{ else }}
    # appName: {{ $appName }}
    # no registered services for app
    map $http_x_correlation_id ${{ $varSafeAppName }}_host {}
  {{ end }}
{{ end }}
{{/* End 'canaryHostMap' definition */}}

{{/*
  ---------- Everything above this line will be separated into its own file ----------
  ----------        Below is what your canary config will look like          ----------
*/}}

{{/* Canary host variable generation */}}
{{ executeTemplate "canaryHostMap" "locator-stage" }}
{{ executeTemplate "canaryHostMap" "item-detail-services-stage" }}

Command

consul-template -template "/webApps/nginx/conf/env/$PLATFORM_ENVIRONMENT/canary/canary_demo.tmpl:/webApps/nginx/conf/env/$PLATFORM_ENVIRONMENT/canary/canary_demo.rendered" -once

consul-template -template "/webApps/nginx/conf/env/$PLATFORM_ENVIRONMENT/canary/canary_demo.tmpl:/webApps/nginx/conf/env/$PLATFORM_ENVIRONMENT/canary/canary_demo.rendered" -once -dry

Debug output

I think this is the relevant part of the debug output:

2019/04/05 15:10:42.302273 [DEBUG] (runner) initiating run
2019/04/05 15:10:42.302279 [DEBUG] (runner) checking template 456dec88739dd18c1cbfeecd0585ce56
2019/04/05 15:10:42.302695 [DEBUG] (runner) checking template 13a2ab9ad84269b26f8a668350fd197a
2019/04/05 15:10:42.303096 [DEBUG] (runner) checking template 0ab072388fd1c358ed325191ff5c70e4
2019/04/05 15:10:42.303555 [DEBUG] (runner) rendering "/webApps/nginx/conf/env/stage/locations/consul-locations.tmpl" => "/webApps/nginx/conf/env/stage/locations/consul-locations_conf"
2019/04/05 15:10:42.303607 [DEBUG] (runner) diffing and updating dependencies
2019/04/05 15:10:42.303625 [DEBUG] (runner) kv.list(canary/apps/locator-stage/) is still needed
2019/04/05 15:10:42.303632 [DEBUG] (runner) kv.get(canary/apps/item-detail-services-stage/groupingStrategy) is still needed
2019/04/05 15:10:42.303640 [DEBUG] (runner) catalog.services is still needed
2019/04/05 15:10:42.303646 [DEBUG] (runner) kv.get(picking/location) is still needed
2019/04/05 15:10:42.303662 [DEBUG] (runner) health.service(picking|passing) is still needed
2019/04/05 15:10:42.303678 [DEBUG] (runner) kv.get(ecsb/foundation) is still needed
2019/04/05 15:10:42.303690 [DEBUG] (runner) kv.get(picking/stage/external-uri) is still needed
2019/04/05 15:10:42.303703 [DEBUG] (runner) kv.get(canary/apps/locator-stage/groupingStrategy) is still needed
2019/04/05 15:10:42.303720 [DEBUG] (runner) enabling template-specific quiescence for "0ab072388fd1c358ed325191ff5c70e4"
2019/04/05 15:10:42.303740 [DEBUG] (runner) watching 8 dependencies

What's weird is that the KV keys it reports as "is still needed" do exist in the Consul server's KV store. For example:

consul kv get canary/apps/locator-stage/groupingStrategy
random_session

Expected behavior

The output from the -dry command should match the contents of the rendered file.

Actual behavior

Dry output:

      # appName: locator-stage

      # groupingStrategy: random_session

      # scaleMode: linear

      map  $http_x_correlation_id  $locator_stage_host {

              # id:                locator-stage-2-37-3

              # traffic %:         1

              # feature_candidate: false

              default locator-stage-2-37-3.kroger.com;

      }

    # appName: item-detail-services-stage

    # no registered services for app

    map $http_x_correlation_id $item_detail_services_stage_host {}

File output:

# appName: locator-stage
      # groupingStrategy: random_session
      # scaleMode: linear
      map  $http_x_correlation_id  $locator_stage_host {
        # no versions for app
      } 

    # appName: item-detail-services-stage
    # no registered services for app
    map $http_x_correlation_id $item_detail_services_stage_host {}

Steps to reproduce

  1. run consul-template -template "/webApps/nginx/conf/env/$PLATFORM_ENVIRONMENT/canary/canary_demo.tmpl:/webApps/nginx/conf/env/$PLATFORM_ENVIRONMENT/canary/canary_demo.rendered" -once -dry
  2. run consul-template -template "/webApps/nginx/conf/env/$PLATFORM_ENVIRONMENT/canary/canary_demo.tmpl:/webApps/nginx/conf/env/$PLATFORM_ENVIRONMENT/canary/canary_demo.rendered" -once
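
The two runs above can also be compared mechanically instead of by eye. The following is a minimal sketch, not part of the original report: it assumes that -dry prints a "> <destination>" header line before the rendered content, and the function names strip_dry_header and compare_dry_to_file are hypothetical helpers introduced here for illustration.

```shell
#!/bin/sh
# Drop the "> <destination>" header line that -dry is assumed to print
# before the rendered content; every other line passes through unchanged.
strip_dry_header() {
  sed '1{/^> /d;}'
}

# Compare a dry render (read from stdin) with the rendered file on disk.
# The exit status is diff's: 0 when both match, non-zero otherwise.
compare_dry_to_file() {
  rendered_file=$1
  strip_dry_header | diff - "$rendered_file"
}
```

A possible invocation, with TMPL and DEST standing in for the template and destination paths: consul-template -template "$TMPL:$DEST" -once -dry | compare_dry_to_file "$DEST". Any diff output then shows exactly which lines the file on disk is missing.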

References

N/A

eikenb commented 5 years ago

Hey @adamrmcclain, thanks for reporting this issue.

I'll try to look into this, but a simplified, minimal setup to reproduce it would be very helpful, particularly a reduced template that uses as few Consul keys as possible, with example values provided. Thanks.

natemccurdy commented 5 years ago

I'm seeing the same behavior with consul-template v0.20.0 (b709612c). I can't think of a way to make this reproducible for troubleshooting, though.

In my case, I've got a file (rendered from a template) that is not updating even though the data in Consul would suggest it should be.

Running consul-template -template <ctmpl_path>:<file_path> -once -dry produces the correct output, while consul-template -template <ctmpl_path>:<file_path> -once does not seem to update the file with the correct output.

Additionally, if I pipe the output of the -dry command into the file and then start the consul-template service, it immediately rewrites the file with the incorrect/stale data.

This template is rather simple and just puts each datanode's IP address into a file, one per line:

{{ range service "datanode" }}{{ .Address }}
{{ end }}

The number of lines in the file on disk does not match the number of lines found when querying Consul or when doing a -dry run:

$ wc -l /file/on/disk
4023 /file/on/disk

$ consul-template -template /path/to/template.ctmpl:/file/on/disk -once -dry | wc -l
6369

$ curl -s -G http://127.0.0.1:8500/v1/health/checks/datanode --data-urlencode 'filter=Status==passing' | jq ". | length"
6369

robbyt commented 5 years ago

It looks like this issue might be related to stale cache/dedupe. I managed to get this working by doing the following:

Workaround:

  1. Stop consul-template on the node with the wrong/unexpected config on disk.
  2. Run consul kv delete -recurse consul-template to delete the consul-template dedupe cache.
  3. Disable the consul-template deduplicate feature in the local consul-template config.
  4. Restart consul-template.
  5. Observe that the file is written to disk with the correct data.
  6. Re-enable deduplicate in the config and restart consul-template.
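
The first two workaround steps above could be scripted roughly as follows. This is only a sketch under stated assumptions: it supposes consul-template runs as a systemd unit named consul-template (the thread does not say how the service is managed), and clear_dedup_cache, SERVICE, DEDUP_PREFIX, and RUN are hypothetical names introduced for illustration. By default it only prints the commands; nothing is executed unless RUN is set to an empty string.

```shell
#!/bin/sh
SERVICE=${SERVICE-consul-template}            # assumed systemd unit name
DEDUP_PREFIX=${DEDUP_PREFIX-consul-template}  # KV prefix deleted in step 2
RUN=${RUN-echo}                               # prints commands; RUN= executes them

# Steps 1-2: stop the service, then delete the dedup keys from Consul KV.
clear_dedup_cache() {
  $RUN systemctl stop "$SERVICE" &&
  $RUN consul kv delete -recurse "$DEDUP_PREFIX"
}

# Steps 3-6 stay manual: set `deduplicate { enabled = false }` in the local
# consul-template config, start the service, check the rendered file, then
# re-enable deduplicate and restart.
```

Calling clear_dedup_cache with the defaults prints the two commands for review. Keeping the config edit manual is deliberate, since the location and layout of the local config file are not given in the thread.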

This problem happened when we scaled up a Consul cluster to 6000 instances, so unfortunately we can't offer a reasonable way to reproduce it.

eikenb commented 5 years ago

Thanks for the additional context, particularly for finding a possible culprit @robbyt.

That sounds like it would be a problem with refreshing the dedup'd cache. I'm not sure how that squares with -dry avoiding the issue, though, since -dry and dedup don't seem to cross logic paths anywhere in the code. I'll keep an eye out.