hashicorp / consul-template

Template rendering, notifier, and supervisor for @HashiCorp Consul and Vault data.

consul-template:master spins with multiple templates, quiescence timers #1427

Open jjjordan opened 3 years ago

jjjordan commented 3 years ago

Consul Template version

master branch

Configuration

consul {
    address = "127.0.0.1:8500"
}

log_level = "trace"

template {
    contents = "{{ keyOrDefault \"some-value\" \"\" }}"
    destination = "./test.out"
    # quiescence: wait for 1s of quiet after a change (at most 5s) before rendering
    wait {
        min = "1s"
        max = "5s"
    }
}

template {
    contents = "{{ keyOrDefault \"some-other-value\" \"\" }}"
    destination = "./test2.out"
    wait {
        min = "1s"
        max = "5s"
    }
}

I am running Consul in development server mode from a Docker container. The KV data isn't important, but we can start with this output from consul kv export:

[
    {
        "key": "some-other-value",
        "flags": 0,
        "value": "NDQ="
    },
    {
        "key": "some-value",
        "flags": 0,
        "value": "MjQ="
    }
]
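
As an aside, consul kv export base64-encodes values, so the fixture above just holds the strings "44" and "24". A small, illustrative snippet for decoding them (plain Go, nothing consul-template specific):

// Decode the base64 "value" fields from the consul kv export fixture above.
package main

import (
    "encoding/base64"
    "fmt"
)

func main() {
    for key, val := range map[string]string{
        "some-other-value": "NDQ=",
        "some-value":       "MjQ=",
    } {
        decoded, err := base64.StdEncoding.DecodeString(val)
        if err != nil {
            panic(err)
        }
        fmt.Printf("%s = %s\n", key, decoded) // prints 44 and 24
    }
}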

Steps to reproduce

  1. Start a fresh Consul dev server and consul kv import the KV data provided above.
  2. Run consul-template -config config.hcl with the config provided above.
  3. Use the consul KV UI to modify either of the values above.
  4. Observe the trace output below

Debug output

https://gist.github.com/jjjordan/345102204e2dd85396cd554c07fa1ad8

Expected behavior

consul-template regenerates templates once in response to a single change.

Actual behavior

consul-template regenerates templates every second, in accordance with the min quiescence timer.

References

None I'm aware of

eikenb commented 3 years ago

Thanks for the report @jjjordan

Just wanted to say thanks and let you know I've reproduced this and am working on it.

eikenb commented 3 years ago

This is not limited to master; every release back to at least 0.21 has this issue. I couldn't easily test further back due to modules/vendor changes.

eikenb commented 3 years ago

I'm beginning to think this is by design: with quiescence enabled, it has to spin every time the waits time out to check for updates, as I don't think it has any way of telling whether an update has occurred in the meantime or not.

Assuming I'm correct, this is a performance bug/optimization opportunity and not a behavioral issue. That is, it is doing the right thing, just not very efficiently.

I'm going to sleep on this and look at it again tomorrow. If I stick with my conclusion, I'll leave fixing this until after the hashicat refactor, as it has its own quiescence implementation that might not have this issue (I'd need to check). Even if it does, it has a lot more information about what has been updated, so a more efficient fix might be an option there.

eikenb commented 3 years ago

I looked over this today and I didn't see a simple way to optimize it so it wouldn't spin with more than one template plus quiescence. Summary with two templates, both with waits (quiescence) set (a simplified sketch follows the list):

  1. on the initial run both templates (t1 and t2) have their wait timers set
  2. the timer runs out on t1; the runner loops through all templates and, coming back around, resets the wait on t2
  3. same as step 2 but for t2
  4. same as step 2 (for t1 again)
  5. ...
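
A simplified, self-contained Go model of that loop (not consul-template's actual code; the timers and names are illustrative only). Each time either quiescence timer fires, the full pass re-arms the other template's wait, so with two templates the runner never goes idle:

// Models steps 1-5 above: two quiescence timers and one run loop that,
// whenever either timer expires, does a full pass that re-arms the other
// template's wait. This is an illustration, not consul-template code.
package main

import (
    "fmt"
    "time"
)

const min = 1 * time.Second // the wait { min } from the repro config

func main() {
    start := time.Now()

    // Step 1: the initial run arms both templates' wait timers
    // (t2 slightly later, since templates are processed one after another).
    t1 := time.NewTimer(min)
    t2 := time.NewTimer(min + min/2)

    for i := 0; i < 6; i++ { // bounded here; the real loop never settles
        select {
        case <-t1.C:
            // Step 2: t1 quiesces; the full pass resets the wait on t2.
            fmt.Printf("%.1fs: t1 quiesced -> full pass -> t2 wait reset\n",
                time.Since(start).Seconds())
            t2.Reset(min)
        case <-t2.C:
            // Step 3: t2 quiesces; the full pass resets the wait on t1.
            fmt.Printf("%.1fs: t2 quiesced -> full pass -> t1 wait reset\n",
                time.Since(start).Seconds())
            t1.Reset(min)
        }
    }
    // Output: one "quiesced" line per second, alternating t1/t2, even though
    // no data changed after the initial run -- the spin described above.
}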

The "obvious" fix is to have it only run process the template whose wait timer expired. But this would require a pretty major refactor that would break the library usage of CT. Clearing other timers when one goes off doesn't work as the other timers could be on different cadences.

I'm going to re-classify this as an optimization/enhancement, as the behavior is correct; it just wastes CPU.

I do think that a fix will be doable along with the hashicat refactor.

lattwood commented 3 months ago

@eikenb just stumbled onto this issue while tracking down some spurious logging in Nomad.

The downstream impact of this is (effectively) making the debug log level on Nomad useless. :(