hashicorp / waypoint

A tool to build, deploy, and release any application on any platform.
https://waypointproject.io

Bug: Input var dynamic defaults don't get config-sourcer config #3114

Closed: izaaklauer closed this issue 2 years ago

izaaklauer commented 2 years ago

Describe the bug

Currently, config-sourcer plugin invocations that happen as a result of a dynamic default stanza don't get any of the config set with waypoint config source-set.

i.e. If you configure a vault config-sourcer:

~/dev/waypoint/waypoint config source-set -type=vault -config=addr="https://acmecorp.private.vault.aws.hashicorp.cloud:8200" ...

And then add a var with a dynamic default in your waypoint.hcl:

variable "artifactory_password" {
  # default = "foo"
  default = dynamic("vault", {
    path = "config/data/acmecorp/registry"
    key  = "data/artifactory_password"
  })
  type        = string
  description = "artifactory_password"
}

When the vault config-sourcer runs (in a remote or ODR runner), it doesn't use the configured address; it defaults to localhost:

2022-03-18T10:44:56.454-0400 [WARN]  waypoint.runner.agent.runner.watchloop: error retrieving config value: job_id=01FYEQSWX72C5H6NJ9CKA910ZV job_op=*gen.Job_Build source=vault key=artifactory_username err="rpc error: code = Aborted desc = Failed to read from vault. Path: "config/data/acmecorp/dev", err: "Get \"https://127.0.0.1:8200/v1/config/data/acmecorp/dev\": dial tcp 127.0.0.1:8200: connect: connection refused"

Root cause

Here's my initial read on the root cause:

There's a background process, the appconfig watcher, that's responsible for instantiating and configuring the configsourcer plugins.

We're actually starting two config watchers in the runner - one when we start the runner (here), and one as part of loading dynamic variables (here).

As part of RunnerConfigStream, the server sends the runner configsourcer config. The first watcher gets fed those events here, and updates the plugins to have the right config (i.e. restarts the vault plugin with the right config addr).

Nothing in the second watcher's pathway ever calls UpdateSources, so its config-sourcer plugins run with their default config.

To fix this, we should probably have just one config watcher. It's a long-lived background process anyway. We could store the initial watcher on the runner and re-use it to render dynamic variable defaults. I haven't dug into whether that would cause other weird side effects, though.

Alternatively, a quick patch: after starting the variables watcher, call GetConfigSource to fetch the config-sourcer config from the server, and immediately call UpdateSources with the result.

mitchellh commented 2 years ago

Good bug find. This is confusing because there are intentionally two different config watchers. I'll attempt to explain why, but please ask more questions if anything is unclear.

Background

The first config watcher ("first" being the one you mention first in your issue, this one) is responsible for runner configuration. These are the variables (environment variables and files) that are specified for the runner to have. For example, you can use this to set AWS access credentials for all jobs a runner may execute. These are set using the waypoint config set -runner command and flag.

The second config watcher (here) is for the job configuration. These are the variables that are specified in the waypoint.hcl file or via the API for a specific job. These might overwrite the runner configuration, but are meant to be scoped to a single job.

I purposely made two watchers for a couple of reasons:

Best Solution (IMO)

I think the best long-term solution would be to change the RunnerJobStream stream to send down config source configs (similar to how RunnerConfig does for the runner). These could be sent as part of the assignment event, or perhaps as a new event.

Either way, the important invariant that must be held is that the config sources must be received before the job attempts to load the variables. My recommendation would be to do this before executeJob is ever called.

Why is this the best?

Alternate Solution: Use Runner Config Sources

An alternate solution is to store the config sources on the runner, then read them per job and set them up prior to instantiating the appconfig watcher for that job.

Alternate Solution: Share a Single Appconfig Watcher

This is what you proposed: the appconfig watcher for the runner can be used for jobs.

I would say this is only safe if the runner executes one job at a time. If different jobs have slightly different variable configurations with the same name, it'll be tricky to synchronize them. In practice, especially with ODR, we only execute one job per runner, so this would work today, but we've always architected runners to be able to take multiple jobs concurrently, so theoretically this is unsafe and challenging.

I would recommend against this. Even from an implementation standpoint, I think either of the other two proposed solutions would resolve this more easily, with fewer pitfalls and more room to grow.