hashicorp / waypoint

A tool to build, deploy, and release any application on any platform.
https://waypointproject.io

Bug: Input var dynamic defaults don't get config-sourcer config #3114

Closed: izaaklauer closed this issue 2 years ago

izaaklauer commented 2 years ago

Describe the bug

Currently, config-sourcer plugin invocations that happen as a result of a dynamic default stanza don't get any of the config set with waypoint config source-set.

i.e. If you configure a vault config-sourcer:

~/dev/waypoint/waypoint config source-set -type=vault -config=addr="https://acmecorp.private.vault.aws.hashicorp.cloud:8200" ...

And then add a var with a dynamic default in your waypoint.hcl:

variable "artifactory_password" {
  # default = "foo"
  default = dynamic("vault", {
    path = "config/data/acmecorp/registry"
    key  = "data/artifactory_password"
  })
  type        = string
  description = "artifactory_password"
}

When the vault config-sourcer runs (in a remote or ODR runner), it doesn't use the configured address; it defaults to localhost:

2022-03-18T10:44:56.454-0400 [WARN]  waypoint.runner.agent.runner.watchloop: error retrieving config value: job_id=01FYEQSWX72C5H6NJ9CKA910ZV job_op=*gen.Job_Build source=vault key=artifactory_username err="rpc error: code = Aborted desc = Failed to read from vault. Path: "config/data/acmecorp/dev", err: "Get \"https://127.0.0.1:8200/v1/config/data/acmecorp/dev\": dial tcp 127.0.0.1:8200: connect: connection refused"

Root cause

Here's my initial read on the root cause:

There's a background process, the appconfig watcher, that's responsible for instantiating and configuring the configsourcer plugins.

We're actually starting two config watchers in the runner - one when we start the runner (here), and one as part of loading dynamic variables (here).

As part of RunnerConfigStream, the server sends the runner configsourcer config. The first watcher gets fed those events here, and updates the plugins to have the right config (i.e. restarts the vault plugin with the right config addr).

Nothing in the second watcher's pathway ever calls UpdateSources, so its config-sourcer plugins run with their default config.

To fix this, we should probably have just one config watcher. It's a long-lived background process anyway. We could store the initial watcher on the runner and re-use it to render dynamic variable defaults. I haven't dug into whether that would cause other weird side effects, though.

Alternatively, a quick patch: after starting the variables watcher, call GetConfigSource to fetch the config-sourcer config from the server, and immediately call UpdateSources with the result.

mitchellh commented 2 years ago

Good bug find. This is confusing because there are intentionally two different config watchers. I'll attempt to explain why, but please ask more questions if anything is unclear.

Background

The first config watcher ("first" being the one you mention first in your issue, this one) is responsible for runner configuration. These are the variables (environment variables and files) that are specified for the runner to have. For example, you can use this to set AWS access credentials for all jobs a runner may execute. These are set using the waypoint config set -runner command and flag.

The second config watcher (here) is for the job configuration. These are the variables that are specified in the waypoint.hcl file or via the API for a specific job. These might overwrite the runner configuration, but are meant to be scoped to a single job.

I purposely made two watchers for a couple of reasons:

Best Solution (IMO)

I think the best long-term solution would be to change the RunnerJobStream stream to send down config source configs (similar to how RunnerConfig does for the runner). These could be sent as part of the assignment event, or perhaps as a new event.

Either way, the important invariant that must be held is that the config sources must be received before the job attempts to load the variables. My recommendation would be to do this before executeJob is ever called.

Why is this the best?

Alternate Solution: Use Runner Config Sources

An alternate solution is to store the config sources on the runner, then read them per job and set them up prior to instantiating the appconfig watcher for that job.

Alternate Solution: Share a Single Appconfig Watcher

This is what you proposed: the appconfig watcher for the runner can be used for jobs.

I would say this is only safe if the runner executes one job at a time. If different jobs have slightly different variable configurations with the same name, it'll be tricky to synchronize them. In practice, especially with ODR, we only execute one job per runner, so this would work today, but we've always architected runners to be able to take multiple jobs concurrently, so theoretically this is unsafe and challenging.

I would recommend against this. Even from an implementation standpoint, I think either of the other two proposed solutions would resolve this more easily, with fewer pitfalls and more room to grow.