hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/
Other
14.77k stars 1.94k forks source link

Add Initialize() to drivers.DriverPlugin #10839

Open benbuzbee opened 3 years ago

benbuzbee commented 3 years ago

Proposal

Add a method to drivers.DriverPlugin that is called only once per plugin instance, after SetConfig has been called, and only once Nomad has determined that it will use the driver.

Use-cases

The plugin may need to do something resource intensive (like start background goroutines) that requires config (so cannot be done in the plugin factory function) or that we only want to do in the real plugin process (not the temporary ones that Nomad creates)

For example in our case we want to start a prometheus metrics server, where the plugin config specifies a port, say 1234. We have other use cases for this as well (intensive background goroutines, connecting to containerd, etc)

Attempted Solutions

Using my example case above, a metrics server where the port is in config, lets look at our options today

Do it in the plugin factory functions

Doesn't work because we need SetConfig to be called first

Do it in SetConfig

Doesn't work because nomad calls SetConfig twice, using two separate process ID's, once in Dispense (which is the one we want) but also once in validatePluginConfig

Not only does it mean we may get a bind failure because something is already listening, validatePluginConfig actually calls PluginLoader.Dispense directly (not InstanceManager.dispense()) and so it creates a SEPARATE process from the one that we care about. If we are doing something resource intensive we definitely dont want to do it just for an ephemeral validatePluginConfig check

Do it the first time we receive Fingerprint()

Fingerprint isnt called before RecoverTask (is this another bug?)

Crazy workaround that did work

Do it the first time we receive Capabilities() which is called only on the final plugin process before StartTask or RecoverTask in Nomad's initDriver code.

schmichael commented 3 years ago

Oof that's an ugly situation. Great discovery work. tl;dr -> I'm +1 on this new API. PRs welcome, but we'll also accept this into our backlog.

Do it the first time we receive Fingerprint()

Fingerprint isnt called before RecoverTask (is this another bug?)

You're mostly right: fingerprinting and task recovery are done concurrently, so there's no guarantee Fingerpint will get called before RecoverTask.

Proposal

Add a method to drivers.DriverPlugin that is called only once per plugin instance, after SetConfig has been called, and only once Nomad has determined that it will use the driver.

Sadly the concurrency mentioned above is just the beginning of the complex path plugins take to be initialized as I think you've discovered. With that in mind, we have to be careful where we call a new Initialize API to ensure it's called exactly-once per "runtime" plugin invocation (as opposed to the "validation" plugin invocation you also ran into).

I think calling it directly after SetConfig in PluginLoader's Dispense method would work? https://github.com/hashicorp/nomad/blob/v1.1.2/helper/pluginutils/loader/loader.go#L183-L187

Are there any useful parameters for the Initialize API or is it analogous to Go's init() funcs? We can always add parameters later in a backward compatible way, so it's not a big concern.

schmichael commented 3 years ago

Doesn't work because nomad calls SetConfig twice

I originally assumed this was necessary because of how we load plugins, but I'm not sure that's true! I'm not sure why we kill the plugin after the first SetConfig: https://github.com/hashicorp/nomad/blob/v1.1.2/helper/pluginutils/loader/init.go#L480-L485

If that's unnecessary then perhaps we could solve this by making SetConfig only get called once?

benbuzbee commented 3 years ago

Ensuring SetConfig is only called once per process lifetime would be a reasonable fix yeah

would need to make sure if it crashes and re-dispenses that setconfig is again called and again only once