hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/
Other
14.99k stars 1.96k forks source link

Nomad executes any file in `/plugins/` disregarding the plugin contract. #18529

Open stswidwinski opened 1 year ago

stswidwinski commented 1 year ago

Nomad version

tip

Operating system and Environment details

Unix

Issue

The advertised Nomad contract is seems to be:

The name of the plugin is the plugin's executable name relative to to the plugin_dir. If the plugin has a suffix, such as .exe, this should be omitted.

Consequently, if the plugins directory looks as follows:

\- plugins
  \- foo.exe
  \- bar.exe

And the config looks as follows:

[...]
  "plugin": {
    "foo": {...}.
    [...]
   }
[...]

I would expect foo.exe to be used as a plugin and bar.exe ignored since it is not mentioned within the configuration at all. However, it seems that both foo.exe and bar.exe will be executed and used for plugin configuration. Even worse, if bothbar.exe and foo.exe correspond to a driver whose fingerprinting endpoints return the same name and version, it is non-deterministic which one of the binaries will actually be used as the driver.

Root cause.

When Nomad starts up, it will load plugins and fingerprint them. The rough codepath is:

  1. Setup plugins: https://github.com/hashicorp/nomad/blob/main/command/agent/plugins.go#L15
  2. New plugin loader: https://github.com/hashicorp/nomad/blob/main/helper/pluginutils/loader/loader.go#L115
  3. Init plugin loader: https://github.com/hashicorp/nomad/blob/main/helper/pluginutils/loader/init.go#L57

Now, this has the following logic which goes against the contract described above:

  1. ls  the plugin directory: https://github.com/hashicorp/nomad/blob/main/helper/pluginutils/loader/init.go#L68
  2. execute  all files in it: https://github.com/hashicorp/nomad/blob/main/helper/pluginutils/loader/init.go#L322
  3. Use the response of the plugin to get the name of it: https://github.com/hashicorp/nomad/blob/main/helper/pluginutils/loader/init.go#L279-L302

So in other words: the name of the binary is only used to derive the config json for the driver, not to control whether or not the driver should be executed in the first place.

Reproduction steps

The simplest repro is to have an unrelated, non-executable file within plugins and observe that Nomad will eagerly execute it (or rather attempt and fail) despite the config not mentioning the file as a plugin.

The non-deterministic choice of the version of the config is also simple to reproduce. Take any functioning task driver and cause it to fail on first task start. Copy the task driver into the plugins/ directory 10 times and add a single working copy. Observe that the correct driver is used around 9% of the time.

schmichael commented 1 year ago

Unfortunately at this point I'm afraid the behavior needs to be treated as correct and the docs updated to reflect that.

The intent is that all plugins in the plugin directory are loaded by default regardless of explicit configuration. To change that to require explicit configuration now may break long existing Nomad clusters which rely on the autodetection mechanism.

Unprivileged write access to a plugin directory must be prevented. If you think this is insufficiently safe let me know. We could always warn users in upcoming releases and schedule the breaking change in the future. That's just a lot of effort when unprivileged write access to the plugin directory would still likely allow for attacks.

Non-determinism around which plugin is selected is a bug, but I'm unsure what the proper fix is. Lexicographically selecting the first one or crashing with an error both seem defensible options: the former assumes it doesn't matter and the latter assumes it was an operator mistake that needs correcting.

stswidwinski commented 1 year ago

The intent is that all plugins in the plugin directory are loaded by default regardless of explicit configuration. To change that to require explicit configuration now may break long existing Nomad clusters which rely on the autodetection mechanism.

What if we made this behavior configurable with the default reflecting the old behavior?

Unprivileged write access to a plugin directory must be prevented. If you think this is insufficiently safe let me know.

Agreed -- Having access to this directory already allows one to mimick a registered plugin. I think I am more concerned with the ease with which one may add non-determinism mentioned, but your point about maintaining an old observed behavior resonates with me