hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

Remote Task Drivers and Resources #10549

Open schmichael opened 3 years ago

schmichael commented 3 years ago

Remote Task Drivers were released in Nomad 1.1.0 with no special consideration for resources: resources for remote tasks are treated the same as those for local tasks. This effectively makes the resources stanza useless except as a minimal value to prevent overpacking a given client. All remote tasks should be given the minimum resources, and each remote task driver should implement its own resource parameters in its config block.

Resources Mismatch

Part of the reason this was left unaddressed for Nomad 1.1.0 was that it may not make sense for remote task drivers to use the same dimensions and units as local tasks. AWS Lambda, for example, automatically allocates CPU proportional to the amount of memory requested, so Nomad's separate cpu and memory parameters would not be useful for a Lambda driver.

For this reason resources were left up to remote task drivers to independently implement.

Solution 1: Scheduler-aware Remote Task Drivers

Nomad could build a catalog of task driver plugins by publishing driver capabilities in node fingerprints. This would implicitly require all driver capabilities to be static for a given version of a plugin, with cataloging done on a per-version basis.

Once the scheduler had a catalog of driver fingerprints we could special-case resource-related scheduling logic (eg binpacking) for remote task drivers.
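
As a rough sketch only (nothing here is implemented), a jobspec under this solution might look like the following, assuming the scheduler maps the standard resources stanza onto the remote runtime's capacity fingerprinted from the driver; the lambda driver name and values are purely illustrative:

```hcl
task "somefunc" {
  driver = "lambda"

  config {
    # lambda-specific parameters, minus any resource settings
  }

  # Hypothetical: under this solution the scheduler would count these values
  # against the remote runtime's fingerprinted capacity rather than against
  # the local Nomad client's resources (hence con 1 below about needing a
  # separate mechanism to prevent local overpacking).
  resources {
    cpu    = 500
    memory = 1024
  }
}
```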

Pros

Least surprising solution. This is likely how users expect remote task drivers to work. It is intuitive and straightforward to explain.

Cons

  1. Requires implementing another mechanism to prevent overpacking nodes with remote tasks, as each task still consumes some local node resources to manage.
  2. Extra state storage on servers (although minimal)
  3. Nomad's resource stanza may not be what the remote task driver's runtime expects. See Resources Mismatch above.

Solution 2: Custom Resources

Custom Resources (#1081) would allow Nomad nodes with remote task drivers to specify capacities for their remote resources. The scheduler would be generically aware of these custom resources and be able to use them in conjunction with the standard resources stanza when scheduling.
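
As a rough sketch only (custom resources are still a proposal, so none of this syntax exists), a client might advertise a remote capacity and a jobspec might consume it roughly like this:

```hcl
# Hypothetical agent config: advertise how much remote capacity this client
# may manage. The block name and fields are illustrative, not real syntax.
client {
  custom_resource "lambda_memory_mb" {
    capacity = 102400
  }
}

# Hypothetical jobspec: the standard resources stanza still reserves local
# client resources (preventing overpacking), while the custom resource is
# deducted from the client's advertised remote pool.
task "somefunc" {
  driver = "lambda"

  resources {
    cpu    = 50
    memory = 50

    custom "lambda_memory_mb" {
      amount = 1024
    }
  }
}
```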

Pros

  1. Flexibility to match task driver remote resources
  2. Quotas could be implemented on custom resource types
  3. Overpacking still naturally prevented by the existing resources stanza

Cons

  1. Custom resources as proposed are still modeled as node local resources, so there could still be impedance mismatch between remote runtime resource pools and a local agent's custom resource specification.
  2. Management could be tricky, as agent configs would have to match the resources expected by plugins. If plugins changed their expected resources between versions, upgrading could be quite difficult.

Adding custom resources to the scheduler's cluster data model may alleviate these issues by providing centralized configuration and validation of custom resources.

Non-Solution (Status Quo): Leave it up to Remote Task Drivers

The status quo is that the existing resources stanza continues to be used to ensure a node managing remote tasks is not overburdened, but any resources specific to the remote task runtime must be configured in the task driver's config block.

For example, AWS Lambda only allows configuring the memory allocated to a function, so the config block for a theoretical Lambda remote task driver could look like:

task "somefunc" {
  driver = "lambda"
  config {
    memory = 1024 # See https://aws.amazon.com/lambda/pricing/ for options
      # ... other lambda-specific parameters here ...
  }

  # Resources to reserve on the Nomad client agent to ensure it does not become over-packed
  resources {
    cpu    = 50
    memory = 50
  }
}
josegonzalez commented 2 years ago

For either of these solutions, what does this look like from a user's perspective? Is there a difference in what the user would configure in their jobs, or in what platform operators would configure for plugins?

schmichael commented 2 years ago

@josegonzalez Good question. I added a section to the bottom of the issue to demonstrate the status quo: driver-specific resources. It's awkward but not untenable.

Solution 1 would have no jobspec changes! resources would just get treated differently for remote task drivers. Cluster operators would likely configure the maximum amount of remote resources any given Nomad client should manage (for example, with Lambda you might specify "total_memory = 102400" in the Lambda plugin's stanza to say that client could manage one hundred 1 GB Lambda functions). That's a weird way to ask operators to configure it, though, because the Nomad client could end up managing 200 Lambdas or 10 Lambdas! Since the overhead of a remote task is basically constant on the Nomad client, this approach would be awkward for cluster operators to manage.
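
For illustration, the operator-side knob described above might look something like this; the plugin name and the total_memory option are hypothetical:

```hcl
# Hypothetical agent config: cap how much Lambda memory this client may
# manage. Note the cap says nothing about how many functions that is;
# it could be one hundred 1 GB functions or two hundred 512 MB ones,
# even though the per-task overhead on the client is roughly constant.
plugin "lambda" {
  config {
    total_memory = 102400
  }
}
```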

Solution 2: Custom Resources would be great to have in general. The original issue #1081 has some great examples. However, even if we implemented custom resources, I'm not sure it would make sense to use them for remote tasks. The Nomad scheduler just does not care what resources are being used in the remote runtime. The only benefits would be maintaining the resources UX and enabling quotas.

So the status quo seems ok to me at the moment. At least until we get some more remote task drivers written (and fix #10592), I'm not sure we can choose the right tradeoffs here.