moby / swarmkit

A toolkit for orchestrating distributed systems at any scale. It includes primitives for node discovery, raft-based consensus, task scheduling and more.

Task Desired State #348

Closed · aluzzardi closed this issue 8 years ago

aluzzardi commented 8 years ago

In order to support task lifecycle and various other features, we need to provide a desired state for tasks (#284).

Currently, when a node goes down, we simply remove its tasks and let the orchestrator add new ones. However, now that we plan to keep tasks around rather than delete them, we should instead set the desired state of the old tasks to LOST and create new ones. When the failed node comes back online, it will get an update from the dispatcher and stop those tasks.
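To make that concrete, here is a minimal sketch in Go of what such an orchestrator path could look like. The package, Store, Task, and state names are purely illustrative, not the current swarmkit API:

package sketch

// Illustrative types only; the real swarmkit objects differ.
type TaskState int

const (
    TaskStateRunning TaskState = iota
    TaskStateLost
)

type Task struct {
    ID           string
    ServiceID    string
    NodeID       string
    DesiredState TaskState
}

type Store interface {
    TasksByNode(nodeID string) []*Task
    UpdateTask(*Task)
    CreateTask(*Task)
}

// handleNodeDown marks the lost node's tasks with a LOST desired state
// instead of deleting them, and creates replacements. When the node comes
// back online, the dispatcher can send it the updated desired state and
// the agent will stop the stale tasks.
func handleNodeDown(s Store, nodeID string, newTaskFor func(serviceID string) *Task) {
    for _, t := range s.TasksByNode(nodeID) {
        t.DesiredState = TaskStateLost
        s.UpdateTask(t)
        s.CreateTask(newTaskFor(t.ServiceID))
    }
}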

The same applies to rolling updates: rather than deleting the old version, the orchestrator should set the desired state to KILL (or similar). A big side benefit is that the orchestrator can then wait for those tasks to actually stop before creating the new ones, avoiding duplicate tasks running during an update.
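A rough illustration of that update step, reusing the illustrative Store/Task types from the sketch above (all names made up): the orchestrator only creates the replacement once an agent observation confirms the old task has stopped.

// Illustrative "stop the old version" desired-state value.
const TaskStateKilled TaskState = 2

// updateOneTask replaces a single task during a rolling update without
// ever running two copies at once: it asks the agent to stop the old
// task, waits for an observation that it actually stopped, and only then
// creates the new task. observedStopped stands in for whatever event the
// task status watcher would deliver.
func updateOneTask(s Store, old *Task, newTaskFor func(serviceID string) *Task,
    observedStopped func(taskID string) <-chan struct{}) {

    old.DesiredState = TaskStateKilled
    s.UpdateTask(old)

    <-observedStopped(old.ID) // block until the agent reports the task is down

    s.CreateTask(newTaskFor(old.ServiceID))
}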

This raises the question: do we still need the drainer, or should it be the orchestrator's job? Since it has to watch for nodes coming and going anyway (global jobs - #286), why not handle draining as well? A node going down would no longer be a special case: it's simply a node changing state, which triggers the orchestrator.
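A possible shape for that, again purely as a sketch on top of the types above: draining and going down are just two node-state values handled by the same orchestrator code path.

// Illustrative node states; in this sketch, "down" and "draining" are
// simply two values of the same node-state field.
type NodeState int

const (
    NodeReady NodeState = iota
    NodeDown
    NodeDraining
)

// handleNodeStateChange treats a node going down and a node being drained
// as the same event from the orchestrator's point of view, which is why a
// separate drainer component would no longer be needed. Reuses the
// illustrative handleNodeDown from the earlier sketch.
func handleNodeStateChange(s Store, nodeID string, state NodeState, newTaskFor func(string) *Task) {
    switch state {
    case NodeDown, NodeDraining:
        handleNodeDown(s, nodeID, newTaskFor)
    case NodeReady:
        // Nothing to do; tasks stay where they are.
    }
}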

Having a desired state would also allow us to offer something like swarmctl service {stop,kill,restart}.

/cc @aaronlehmann @dongluochen @stevvooe

stevvooe commented 8 years ago

We should follow the rule that task state should only be set by an agent observation.

This means that if a node goes down, we don't mess with a task's state. We have to assume it is in that state, since we have no observation disagreeing with the state. If we want to record a condition (such as "lost"), we should record such a condition in the task elsewhere. For example, if a node goes down, we need to record the condition that those tasks are likely not running, but we cannot confirm that until the node has returned or the node has been decommissioned.

I agree that draining should be handled by the orchestrator.

aaronlehmann commented 8 years ago

How about a top-level field in the Task object called condition?

enum TaskCondition {
    OK = 0;       // no special condition
    LOST = 1;     // node unreachable; the task is presumed not running
    STOPPING = 2; // the task is being shut down
}

TaskCondition condition = 9;
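If the orchestrator were the single writer of that field, it could record its inference there without ever touching the agent-observed state. A hypothetical Go mirror of the idea (a fresh sketch, unrelated to the types in the earlier comments):

// Hypothetical Go mirror of the proposed condition field. The manager
// records its inference in Condition and never writes State, which stays
// reserved for agent observations.
type TaskCondition int

const (
    ConditionOK TaskCondition = iota
    ConditionLost
    ConditionStopping
)

type ObservedTask struct {
    ID        string
    State     string        // written only from agent status reports
    Condition TaskCondition // written by the manager based on heuristics
}

// markLost records that the manager believes the task is probably not
// running (e.g. its node missed heartbeats) without claiming to know the
// actual state.
func markLost(t *ObservedTask) {
    t.Condition = ConditionLost
}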
wfarner commented 8 years ago

> For example, if a node goes down, we need to record the condition that those tasks are likely not running, but we cannot confirm that until the node has returned or the node has been decommissioned.

Does this mean a human needs to intervene to induce rescheduling of tasks on a host that has lost network connectivity?

stevvooe commented 8 years ago

@aaronlehmann I think such a field could be added, as long as we agree on the responsible party for that field. What we want to avoid is having a field set by multiple different sources that may not be in agreement. We also need to consider whether or not desired state is sufficient.

@wfarner No. A missed node heartbeat still causes the tasks to be considered unsatisfied, and we still reschedule them.

We take the calculated risk, based on other observations, that these tasks are down, and reschedule them. We mitigate this risk by holding onto information about the observation of a task's given state and using that information to correct our assumption.

But we cannot assume that the "orphaned" tasks are appropriately shut down until either the user tells us that the node will never come back, or the node rejoins and we can instruct it to remove those tasks.

Essentially, we can infer the condition of a task because we have heuristics to do so. We cannot infer the state, as that is unobservable, possibly indefinitely. We want to separate these, as they may disagree.
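A small sketch of that separation, reusing the hypothetical ObservedTask/condition types above: only an agent report moves the state, while the condition recorded earlier drives cleanup once the node is reachable again.

// reconcileOnNodeReturn is called when a previously unreachable node
// reconnects and reports on one of its tasks. Only the agent report may
// change State; the LOST condition recorded while the node was away is
// used to tell the agent to remove the stale copy, since a replacement
// has already been scheduled elsewhere.
func reconcileOnNodeReturn(t *ObservedTask, agentReportedState string, instructShutdown func(taskID string)) {
    t.State = agentReportedState

    if t.Condition == ConditionLost {
        instructShutdown(t.ID)
    }
}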