moby / swarmkit

A toolkit for orchestrating distributed systems at any scale. It includes primitives for node discovery, raft-based consensus, task scheduling and more.
Apache License 2.0

Read-only mode when cluster is inconsistent #404

Closed aluzzardi closed 8 years ago

aluzzardi commented 8 years ago

Should we allow a read-only mode (cluster API side) when the cluster is "inconsistent" (as in, leader election still needs to happen)?

Basically right now we reject everything, but perhaps we could let every manager serve read queries locally until a master gets elected.
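The proposal above could be sketched roughly as follows. This is a hypothetical illustration, not swarmkit code (`manager`, `GetService`, `CreateService`, and `errNoLeader` are made-up names): reads are answered from the manager's local copy of the replicated state, while writes keep being rejected until a leader exists.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoLeader stands in for the error returned while the cluster
// is "inconsistent" (no elected leader). Hypothetical name.
var errNoLeader = errors.New("raft: no elected leader")

type manager struct {
	leaderKnown bool
	localStore  map[string]string // local copy of the replicated state
}

// GetService serves a read from the local store even without a
// leader. Note: the answer may be stale on a minority partition.
func (m *manager) GetService(id string) (string, error) {
	svc, ok := m.localStore[id]
	if !ok {
		return "", fmt.Errorf("service %q not found", id)
	}
	return svc, nil
}

// CreateService is a write: it must be proposed through the
// leader, so it is rejected while the cluster has none.
func (m *manager) CreateService(id, spec string) error {
	if !m.leaderKnown {
		return errNoLeader
	}
	m.localStore[id] = spec
	return nil
}

func main() {
	m := &manager{localStore: map[string]string{"web": "nginx:latest"}}
	svc, _ := m.GetService("web")
	fmt.Println("read:", svc)
	fmt.Println("write:", m.CreateService("db", "postgres"))
}
```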

/cc @abronan @aaronlehmann @LK4D4

abronan commented 8 years ago

TL;DR: It depends on what you do with the read queries. If they're only for CLI interactions (listing nodes/services/tasks, etc.) and don't generally involve taking actions based on those reads, then yes, we can do that (even though the benefit is pretty limited). Otherwise we shouldn't, or you'll have to correct faulty actions taken as a direct consequence of those reads.


At the risk of repeating myself (I may have misunderstood the original intent of the issue):

If you need to take an action on an Agent or any other part of the system based on that read (a reconciliation loop, for example, or anything that takes application-level decisions in a timely fashion), then we shouldn't, or we defeat the purpose of using raft in the first place. The absence of an elected leader can mean that a node finds itself inside a partition and is lagging behind log entries committed on the other partition, which holds a majority and has elected a leader. A read served from that node may trigger an action that does not reflect the newer committed entries on the other side. When the partition heals, you then have to resolve the conflict in your internal data structures or correct the damaging action taken on the basis of the inconsistent read (because the other side kept going and updated that piece of data). If you want to serve reads in the absence of a leader while still converging state, you're better off using CRDTs.
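The partition scenario described above can be made concrete with a toy simulation (hypothetical code, not swarmkit's): a cluster splits into a majority side that elects a leader and commits a new entry, and a minority node that never sees it, so a locally served read on the minority side disagrees with the committed state.

```go
package main

import "fmt"

// replica is a toy stand-in for a raft member's applied state.
type replica struct {
	name string
	log  map[string]string // applied entries
}

func copyOf(m map[string]string) map[string]string {
	out := make(map[string]string, len(m))
	for k, v := range m {
		out[k] = v
	}
	return out
}

func main() {
	state := map[string]string{"node/worker-1": "ready"}
	a := &replica{"A", copyOf(state)} // majority partition, has a leader
	c := &replica{"C", copyOf(state)} // minority partition, no leader

	// The majority commits a new entry; C never sees it.
	a.log["node/worker-1"] = "down"

	// A read served locally by C is now stale. Acting on it (e.g.
	// skipping a reschedule because the worker "is ready") conflicts
	// with the majority's committed state once the partition heals.
	fmt.Println("majority view:", a.log["node/worker-1"]) // down
	fmt.Println("minority view:", c.log["node/worker-1"]) // ready (stale)
}
```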

Raft is used for critical data that needs to follow a total order orchestrated by the leader (CP). It does not necessarily have to serve consistent reads (consul/etcd can allow stale reads), but most application-level logic built on paxos/raft assumes that reads are consistent (like we do). If you serve reads when the leader is down (or when you think it is up, but you are actually isolated), you assume that the data is non-critical or that you can deal with conflicts (AP). This is why etcd and consul, when the consistency option is taken, do not serve reads while the leader is down and instead throw a "no elected leader" error on Put/Get/CAS.


An example of a partition resulting in an application-level conflict (the node inside the partition is the kayak and comes back in like: "hey everyone, what's up!?") 😄

[image: raft-2 — illustration of a network partition]

aluzzardi commented 8 years ago

Haha :) Nice illustrated example.

Yeah, that's what I meant by "cluster API side" - it's really just the CLI; the rest of the system (orchestrator/scheduler/etc.) stays down anyway, since it's only started once a manager actually acquires leadership.

abronan commented 8 years ago

> Yeah, that's what I meant by "cluster API side" - it's really just the CLI; the rest of the system (orchestrator/scheduler/etc.) stays down anyway, since it's only started once a manager actually acquires leadership.

Fair enough. I'd say we can in this case, but it makes the user experience a bit "inconsistent": some actions on the cluster API will still require a healthy raft cluster (with a leader), like removing a manager or submitting a new service/task in general. So you'd have access to some commands but not others.

Also, this is not only about swarm itself but about tools automating tasks on top of Swarm: if the Cluster API serves reads locally, what happens to people writing gRPC-based clients (and applications using those clients) in many languages to interact with Swarm? It gets worse if requests are load-balanced between managers while the raft cluster is split in two and the client has access to all the machines. Some actions might just be rejected (best case); others might harm the system (a task listed as down on one side but not on the other, with the client taking a decision based on that: worst case).
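The "best case" above (requests simply rejected) could be handled client-side along these lines. This is a hypothetical sketch (`managerClient`, `listTask`, `getTask` are invented names): a load-balancing client treats a "no elected leader" rejection as retryable and moves on to the next manager, rather than consuming a possibly stale local answer from a partitioned one.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoLeader models the rejection a manager without a healthy
// raft quorum would return. Hypothetical name.
var errNoLeader = errors.New("no elected leader")

type managerClient struct {
	name      string
	hasLeader bool
	tasks     map[string]string
}

// listTask rejects the read when the manager has no leader,
// instead of answering from possibly stale local state.
func (m *managerClient) listTask(id string) (string, error) {
	if !m.hasLeader {
		return "", errNoLeader
	}
	return m.tasks[id], nil
}

// getTask round-robins over managers, skipping any that reject
// the request for lack of a leader.
func getTask(managers []*managerClient, id string) (string, error) {
	for _, m := range managers {
		if v, err := m.listTask(id); err == nil {
			return v, nil
		}
	}
	return "", errNoLeader
}

func main() {
	// m1 sits on the minority side of a split and holds stale data.
	m1 := &managerClient{name: "m1", hasLeader: false, tasks: map[string]string{"t1": "down"}}
	m2 := &managerClient{name: "m2", hasLeader: true, tasks: map[string]string{"t1": "running"}}
	v, err := getTask([]*managerClient{m1, m2}, "t1")
	fmt.Println(v, err)
}
```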

aluzzardi commented 8 years ago

I agree with you - no inconsistent reads.

Closing down.