Get requests should go via a raft log too

jerrinot commented 2 years ago

Hi, this looks like a fun project. I am really not a golang dev, but I did a quick peek and here are some quick observations:

A follower might be behind a leader so when you do a get() on a follower you might get a stale value -> this breaks linearaziblity -> the system is no longer CP (in terms of CAP) -> You just violated the CP promise from readme :cry:

So a naive solution could be to always ask a leader, right? The problem is that a leader might be partitioned away, a new leader is elected without the original leader knowing, the system will make progress, again, without the old leader knowing. So the old leader might give you a stale reply and this again breaks linearizability.

The simplest solution is to do GET requests via the full Raft consensus. This will avoid stale replies and maintains linearizability. :muscle: There is an obvious downside: The raft log gets populated just by doing read-only requests. Not great. There are some optimization techniques, I believe some of them are documented in the Raft dissertation. I hope I didn't misremember that!

I didn't check how the server additional/removal works, but it's also a very tricky area with potential for subtle bugs.

Please don't get discouraged by :point_up:, I'm glad to see projects in this area! Happy Rafting!

inelpandzic commented 2 years ago

Hello, first of all, much much thanks for the feedback. And no man, not discouraged, but on the contrary, I'm encouraged with this, love to solve these kinds of problems, that's why I'm doing this (and hope it will full time, not a side project 😄 ).

Yes, you are totally right, a get() can read stale data, I'm fully aware of it and wanted to be like that (for now at least). Didn't want to go for linearizability necessarily, but more like for sequential consistency (although I should have realized that with that I don't have a CP property). The reason is primarily my time constraint :) and also I wanted to have a bit lower latency without going for reads through Raft.

So a naive solution could be to always ask a leader, right?

No, definitely not, first for the reasons you described, but also I don't want for everything to go through the leader, it puts the burden on that node and lowers the latency (even though this can happen if a system using imdgo is a heavy write system)

I didn't check how the server additional/removal works

Here I simply fully rely on my underlining Hashicorp's Raft implementation. On a system startup all of the nodes have to be known, atm it's not supported adding a node to a running cluster. When you have a new node, you'll update your config and redeploy your instances, and that way a cluster would be bootstrapped with additional nodes. Not ideal, I know, but I'm starting this one small step at a time 😄

For now, I'm gonna update the readme to remove misleading CP property which I violate.

Again, thank you very much man for taking the time to check this out, much appreciated!!

jerrinot commented 2 years ago

Cool! A fixed membership is a good choice for an educational project! Otherwise, it gets hairy really quickly:)

Ad consistency guarantees: The problem with local-only reads is they are not even sequentially consistent. Local reads can yield indefinitely stale results. Consider this scenario:

Setup: 3 nodes: n0 (leader), n1 (follower), n2 (follower) Initial State: x = 0

Execution on n2: set(x, 1) //applied on n0 and n1 print(get(x)) //local read on n2

With local-only get() this execution will print 0. Read-Your-Writes is violated and thus sequential consistency is violated too.

This is OK for many applications and performance benefits are obvious. So some systems allow configurable guarantees. Hazelcast internally uses 3 different modes for read-only requests, see: https://github.com/hazelcast/hazelcast/blob/14ef0dbefa4a6fa00a50a8873aeea02ec6c3f5e4/hazelcast/src/main/java/com/hazelcast/cp/internal/raft/QueryPolicy.java#L23-L54

inelpandzic commented 2 years ago

Yes, yes, I understand, you're totally right!! Thx, this is the ticket on top of my backlog :)

Can I ask just this, with this setup, what is the formal consistency model here (I admit, consistency models are still a bit tricky for me)?

jerrinot commented 2 years ago

That's a good question. Read-Your-Writes is a fairly basic property, that's what users implicitly expect and many consistency models are stronger than that. This is a good intro: https://jepsen.io/consistency (the picture is clickable)

inelpandzic commented 2 years ago

Yep, Jepsen is my reference on this 😄

Read-Your-Writes is a fairly basic property

100%, I'm gonna ensure that for starters, and the rest will stay the same for now. If I got everything right, with ensuring Read-Your-Writes, this could be sequentially consistent: causality is ensured by Raft and according to Jepsen "PRAM is exactly equivalent to read-your-writes", and we already have that (that is, will have that).

Excellent, this is getting more and more interesting 😄

Thx man a lot!!

jerrinot commented 1 month ago

👀

inelpandzic / imdgo

Get requests should go via a raft log too #1