sofastack / sofa-jraft

A production-grade java implementation of RAFT consensus algorithm.
https://www.sofastack.tech/projects/sofa-jraft/
Apache License 2.0

When applying a task, the leader replicates the log entry to a majority of nodes and then crashes. How does the subsequent new leader commit that entry? #275

Closed readless closed 4 years ago

readless commented 4 years ago

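As the title says.

After the leader receives a write request, it replicates the log entry to more than half of the nodes, returns success to the client, and then commits the entry. If the leader crashes before the other nodes have received its commit, where does this entry get committed once a new leader has been elected?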

sofastack-bot[bot] commented 4 years ago

Hi @readless, we detected non-English characters in the issue. This comment is an automatic translation by @sofastack-robot to help other users understand this issue.

We encourage you to describe your issue in English which is more friendly to other users.

When the leader receives a write request, it distributes the log entry to more than half of the nodes, returns success to the client, and then commits the entry. The leader crashes before the other nodes have received its commit. After a new leader is elected, where is the commit of this log entry handled?

killme2008 commented 4 years ago

@readless The new leader will keep sending heartbeats and append entries requests, which piggyback the last commit index; followers commit according to that commit index.
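
To make the piggybacking concrete, here is a minimal Java sketch of the follower side. The class and method names are hypothetical and are not jraft's API; it also simplifies the Raft rule by using the follower's last log index where the paper uses the index of the last new entry in the request.

```java
/** Hypothetical follower-side bookkeeping; illustrative only, not jraft code. */
final class FollowerCommitSketch {
    private final long lastLogIndex; // index of the last entry stored locally
    private long commitIndex;        // highest index this follower knows is committed

    FollowerCommitSketch(long lastLogIndex, long commitIndex) {
        this.lastLogIndex = lastLogIndex;
        this.commitIndex = commitIndex;
    }

    /** Called for every heartbeat or append entries request carrying the leader's commit index. */
    long onLeaderCommit(long leaderCommitIndex) {
        // Never commit past what is actually stored locally.
        long candidate = Math.min(leaderCommitIndex, lastLogIndex);
        // The commit index only moves forward.
        if (candidate > commitIndex) {
            commitIndex = candidate;
        }
        return commitIndex;
    }

    public static void main(String[] args) {
        FollowerCommitSketch follower = new FollowerCommitSketch(5, 2);
        System.out.println(follower.onLeaderCommit(4)); // 4
        System.out.println(follower.onLeaderCommit(9)); // 5, capped by the local log
        System.out.println(follower.onLeaderCommit(3)); // 5, never moves backwards
    }
}
```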

readless commented 4 years ago

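What I mean is: after leader node A crashes, node B becomes leader. At that point B holds a log entry that is still uncommitted. If a read request arrives while that entry is uncommitted, a consistent read cannot be served. So what mechanism allows node B to commit this entry? @killme2008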

killme2008 commented 4 years ago

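@readless The problem you describe does not exist, because Raft's election mechanism guarantees that the elected leader is always (one of) the nodes with the most complete log.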

readless commented 4 years ago

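@killme2008 Node A replicates logX to more than half of the nodes and returns success to the client. Then A crashes and B becomes leader. B does have logX, and its log is the most complete, but B has not yet committed logX. My question is: what mechanism allows node B to commit logX?

If logX on node B is not committed and a read request comes in, logX cannot be reflected in the state machine.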

fengjiachun commented 4 years ago

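One addition: a linearizable read requires appliedIndex to have caught up with (reached or passed) readIndex. You can refer to the code for the concrete implementation.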
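
The condition above can be sketched as a small gate that holds reads until the state machine has applied far enough. This is a single-threaded illustration with hypothetical names (`ReadGateSketch`, `read`, `onApplied`); it is not how jraft's read path is actually structured.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical read gate: a read is served only once appliedIndex >= readIndex. */
final class ReadGateSketch {
    private long appliedIndex;
    // Reads that are still waiting, keyed by the readIndex they need.
    private final TreeMap<Long, List<Runnable>> waiting = new TreeMap<>();

    /** Serve the read now if the state machine is caught up, otherwise park it. */
    void read(long readIndex, Runnable callback) {
        if (appliedIndex >= readIndex) {
            callback.run();
        } else {
            waiting.computeIfAbsent(readIndex, k -> new ArrayList<>()).add(callback);
        }
    }

    /** Called after the state machine has applied entries up to newAppliedIndex. */
    void onApplied(long newAppliedIndex) {
        appliedIndex = Math.max(appliedIndex, newAppliedIndex);
        // All parked reads whose readIndex <= appliedIndex are now safe to serve.
        NavigableMap<Long, List<Runnable>> ready = waiting.headMap(appliedIndex, true);
        List<Runnable> toRun = new ArrayList<>();
        ready.values().forEach(toRun::addAll);
        ready.clear(); // clearing the view removes the entries from the backing map
        toRun.forEach(Runnable::run);
    }

    public static void main(String[] args) {
        ReadGateSketch gate = new ReadGateSketch();
        gate.onApplied(2);
        gate.read(2, () -> System.out.println("read at index 2 served immediately"));
        gate.read(3, () -> System.out.println("read at index 3 served after apply"));
        gate.onApplied(3);
    }
}
```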

killme2008 commented 4 years ago

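@readless I see what you mean now: your concern is when logX gets committed. First, a correction: commit does not mean the entry has been applied to the state machine; there is also the concept of applyIndex. Second, every new leader writes a configuration log entry, which allows the earlier entries to be committed together with it. This step is critical: a linearizable read has to check that at least one entry has been written in the current term.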

fengjiachun commented 4 years ago

https://www.sofastack.tech/projects/sofa-jraft/consistency-raft-jraft/

The section "JRaft 实现细节解析之高效的线性一致读" (JRaft implementation details: efficient linearizable reads) covers this in detail.

@readless

killme2008 commented 4 years ago

@readless One more note: the problem you raised is covered in Section 8 of the Raft paper.

https://raft.github.io/raft.pdf

Read-only operations can be handled without writing anything into the log. However, with no additional measures, this would run the risk of returning stale data, since the leader responding to the request might have been superseded by a newer leader of which it is unaware. Linearizable reads must not return stale data, and Raft needs two extra precautions to guarantee this without using the log. First, a leader must have the latest information on which entries are committed. The Leader Completeness Property guarantees that a leader has all committed entries, but at the start of its term, it may not know which those are. To find out, it needs to commit an entry from its term. Raft handles this by having each leader commit a blank no-op entry into the log at the start of its term. Second, a leader must check whether it has been deposed before processing a read-only request (its information may be stale if a more recent leader has been elected). Raft handles this by having the leader exchange heartbeat messages with a majority of the cluster before responding to read-only requests. Alternatively, the leader could rely on the heartbeat mechanism to provide a form of lease [9], but this would rely on timing for safety (it assumes bounded clock skew).
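
The two precautions in the quoted passage can be sketched as a single leader-side check. This is a minimal Java illustration, not jraft's implementation: the method name and parameters are hypothetical, heartbeat acknowledgements are assumed to count the leader itself, and the commit index passed in is assumed to be the one captured before the heartbeat round started.

```java
import java.util.OptionalLong;

/** Hypothetical leader-side check before serving a linearizable read. */
final class ReadIndexSketch {

    /**
     * Returns the read index if the read may proceed, or empty if it must be rejected or retried.
     *
     * @param termAtCommitIndex term of the log entry stored at commitIndex
     * @param heartbeatAcks     nodes (leader included) that acknowledged the heartbeat round
     */
    static OptionalLong tryReadIndex(long currentTerm, long commitIndex, long termAtCommitIndex,
                                     int heartbeatAcks, int clusterSize) {
        // Precaution 1: the leader must have committed an entry from its own term.
        // Terms never decrease along the log, so checking the entry at commitIndex is enough.
        if (termAtCommitIndex != currentTerm) {
            return OptionalLong.empty();
        }
        // Precaution 2: a majority must confirm that this node is still the leader.
        int quorum = clusterSize / 2 + 1;
        if (heartbeatAcks < quorum) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(commitIndex);
    }

    public static void main(String[] args) {
        // New leader in term 2 whose latest committed entry is still from term 1: not ready.
        System.out.println(tryReadIndex(2, 3, 1, 3, 3).isPresent()); // false
        // An entry from term 2 is committed, but only the leader itself answered: not ready.
        System.out.println(tryReadIndex(2, 4, 2, 1, 3).isPresent()); // false
        // Both precautions hold: the read index is the commit index.
        System.out.println(tryReadIndex(2, 4, 2, 2, 3).getAsLong()); // 4
    }
}
```

The caller would then wait until the applied index reaches the returned read index before answering the read.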

jraft handles this corner case as well; see

https://github.com/sofastack/sofa-jraft/blob/master/jraft-core/src/main/java/com/alipay/sofa/jraft/core/NodeImpl.java#L999

https://github.com/sofastack/sofa-jraft/blob/master/jraft-core/src/main/java/com/alipay/sofa/jraft/core/NodeImpl.java#L1278,L1285

If you are interested, read through the code yourself.

readless commented 4 years ago

@killme2008 Got it, thanks! Many articles online do not mention that a log entry is added in order to commit the earlier entries.

readless commented 4 years ago

https://www.sofastack.tech/projects/sofa-jraft/raft-introduction/

If there exists an N such that N > commitIndex, a majority of matchIndex[i] >= N, and log[N].term == currentTerm, set commitIndex to N.

So now I understand this sentence.
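
As a worked illustration of this rule applied to the logX scenario from the thread, here is a minimal Java sketch. All names are hypothetical and the log is reduced to an array of terms; it assumes log indexes start at 1 and that matchIndex includes the leader's own last index.

```java
import java.util.Arrays;

/** Hypothetical illustration of the commit rule; not jraft code. */
final class CommitRuleSketch {

    /**
     * @param logTerms   logTerms[i] is the term of the leader's entry at index i + 1
     * @param matchIndex highest replicated index per node, leader included
     * @return the new commit index, unchanged if no N satisfies the rule
     */
    static long advanceCommitIndex(long commitIndex, long currentTerm,
                                   long[] logTerms, long[] matchIndex) {
        long[] sorted = matchIndex.clone();
        Arrays.sort(sorted);
        // Largest index that a majority of nodes has reached.
        long majorityIndex = sorted[(sorted.length - 1) / 2];
        // Look for the largest N > commitIndex whose entry belongs to the current term.
        for (long n = majorityIndex; n > commitIndex; n--) {
            if (logTerms[(int) (n - 1)] == currentTerm) {
                return n;
            }
        }
        return commitIndex;
    }

    public static void main(String[] args) {
        // B becomes leader in term 2. logX sits at index 3 with term 1 and is on a
        // majority (B and C), but it is not from the current term, so nothing moves.
        System.out.println(advanceCommitIndex(2, 2, new long[] {1, 1, 1}, new long[] {3, 3, 0})); // 2

        // B appends its first term-2 entry at index 4 and replicates it to C.
        // Committing index 4 commits logX at index 3 along with it.
        System.out.println(advanceCommitIndex(2, 2, new long[] {1, 1, 1, 2}, new long[] {4, 4, 0})); // 4
    }
}
```

This is why the entry a new leader writes at the start of its term matters: entries from earlier terms are only committed indirectly, once an entry from the current term reaches a majority.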