Seems that readIndex() may return inconsistent value (e.g. 0) after node restart, is this a known issue?

sofastack / sofa-jraft

A production-grade java implementation of RAFT consensus algorithm.

https://www.sofastack.tech/projects/sofa-jraft/

Apache License 2.0

3.58k stars 1.15k forks source link

Seems that readIndex() may return inconsistent value (e.g. 0) after node restart, is this a known issue? #1049

Closed sanpwc closed 3 months ago

sanpwc commented 9 months ago

Within node restart lastCommittedIndex initialized as 0

public class BallotBox implements Lifecycle<BallotBoxOptions>, Describer {
    ...
    private long lastCommittedIndex = 0;
    ...
}

Taking into the consideration that local log application triggered by NodeImpl#init is asynchronous, it's possible to handle readIndex request before local log re-play is finished and thus see inconsistent readIndex value. In other words, seems that there's a race between readIndex evaluation and lastCommittedIndex restoration on node restart.

killme2008 commented 9 months ago

I think it's impossible, the handleReadIndexRequest will check the state of the node before reading the lastCommittedIndex, and if its state is not the leader or the current leader doesn't commit an entry before, the readIndex will fail.

https://github.com/sofastack/sofa-jraft/blob/2ea82d6d0605af737af59c853a3c00e643b0384e/jraft-core/src/main/java/com/alipay/sofa/jraft/core/NodeImpl.java#L1511

https://github.com/sofastack/sofa-jraft/blob/2ea82d6d0605af737af59c853a3c00e643b0384e/jraft-core/src/main/java/com/alipay/sofa/jraft/core/NodeImpl.java#L1566

And when a node becomes a leader, it will update the lastCommittedIndex before releasing the write lock:

https://github.com/sofastack/sofa-jraft/blob/2ea82d6d0605af737af59c853a3c00e643b0384e/jraft-core/src/main/java/com/alipay/sofa/jraft/core/NodeImpl.java#L1264

https://github.com/sofastack/sofa-jraft/blob/2ea82d6d0605af737af59c853a3c00e643b0384e/jraft-core/src/main/java/com/alipay/sofa/jraft/core/NodeImpl.java#L502

sanpwc commented 9 months ago

Hmm, seems that if (this.logManager.getTerm(lastCommittedIndex) != this.currTerm) will be skipped in case of quorum <= 1 and this is the case when the issue is reproduced.

killme2008 commented 9 months ago

Hmm, seems that if (this.logManager.getTerm(lastCommittedIndex) != this.currTerm) will be skipped in case of quorum <= 1 and this is the case when the issue is reproduced.

If so, that would be possible, good catch! But I really doubt if is this a common case(just one node) for the production usage of jraft.

We can fix it, but maybe it is not urgent.

sanpwc commented 9 months ago

We can fix it

Well, that'll be nice))

BTW, seems that it's not the difficult to fix it. In case of single noded cluster (and only single noded cluster). We may consider pendingIndex on restart as committed one, just because in case of single node no-one will discard log entries. So basically, it should be possible to do following to solve the issue. In case of single noded clusters only.

    public boolean resetPendingIndex(final long newPendingIndex) {
            ...
            this.lastCommittedIndex = newPendingIndex - 1;
            ...
        }

What do you think?

killme2008 commented 9 months ago

We can fix it

Well, that'll be nice))

BTW, seems that it's not the difficult to fix it. In case of single noded cluster (and only single noded cluster). We may consider pendingIndex on restart as committed one, just because in case of single node no-one will discard log entries. So basically, it should be possible to do following to solve the issue. In case of single noded clusters only.
    public boolean resetPendingIndex(final long newPendingIndex) {
            ...
            this.lastCommittedIndex = newPendingIndex - 1;
            ...
        }
What do you think?

It seems like the fix will work as expected. Would you be able to create a pull request for this change?

sanpwc commented 9 months ago

Sure