Behaviour on setting TrailingLogs to 0

fahadullah commented 8 years ago

https://gist.github.com/fahadullah/e49a4578094f4efd7746

If TrailingLogs is set to 0 and raft node is restarted right after taking a snapshot, it tries to look up the last index but all entries were already compacted. TrailingLogs being 0 should be a valid config though a little inefficient under certain cases.

Lz-Gustavo commented 5 years ago

Is this still an observed issue on the current version?

Running the same test routine pointed by @fahadullah with some small modifications to adequate to the new API, I've reached the same "log not found" state. At first, I thought there could be a bug on InmemStore DeleteRange(), which is the implementation of LogStore interface used on unit tests. After some time, I've came back to the test definition and I think it's equivocated.

Please correct me if I'm wrong, but the expected log index after a successfully snapshot installation, and entire log truncation, shouldn't be zero? Not 102 as was currently defined.

lalalalatt commented 3 months ago

I encountered same issue on my own implementation. In the current hashicorp/raft implementation, "log not found" may occur under unreliable network. So I make some modifications in the appendEntries RPC handler. Original: https://github.com/hashicorp/raft/blob/185ae2ea91f09c04428c4293641db142b130a631/raft.go#L1476-L1502

After:

if args.PrevLogIndex > 0 {
    lastIdx, lastTerm := rf.getLastEntry()
    lastSnapshotIdx, lastSnapshotTerm := rf.getSnapshot()

    var prevLogTerm uint64
    if args.PrevLogIndex == lastIdx {
        prevLogTerm = lastTerm

        if args.PrevLogTerm != prevLogTerm && args.PrevLogIndex == lastSnapshotIdx && args.PrevLogTerm == lastSnapshotTerm {
            // We only compact the log range from `firstLogIndex` to `Min(lastLogIndex, lastSnapshotIndex - trailingLogs)`.
            // If `trailLogs` is not 0, the log entry on `lastSnapshotIndex` is not included in the log range to be compacted.
            // And sometimes the log entry on `lastSnapshotIndex` is the last log entry, but with a different term.
            // So we do this additional check
            prevLogTerm = lastSnapshotTerm
        }

    } else if args.PrevLogIndex == lastSnapshotIdx && args.PrevLogTerm == lastSnapshotTerm {
        // If there are remaining logs after the snapshot index, the `lastIdx` will be the last log index.
        // Without this additional check, even if the log entry on lastSnapshotIdx == args.PrevLogIndex exists semantically;
        // However, the log entry on index `lastSnapshotIdx` has been compacted, so we need to use the snapshot term as the term of compacted log on index `lastSnapshotIdx`.
        prevLogTerm = lastSnapshotTerm
    } else {
        var prevLog LogEntry
        if err := rf.logs.GetLog(args.PrevLogIndex, &prevLog); err != nil {
            rf.logger.Debug("failed to get previous log",
                zap.Int("receiver-id", rf.me),
                zap.Uint64("previous-index", args.PrevLogIndex),
                zap.Uint64("last-index", lastIdx),
                zap.Error(err))
            reply.NoRetryBackoff = true
            return
        }
        prevLogTerm = prevLog.Term
    }

    if args.PrevLogTerm != prevLogTerm {
        rf.logger.Debug("previous log term mis-match",
            zap.Int("sender-id", int(args.LeaderId)),
            zap.Int("receiver-id", rf.me),
            zap.Uint64("ours", prevLogTerm),
            zap.Uint64("remote", args.PrevLogTerm))
        reply.NoRetryBackoff = true

        // Find the last log that matches the term
        commitIdx := rf.getCommitIndex()
        conflictTerm := prevLogTerm
        var idx uint64
        for idx = args.PrevLogIndex - 1; idx >= commitIdx; idx-- {
            var log LogEntry
            if err := rf.logs.GetLog(idx, &log); err != nil {
                rf.logger.Debug("failed to get log",
                    zap.Int("receiver-id", rf.me),
                    zap.Uint64("index", idx),
                    zap.Error(err),
                )
                return
            }
            if log.Term != conflictTerm {
                break
            }
        }

        reply.LastLog = idx

        return
    }
}

lalalalatt commented 3 months ago

@slackpad Sorry for disturbing you, I guess you are the maintainer of this repo. Would like to hear your thoughts on this issue, Thx!

hashicorp / raft

Behaviour on setting TrailingLogs to 0 #86