xnnyygn / xraft

xnnyygn's raft implementation
MIT License
229 stars 108 forks source link

java.util.ConcurrentModificationException in `List<Entry> entries` of `MemoryEntrySequence` #40

Closed meteam2022 closed 10 months ago

meteam2022 commented 10 months ago

When the network latency is high, a kvstore-set operation may be inserted between sending the AppendEntriesRpc msg and receiving the AppendEntriesResultMessage. The entries in the AppendEntriesRpc is a view of entrySequence. Upon receiving the AppendEntriesResultMessage, the entries in AppendEntriesRpc has been concurrently modified, triggering java.util.ConcurrentModificationException.

The kvstore-set operation results in appending a log entry to the entrySequence:

https://github.com/xnnyygn/xraft/blob/3fab4b12bb2ff86939011127feeaa86e688de7a9/xraft-core/src/main/java/in/xnnyygn/xraft/core/log/AbstractLog.java#L144-L148

Processing AppendEntriesResultMessage involves reading the entries that have been concurrently changed:

https://github.com/xnnyygn/xraft/blob/c63dd0011497faa576057119b4455fe9e401dcef/xraft-core/src/main/java/in/xnnyygn/xraft/core/rpc/message/AppendEntriesRpc.java#L76-L78

Log (note that the timestamps are fake):

2022-06-04 13:36:54.100 [node] INFO  node.NodeImpl - become leader, term 1
2022-06-04 13:36:54.100 [node] DEBUG node.NodeImpl - node n1, role state changed -> LeaderNodeRole{term=1, logReplicationTask=LogReplicationTask{delay=999}}
2022-06-04 13:36:54.100 [node] DEBUG node.NodeImpl - replicate log
2022-06-04 13:36:54.100 [node] DEBUG node.NodeImpl - receive request vote result and current role is not candidate, ignore
** 2022-06-04 13:36:54.101 [nioEventLoopGroup-5-1] DEBUG server.Service - set x **
2022-06-04 13:36:54.101 [node] DEBUG node.NodeImpl - replicate log
2022-06-04 13:36:54.101 [node] DEBUG node.NodeImpl - node n2 is replicating, skip replication task
2022-06-04 13:36:54.101 [node] DEBUG node.NodeImpl - node n3 is replicating, skip replication task
2022-06-04 13:36:54.101 [monitor] WARN  node.NodeImpl - failure
java.util.ConcurrentModificationException: null
        at java.util.ArrayList$SubList.checkForComodification(ArrayList.java:1415) ~[?:?]
        at java.util.ArrayList$SubList.size(ArrayList.java:1155) ~[?:?]
        at java.util.AbstractCollection.isEmpty(AbstractCollection.java:91) ~[?:?]
        at in.xnnyygn.xraft.core.rpc.message.AppendEntriesRpc.getLastEntryIndex(AppendEntriesRpc.java:77) ~[xraft-core-0.1.0-SNAPSHOT.jar:?]
        at in.xnnyygn.xraft.core.node.NodeImpl.doProcessAppendEntriesResult(NodeImpl.java:694) ~[xraft-core-0.1.0-SNAPSHOT.jar:?]
        at in.xnnyygn.xraft.core.node.NodeImpl.lambda$onReceiveAppendEntriesResult$6(NodeImpl.java:646) ~[xraft-core-0.1.0-SNAPSHOT.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:577) ~[?:?]
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131) ~[guava-32.0.0-jre.jar:?]
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:75) ~[guava-32.0.0-jre.jar:?]
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82) ~[guava-32.0.0-jre.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
        at java.lang.Thread.run(Thread.java:1589) [?:?]
2022-06-04 13:36:54.101 [monitor] WARN  node.NodeImpl - failure
java.util.ConcurrentModificationException: null
        at java.util.ArrayList$SubList.checkForComodification(ArrayList.java:1415) ~[?:?]
        at java.util.ArrayList$SubList.size(ArrayList.java:1155) ~[?:?]
        at java.util.AbstractCollection.isEmpty(AbstractCollection.java:91) ~[?:?]
        at in.xnnyygn.xraft.core.rpc.message.AppendEntriesRpc.getLastEntryIndex(AppendEntriesRpc.java:77) ~[xraft-core-0.1.0-SNAPSHOT.jar:?]
        at in.xnnyygn.xraft.core.node.NodeImpl.doProcessAppendEntriesResult(NodeImpl.java:694) ~[xraft-core-0.1.0-SNAPSHOT.jar:?]
        at in.xnnyygn.xraft.core.node.NodeImpl.lambda$onReceiveAppendEntriesResult$6(NodeImpl.java:646) ~[xraft-core-0.1.0-SNAPSHOT.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:577) ~[?:?]
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131) ~[guava-32.0.0-jre.jar:?]
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:75) ~[guava-32.0.0-jre.jar:?]
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82) ~[guava-32.0.0-jre.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
        at java.lang.Thread.run(Thread.java:1589) [?:?]
2022-06-04 13:36:54.120 [shutdown] INFO  server.Server - stopping server
2022-06-04 13:36:54.120 [shutdown] INFO  server.Server - stopping server
2022-06-04 13:36:54.120 [shutdown] INFO  server.Server - stopping server

A possible fix

Create a copy of the subList into a new List:

diff --git a/xraft-core/src/main/java/in/xnnyygn/xraft/core/log/sequence/MemoryEntrySequence.java b/xraft-core/src/main/java/in/xnnyygn/xraft/core/log/sequence/MemoryEntrySequence.java
index 9d48010..4c2293a 100644
--- a/xraft-core/src/main/java/in/xnnyygn/xraft/core/log/sequence/MemoryEntrySequence.java
+++ b/xraft-core/src/main/java/in/xnnyygn/xraft/core/log/sequence/MemoryEntrySequence.java
@@ -22,7 +22,7 @@ public class MemoryEntrySequence extends AbstractEntrySequence {

     @Override
     protected List<Entry> doSubList(int fromIndex, int toIndex) {
-        return entries.subList(fromIndex - logIndexOffset, toIndex - logIndexOffset);
+        return new ArrayList<>(entries.subList(fromIndex - logIndexOffset, toIndex - logIndexOffset));
     }

     @Override
xnnyygn commented 10 months ago

Thanks for pointing it out. I feel the main reason is we don't know the sent AppendEntriesRpc message, so we have to hold it somewhere when we receive the result from followers. The entries sent should be a readonly view, not the original entries object. Or we need to consider some way to prevent reading the same entries object from 2 threads. I think your proposal should work. Would you mind creating a pull request for it?