lerna-stack / akka-entity-replication

Akka extension for fast recovery from failure with replicating stateful entity on multiple nodes in Cluster
Apache License 2.0

`RaftMemberData.resolveNewLogEntries` got requirement failed: A new entry conflicted with a committed entry #165

Closed (xirc closed 2 years ago)

xirc commented 2 years ago

The following error occurred in some fault injection tests:

04:48:20.417 xxx        ERROR   akka.actor.OneForOneStrategy    xxx akka://xxx/system/sharding/raft-shard-xxx-replica-group-3/30/30       -       requirement failed: The entry with index [583] should not conflict with the committed entry (commitIndex [584])
java.lang.IllegalArgumentException: requirement failed: The entry with index [583] should not conflict with the committed entry (commitIndex [584])
    at scala.Predef$.require(Predef.scala:337)
    at lerna.akka.entityreplication.raft.FollowerData.resolveNewLogEntries(RaftMemberData.scala:147)
    at lerna.akka.entityreplication.raft.FollowerData.resolveNewLogEntries$(RaftMemberData.scala:119)
    at lerna.akka.entityreplication.raft.RaftMemberDataImpl.resolveNewLogEntries(RaftMemberData.scala:616)
    at lerna.akka.entityreplication.raft.Follower.lerna$akka$entityreplication$raft$Follower$$receiveAppendEntries(Follower.scala:111)
    at lerna.akka.entityreplication.raft.Follower$$anonfun$followerBehavior$1.applyOrElse(Follower.scala:25)
    at akka.actor.Actor.aroundReceive(Actor.scala:537)
    at akka.actor.Actor.aroundReceive$(Actor.scala:535)
    at lerna.akka.entityreplication.raft.RaftActor.akka$persistence$Eventsourced$$super$aroundReceive(RaftActor.scala:141)
    at akka.persistence.Eventsourced$$anon$4.stateReceive(Eventsourced.scala:923)
    at akka.persistence.Eventsourced.aroundReceive$$original(Eventsourced.scala:251)
    at akka.persistence.Eventsourced.aroundReceive(Eventsourced.scala:148)
    at akka.persistence.Eventsourced.aroundReceive$(Eventsourced.scala:250)
    at lerna.akka.entityreplication.raft.RaftActor.aroundReceive(RaftActor.scala:141)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580)
    at akka.actor.ActorCell.invoke$$original(ActorCell.scala:548)
    at akka.actor.ActorCell.invoke(ActorCell.scala:61)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
    at akka.dispatch.Mailbox.run$$original(Mailbox.scala:231)
    at akka.dispatch.Mailbox.run(Mailbox.scala:32)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
    at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
    at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
    at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
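
For readers unfamiliar with the check that fails here, the following is a minimal sketch of the kind of invariant the message describes: a follower may overwrite a conflicting entry only if that entry lies above its commit index. This is not the actual code of `RaftMemberData.resolveNewLogEntries`; `Entry`, `ConflictCheckSketch`, and `resolveNewEntries` are hypothetical names, and the log is modeled as a plain `Vector` for illustration only.

```scala
// Hypothetical, simplified model. Not the library's actual types or logic.
final case class Entry(index: Int, term: Int)

object ConflictCheckSketch {

  /** Returns the suffix of `incoming` that the follower should write to its log.
    * Entries already present with the same term are skipped. A conflict
    * (same index, different term) is allowed only above `commitIndex`.
    */
  def resolveNewEntries(log: Vector[Entry], commitIndex: Int, incoming: Seq[Entry]): Seq[Entry] = {
    val termByIndex = log.map(e => e.index -> e.term).toMap
    // Position of the first incoming entry that is missing locally or has a different term.
    val firstNewOrConflicting =
      incoming.indexWhere(e => !termByIndex.get(e.index).contains(e.term))
    if (firstNewOrConflicting < 0) {
      Seq.empty // everything is already in the local log
    } else {
      val entry     = incoming(firstNewOrConflicting)
      val conflicts = termByIndex.contains(entry.index)
      // A committed entry must never be replaced; this mirrors the failed requirement.
      require(
        !conflicts || entry.index > commitIndex,
        s"The entry with index [${entry.index}] should not conflict with the committed entry (commitIndex [$commitIndex])"
      )
      incoming.drop(firstNewOrConflicting)
    }
  }
}
```

In this sketch, a follower whose commit index is 584 and that receives an entry with index 583 and a term different from its local one would throw the same kind of `IllegalArgumentException` as in the trace above.
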
xirc commented 2 years ago

Logs from another test that hit the same `requirement failed` error suggest that the following situation could cause it:

RaftActor of replica-group-1

RaftActor of replica-group-2

RaftActor of replica-group-3

xirc commented 2 years ago

There are at least two possible solutions:

  1. Improve the mechanism for decrementing the next index
  2. Improve the mechanism for advancing the commit index
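
As background for the two mechanisms named above, here is a rough sketch of how the Raft algorithm itself describes them on the leader side: the next index is stepped back after a rejected AppendEntries, and the commit index advances only to an entry of the current term that is replicated on a majority. This is the textbook rule, not this library's implementation and not the proposed fix; `LogEntry`, `LeaderSketch`, and both method names are hypothetical, and `matchIndexes` is assumed to include the leader's own last log index.

```scala
// Hypothetical, simplified leader-side bookkeeping. Names are illustrative only.
final case class LogEntry(index: Int, term: Int)

object LeaderSketch {

  /** After a follower rejects AppendEntries, step its next index back by one (never below 1). */
  def decrementNextIndex(nextIndex: Int): Int = math.max(1, nextIndex - 1)

  /** Returns the largest index N > commitIndex such that a majority of members
    * have matchIndex >= N and the entry at N belongs to `currentTerm`.
    * Returns `commitIndex` unchanged if no such N exists.
    */
  def advanceCommitIndex(
      log: Vector[LogEntry],
      currentTerm: Int,
      commitIndex: Int,
      matchIndexes: Seq[Int], // assumed to include the leader itself
      clusterSize: Int
  ): Int = {
    val majority = clusterSize / 2 + 1
    log
      .filter(e => e.index > commitIndex && e.term == currentTerm)
      .filter(e => matchIndexes.count(_ >= e.index) >= majority)
      .map(_.index)
      .foldLeft(commitIndex)((a, b) => math.max(a, b))
  }
}
```

For example, with three members, `matchIndexes = Seq(3, 3, 1)`, and an entry of the current term at index 3, the sketch advances the commit index to 3. With `Seq(3, 2, 1)` it leaves the commit index unchanged, because index 3 is not yet on a majority.
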

There might be a reason behind the next index being lower than expected: