lerna-stack / akka-entity-replication

Akka extension for fast recovery from failure with replicating stateful entity on multiple nodes in Cluster
Apache License 2.0

Leader continued replying with ReplicationFailed #170

Closed (xirc closed 2 years ago)

xirc commented 2 years ago

Situation

The following log message kept being emitted (up to 1700) in some fault injection tests:

[Leader] failed to replicate the event (type=[lerna.akka.entityreplication.raft.model.NoOp$]) since the entity (entityId=[0000059981], instanceId=[34156], lastAppliedIndex=[814]) must apply [1] entries to itself. The leader will replicate a new event after the entity applies these [1] non-applied entries to itself.

Diagnosing the logs showed that the following sequence had occurred:

  1. RaftActor (replica-group-1) was the leader.
  2. Entity (id=0000059981, replica-group-1) successfully replicated NoOp.
    • The entity's lastAppliedLogEntryIndex was 814.
    • The NoOp replication succeeded at index 821.
  3. RaftActor (replica-group-2, Follower) updated indices to 821 (commitIndex=821, lastApplied=821).
    • RaftActor (replica-group-2, Follower) didn't send Replica for index 821 to the entity because the associated event was NoOp.
    • Entity (id=0000059981, replica-group-2) didn't update its lastAppliedLogEntryIndex to 821.
  4. RaftActor (replica-group-2) became the leader for some reason.
  5. Entity (id=0000059981, replica-group-2) received ProcessCommand and then attempted to replicate an event:
    • Entity (id=0000059981, replica-group-2) sent Replicate(entityLastAppliedIndex=814, ...)
  6. RaftActor (replica-group-2, Leader) replied with ReplicationFailed.

The code that skips sending Replica for EntityEvent(Some(entityId), NoOp): https://github.com/lerna-stack/akka-entity-replication/blob/0bd7465baeec365ae5f1e294d37b9665f6f7a901/src/main/scala/lerna/akka/entityreplication/raft/RaftActor.scala#L477-L486
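
The sequence above can be illustrated with a small model. This is a simplified sketch, not the library's actual code: `LogEntry`, `Domain`, `StuckEntityModel`, and its methods are hypothetical names, and the leader-side check is an assumption inferred from the log message.

```scala
// Hypothetical, simplified model of the scenario; not the library's implementation.
sealed trait Event
case object NoOp extends Event
final case class Domain(payload: String) extends Event

final case class LogEntry(index: Int, entityId: Option[String], event: Event)

object StuckEntityModel {
  // Entries that belong to the entity and lie beyond its last applied index.
  def nonApplied(log: Seq[LogEntry], entityId: String, entityLastAppliedIndex: Int): Seq[LogEntry] =
    log.filter(e => e.entityId.contains(entityId) && e.index > entityLastAppliedIndex)

  // Follower side: Replica is delivered only for non-NoOp events,
  // so a NoOp entry never advances the entity's last applied index.
  def applyCommitted(log: Seq[LogEntry], entityId: String, entityLastAppliedIndex: Int): Int =
    nonApplied(log, entityId, entityLastAppliedIndex)
      .filter(_.event != NoOp)
      .map(_.index)
      .foldLeft(entityLastAppliedIndex)((a, b) => math.max(a, b))

  // Leader side (assumed): replication is rejected while any non-applied entry exists.
  def leaderAccepts(log: Seq[LogEntry], entityId: String, entityLastAppliedIndex: Int): Boolean =
    nonApplied(log, entityId, entityLastAppliedIndex).isEmpty
}
```

With a log that holds a NoOp for the entity at index 821, `applyCommitted(log, "0000059981", 814)` leaves the index at 814, and `leaderAccepts` keeps returning `false` for every retry, which matches the repeated ReplicationFailed replies.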

Possible solutions

  1. RaftActor sends Replica to an entity even if the EntityEvent contains NoOp.
  2. The leader starts replication if the non-applied entries contain only NoOp.
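
Both options can be sketched against the same kind of simplified model. Everything here (`Entry`, `EntityChange`, `PossibleSolutions`, and its methods) is hypothetical and only illustrates the intent of each option, not how the library would implement it.

```scala
// Hypothetical sketch of the two options; not the library's implementation.
sealed trait RaftEvent
case object NoOpEvent extends RaftEvent
final case class EntityChange(payload: String) extends RaftEvent

// A committed log entry that is associated with a single entity.
final case class Entry(index: Int, event: RaftEvent)

object PossibleSolutions {
  // Option 1: Replica is sent for every committed entry, NoOp included,
  // so the entity's last applied index advances past NoOp entries.
  def lastAppliedAfterReplica(entriesForEntity: Seq[Entry], lastApplied: Int): Int =
    entriesForEntity.map(_.index).foldLeft(lastApplied)((a, b) => math.max(a, b))

  // Option 2: the leader tolerates non-applied entries when all of them are NoOp.
  def leaderMayReplicate(entriesForEntity: Seq[Entry], entityLastAppliedIndex: Int): Boolean =
    entriesForEntity
      .filter(_.index > entityLastAppliedIndex)
      .forall(_.event == NoOpEvent)
}
```

Option 1 changes the follower's apply path so the entity catches up on its own; option 2 leaves the entity's index behind and relaxes the leader's check instead.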