nats-io / nats-server

High-Performance server for NATS.io, the cloud and edge native messaging system.
https://nats.io
Apache License 2.0

Fix desync after errCatchupAbortedNoLeader #5986

Closed MauriceVanVeen closed 1 month ago

MauriceVanVeen commented 1 month ago

A related case, where RAFT state was deleted after running into errCatchupTooManyRetries, was fixed previously: https://github.com/nats-io/nats-server/pull/5939

After hitting this we shut down and retry. But if no leader has been elected yet, we'd hit "catchup for stream '%s > %s' aborted, no leader", which would again throw away RAFT state. This PR proposes a fix for that case.

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>

MauriceVanVeen commented 1 month ago

> Is this a fair summary of the change?
>
> When aborting catchup due to leader not present, do not wipe replica state

Yes :slightly_smiling_face:

derekcollison commented 1 month ago

LMK when this is good to go for review.

MauriceVanVeen commented 1 month ago

> LMK when this is good to go for review.

Waiting for CI to go green, but otherwise ready for review.