When restarting Redpanda with many unflushed partitions (because of running with acks=1 or because no snapshot exists), it looks like the broker becomes leader before it has actually finished STM replay for all partitions.
This is problematic because the replay puts strain on the system while the broker is already serving clients again. This can overload the broker (CPU or disk) and cause instability (leadership ping-pong).
What should have happened instead?
The broker should only become leader once it is in a fully recovered state again, that is, after STM replay has finished.
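The expected behavior can be sketched as a simple gate on leadership eligibility. This is only an illustrative model, not Redpanda's actual code: `PartitionReplica`, `replayed_offset`, `last_log_offset` and `can_campaign` are hypothetical names, and the sketch assumes that replay progress can be compared against the end of the local log.

```python
from dataclasses import dataclass


@dataclass
class PartitionReplica:
    """Hypothetical model of one partition replica on a restarted broker."""

    last_log_offset: int        # highest offset present in the local log
    replayed_offset: int = -1   # highest offset the STM has applied so far

    def replay(self, batch: int) -> None:
        # Apply up to `batch` more offsets, never passing the end of the log.
        self.replayed_offset = min(
            self.replayed_offset + batch, self.last_log_offset
        )

    def is_recovered(self) -> bool:
        # Recovered once the STM has applied everything in the local log.
        return self.replayed_offset >= self.last_log_offset

    def can_campaign(self) -> bool:
        # Proposed gate (assumption): do not stand for election
        # until replay has caught up with the local log.
        return self.is_recovered()


replica = PartitionReplica(last_log_offset=99)
before = replica.can_campaign()   # replay has not started yet
replica.replay(50)
midway = replica.can_campaign()   # replay is still in progress
replica.replay(50)
after = replica.can_campaign()    # replay has reached the end of the log
```

With a gate like this, the replica would stay a follower during the heavy read phase and only compete for leadership afterwards.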
How to reproduce the issue?
Use a weak disk (for example EBS with limited IOPS and throughput).
Create a large unflushed log (for example by producing with acks=1).
Restart Redpanda.
Additional information
Below are some metrics from the scenario occurring. The gap is the node being shut down. After the restart, the node regains leadership very quickly while disk reads are still ongoing for about 20 minutes. During that period the cluster is unstable, with constant leadership changes. Once the reads have finished, the cluster becomes stable again.
Version & Environment
Redpanda version: dev
JIRA Link: CORE-1455