redpanda-data / redpanda

Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!
https://redpanda.com

Broker becomes leader after restart before STM replay has finished #13614

Open StephanDollberg opened 10 months ago

StephanDollberg commented 10 months ago

Version & Environment

Redpanda version: dev

What went wrong?

When restarting Redpanda with lots of unflushed partitions (because of running with acks=1 or no existing snapshot), it looks like the broker becomes leader before it has actually finished STM replay for all partitions.

This is problematic because the replay can put strain on the system while the broker is already serving clients again. This might overload the broker (CPU or disk) and then cause instability (leadership ping-pong).

What should have happened instead?

The broker should only become leader once it is in a fully recovered state again.
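
To make the expected behavior concrete, here is a minimal sketch of a leadership gate. It assumes replay progress can be expressed as "last applied offset vs. end of local log". All names (`replica_state`, `may_become_leader`, `replicas_still_replaying`) are hypothetical and are not taken from the Redpanda codebase.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of one partition replica after a restart.
// These are illustration-only types, not Redpanda's real ones.
struct replica_state {
    std::int64_t stm_applied_offset; // last offset the STM has replayed
    std::int64_t log_end_offset;     // last offset present in the local log

    // Replay is done once the STM has caught up with the local log.
    bool replay_finished() const {
        return stm_applied_offset >= log_end_offset;
    }
};

// Gate: a replica should only be eligible for leadership after replay.
inline bool may_become_leader(const replica_state& r) {
    return r.replay_finished();
}

// Number of replicas on this broker that are still replaying.
inline std::size_t
replicas_still_replaying(const std::vector<replica_state>& replicas) {
    std::size_t pending = 0;
    for (const auto& r : replicas) {
        if (!r.replay_finished()) {
            ++pending;
        }
    }
    return pending;
}
```

Whether the gate should apply per partition (as in `may_become_leader`) or to the whole broker (only accept leadership once `replicas_still_replaying` returns 0) is an open design choice. The per-partition variant restores availability sooner, while the broker-wide variant avoids serving clients while replay is still loading CPU and disk.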

How to reproduce the issue?

Additional information

Below are some metrics from the scenario occurring. The gap is the node being shut down. After restart we see the node very quickly gaining leadership again while there are still ongoing disk reads for about 20 minutes. During that period the cluster shows instability and constant leadership changes. Once the reads have finished, the cluster becomes stable again.

[Three metrics screenshots; images not preserved]

JIRA Link: CORE-1455

StephanDollberg commented 9 months ago

This is caused by the following bit of code: