redpanda-data / redpanda

Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!
https://redpanda.com

storage: adjacent segment merging for non-compacted topics #6644

Open jcsp opened 2 years ago

jcsp commented 2 years ago

What

Even if a topic does not use compaction, we should still consider merging adjacent segments, including dropping configuration batches where possible.
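A minimal sketch of what a non-compacting merge could look like, assuming a simplified in-memory model: the `Batch` and `Segment` classes and `merge_adjacent` are hypothetical and do not reflect Redpanda's on-disk format. Data batches are kept untouched; for "dropping configuration batches where possible", the sketch assumes that only the last configuration batch in the merged range still matters and drops the earlier ones.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical, simplified model of a log batch and a log segment.
@dataclass
class Batch:
    base_offset: int
    term: int
    is_config: bool = False  # True for raft configuration batches

@dataclass
class Segment:
    batches: List[Batch] = field(default_factory=list)

def merge_adjacent(segments: List[Segment]) -> Segment:
    """Concatenate adjacent segments (oldest first) into one segment.

    No data batches are removed (this is not compaction). Assumption: only
    the last configuration batch in the range is kept; earlier ones are dropped.
    """
    all_batches = [b for seg in segments for b in seg.batches]
    config_idx = [i for i, b in enumerate(all_batches) if b.is_config]
    last_config = config_idx[-1] if config_idx else None
    kept = [
        b for i, b in enumerate(all_batches)
        if not b.is_config or i == last_config
    ]
    return Segment(kept)

# Three tiny segments, each opened with a configuration batch.
segs = [
    Segment([Batch(0, 1, True), Batch(1, 1)]),
    Segment([Batch(2, 1, True), Batch(3, 1)]),
    Segment([Batch(4, 1, True), Batch(5, 1)]),
]
merged = merge_adjacent(segs)
print([b.base_offset for b in merged.batches])  # [1, 3, 4, 5]
```

All batches in the example share one term, so no term information is lost by the merge; cross-term merges need the extra care described under How.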

Why

This is especially relevant in unhealthy clusters, where repeated raft elections can produce large numbers of tiny segments. Each segment costs memory and file handles, and in extreme cases, such as crash loops, the accumulation can lead to outages.

How

We should do adjacent segment compaction when:

The main thing to be careful about is raft terms: when we merge segments, we may discard the knowledge of which term particular batches were in. This is only safe if we are sure all raft peers are up to date and will not need to replay from further back. This is a hand-waving explanation: a more rigorous think-through of the raft consequences will be needed in https://github.com/redpanda-data/redpanda/issues/6432 when implementing cross-term merges &
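As a rough illustration of the condition above (and only as hand-wavy as the paragraph itself), here is a conservative check under assumed inputs: each segment is summarised as a hypothetical `(first_offset, last_offset, term)` tuple, and `peer_match` is a hypothetical map from each raft peer to the highest offset it is known to have replicated. `merge_is_safe` is an illustrative name, not an existing Redpanda function.

```python
from typing import Dict, List, Tuple

# Hypothetical segment summary: (first_offset, last_offset, term).
SegmentInfo = Tuple[int, int, int]

def merge_is_safe(run: List[SegmentInfo], peer_match: Dict[str, int]) -> bool:
    """Conservative check before merging a run of adjacent segments."""
    if not run:
        return False
    terms = {term for _, _, term in run}
    if len(terms) == 1:
        # Single-term merge: no per-batch term information is lost.
        return True
    # Cross-term merge: only allow it if every peer already has everything in
    # the run, so no peer should need to replay from inside it.
    run_end = run[-1][1]
    return all(match >= run_end for match in peer_match.values())

run = [(0, 9, 3), (10, 10, 4), (11, 11, 5)]
print(merge_is_safe(run, {"n1": 11, "n2": 11}))  # True
print(merge_is_safe(run, {"n1": 11, "n2": 10}))  # False: n2 may still need offset 11
```

The rigorous version of this rule is what #6432 is meant to work out; the sketch only encodes "all peers are up to date past the merged range".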

As well as doing adjacent segment merging as a background housekeeping task, we should sometimes do it at startup: if a topic has several single-batch segments at the tip of the log on startup, we should merge them before starting raft, if it is safe to do so, so that a restart loop does not keep creating new files indefinitely.
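A small sketch of the startup decision, assuming the log is summarised as a hypothetical list `batch_counts` (batches per segment, oldest first). The `min_run` threshold standing in for "several" and the `is_safe` callback standing in for the raft safety check are both assumptions for illustration.

```python
from typing import Callable, List

def tail_single_batch_run(batch_counts: List[int]) -> List[int]:
    """Indices of the trailing run of single-batch segments at the log tip."""
    start = len(batch_counts)
    while start > 0 and batch_counts[start - 1] == 1:
        start -= 1
    return list(range(start, len(batch_counts)))

def plan_startup_merge(
    batch_counts: List[int],
    is_safe: Callable[[List[int]], bool],  # hypothetical raft safety check
    min_run: int = 2,                      # hypothetical meaning of "several"
) -> List[int]:
    """Segments to merge before starting raft, or [] to leave the log alone."""
    run = tail_single_batch_run(batch_counts)
    if len(run) >= min_run and is_safe(run):
        return run
    return []

counts = [500, 1200, 1, 1, 1]
print(plan_startup_merge(counts, is_safe=lambda run: True))    # [2, 3, 4]
print(plan_startup_merge(counts, is_safe=lambda run: False))   # []
print(plan_startup_merge([500, 1], is_safe=lambda run: True))  # [] (run too short)
```

Only the tip of the log is considered, since that is where a restart loop piles up new files; older segments are left to the background housekeeping path.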

Related tickets:

JIRA Link: CORE-1039

jcsp commented 2 years ago

@mmaslankaprv makes the excellent point that once we have the ability to put data from different terms in the same segment, we might use that to avoid opening new segments on restart at all, and instead pick up and continue appending to an existing segment.
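A rough sketch of the idea, assuming segments can hold several terms and track where each term starts. The `Segment` model with a `term_starts` list, the `max_segment_bytes` threshold, and `segment_for_restart` are all hypothetical, not Redpanda's actual design.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Segment:
    # Hypothetical model: size on disk plus (first_offset, term) entries
    # marking where each term begins inside this segment.
    size_bytes: int
    term_starts: List[Tuple[int, int]] = field(default_factory=list)

def segment_for_restart(log: List[Segment], next_offset: int, term: int,
                        max_segment_bytes: int) -> Segment:
    """Reuse the tail segment after a restart if it has room; otherwise roll."""
    if log and log[-1].size_bytes < max_segment_bytes:
        tail = log[-1]
    else:
        tail = Segment(size_bytes=0)
        log.append(tail)
    # Record a term boundary only when the term changes, so one segment can
    # hold batches from several terms without losing track of them.
    if not tail.term_starts or tail.term_starts[-1][1] != term:
        tail.term_starts.append((next_offset, term))
    return tail

log = [Segment(size_bytes=4096, term_starts=[(0, 7)])]
seg = segment_for_restart(log, next_offset=12, term=8, max_segment_bytes=1 << 20)
print(len(log), seg.term_starts)  # 1 [(0, 7), (12, 8)]
```

In this model a restart adds one small index entry instead of a new file, which is what would stop a crash loop from accumulating segments.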