Open jcsp opened 6 months ago
For posterity, Konstantin wrote a concise summary of the edge case that John mentions in the issue description (aka the "staircase pattern"):
Sorry, it is not so easy for me to interpret this picture. But at first glance it seems to be the classical "stairs problem". Just as a reminder, here is what the "stairs problem" means:
- GC is able to remove layer if it is fully covered by image layers.
- Image layer is generated if there are at least 3 (or 6?) delta layers between it and underlying image layer
- Boundaries of L1 layers are completely flexible - they depend only on physical layer size.
So it can happen that the start position of each newly generated L1 layer is shifted slightly compared with the position of the previous L1 layer. This can happen naturally if we just append data to some table, so that the changed pages are at the end of the relation. Such a staircase can have arbitrary height and never be fully covered by image layers. This is what my "gc-feedback" mechanism tried to address. But it was never tested on real projects and has now been removed (because it was not used).
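To make the three rules above concrete, here is a toy Python model of the staircase. It is only a sketch under simplifying assumptions and is not Neon's actual code: keys are small integers, each delta layer has a single LSN, image generation is decided per key rather than per partition, and the PITR horizon is ignored. `Delta`, `image_lsns`, `removable` and `IMAGE_THRESHOLD` are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass

# Assumed value: the comment above is unsure whether the threshold is 3 or 6.
IMAGE_THRESHOLD = 3


@dataclass(frozen=True)
class Delta:
    key_start: int  # inclusive
    key_end: int    # exclusive
    lsn: int        # toy model: one LSN per delta layer


def image_lsns(deltas, threshold=IMAGE_THRESHOLD):
    """Return {key: LSN of the newest image covering that key}.

    Toy rule: a key gets an image once `threshold` deltas have stacked
    up on it since its previous image.
    """
    stacked = {}
    images = {}
    for d in sorted(deltas, key=lambda d: d.lsn):
        for key in range(d.key_start, d.key_end):
            stacked[key] = stacked.get(key, 0) + 1
            if stacked[key] >= threshold:
                images[key] = d.lsn
                stacked[key] = 0
    return images


def removable(deltas, threshold=IMAGE_THRESHOLD):
    """Deltas whose whole key range is covered by images at or above their LSN."""
    images = image_lsns(deltas, threshold)
    return [
        d for d in deltas
        if all(images.get(k, -1) >= d.lsn for k in range(d.key_start, d.key_end))
    ]


# Six deltas stacked on the same key range: images appear and GC can proceed.
aligned = [Delta(0, 20, lsn) for lsn in range(1, 7)]

# Six deltas, each shifted by half a layer width (append-only workload):
# no key ever accumulates three deltas, so no image is ever generated.
staircase = [Delta(10 * i, 10 * i + 20, i + 1) for i in range(6)]

print(len(removable(aligned)))    # 6
print(len(removable(staircase)))  # 0
```

In this model the staircase can grow indefinitely without any of its deltas becoming removable, which is the behaviour described above. The real pageserver decides on image layers per keyspace partition, so the exact conditions differ.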
Once we have image layer compression, we might decide that we want to unconditionally replace deltas with image layers on some time cadence (e.g. PITR interval) in order to benefit from compression. That might simplify this ticket.
I'm not sure if replacing all delta layers is the best idea as we want to preserve the CoW property of branching, but of course we can't hold onto delta layers unconditionally outside of the PITR interval.
Background
The gc_feedback mechanism removed in https://github.com/neondatabase/neon/pull/6863 was meant to protect against edge cases where repeated keyspace repartitioning can result in stacks of deltas that are never fully covered by image layers, and therefore never get GC'd.
The history as I understand it is:
gc_feedback
tenant config to turn feedback off, and it has been off by default since then.
Purpose
This ticket tracks creating an improved mechanism to ensure that:
The previous gc_feedback mechanism was not widely used because it satisfied 1 & 2 but not 3 & 4.
A replacement mechanism might not need to involve the GC code -- we can directly query the layer map during compaction and: