Open jcsp opened 7 months ago
Encountered an s3 recovery related problem in #7927: if we solve this issue by somehow just flushing more often (which is how it behaves when checkpoint_distance is smaller than the initdb size), we will produce two index_part.json updates very close together. This means that s3_recovery will not work, and the test case hangs because it is waiting for the WAL part of initdb to arrive for the root timeline.
This failure mode was obscured by a number of things, but mock_s3 and real_s3 both exhibit this behaviour together with stable sort.
It of course only applies to timelines which have never had a compute started up against them. However, the first uploaded index_part.json version is meaningless and inconsistent: we can never recover to that Lsn using safekeepers, because the pageserver is the only one that had the WAL (and uploaded it as initdb.tar.zst).
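To make the "two updates very close together" problem concrete, here is a minimal Python model of point-in-time version selection. It is only a sketch under stated assumptions, not the actual s3_recovery code: the names ObjectVersion and pick_version are hypothetical, timestamps are assumed to have whole-second granularity, and the version listing is assumed to arrive newest first.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass(frozen=True)
class ObjectVersion:
    # Hypothetical stand-in for one stored version of index_part.json.
    version_id: str
    last_modified: datetime  # assumed to be truncated to whole seconds


def pick_version(
    versions: List[ObjectVersion], recover_to: datetime
) -> Optional[ObjectVersion]:
    """Return the version chosen for recovery to `recover_to`.

    Models "newest version at or before the target time" using a stable sort.
    """
    candidates = [v for v in versions if v.last_modified <= recover_to]
    # sorted() is stable: versions with equal timestamps keep their listing order.
    ordered = sorted(candidates, key=lambda v: v.last_modified)
    return ordered[-1] if ordered else None


t = datetime(2024, 1, 1, tzinfo=timezone.utc)

# Two uploads land within the same second; the listing is newest first (assumption).
same_second = [
    ObjectVersion("v2-with-wal", t),
    ObjectVersion("v1-initdb-only", t),
]
# The tie is broken by listing order, so the older, inconsistent version wins.
print(pick_version(same_second, t).version_id)

# With the uploads a few seconds apart, the intended version is selected.
spaced = [
    ObjectVersion("v2-with-wal", t + timedelta(seconds=5)),
    ObjectVersion("v1-initdb-only", t),
]
print(pick_version(spaced, t + timedelta(seconds=10)).version_id)
```

In this model the tie on last_modified is what lets the first, meaningless index_part.json be picked; whether the real failure follows exactly this path is an assumption.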
For importing really large backups, I don't think we can use the normal flush loop at all; we will need to build the image layers directly somehow. I don't know how to do it in a streaming fashion, because we'd essentially need random access I/O to the whole fullbackup tar to do the repartitioning and splitting into image layers. An okay workaround might be to create arbitrary image layers before the imported lsn so that we can fit the fullbackup and produce "L0 deltas" (which are actually image layers, but this way they'll get to go through the compaction treatment).
Background
See: https://github.com/neondatabase/neon/pull/7182#issuecomment-2012100802
In flush_frozen_layer we do this:
This code path isn't taken for normal timeline creations, because although we call freeze_and_flush right after creation, there is a small WAL ingest between ingesting initdb and freezing the layer.
It's mostly harmless to skip this image layer generation, because an L1 layer full of page values is not any less efficient than an image layer full of values. However, if we implement compression of image layers (#5913) before we attempt compression of image values in delta layers, there's a benefit to writing an image layer for newly created tenants, to reduce the physical size.
Action
We should do one of these two things: