Epic: Pageserver Timeline Archival

jcsp commented 4 months ago

Purpose

Enable users to create branches fearlessly, without worrying about hitting branch count limits & without having to worry about cleaning up old branches unless they want to.

Background

Currently, all timelines have significant physical overhead on the pageserver, even if they haven't been used for days/weeks/months:

- scanning each timeline's remote storage path on tenant startup and loading its index
- pinning some of the timeline's layers into local storage for logical size calculations
- running a WAL receiver for the timeline

Changes

This section isn't an authoritative design, but calls out functional areas that will need work.

- We'll need a manifest in remote storage that the tenant can read on startup to learn which timelines should be loaded in an active state and which are hibernated. Keeping this manifest up to date with timeline create/delete operations will be a key correctness point.
- Persist enough information about hibernated timelines that we can know their logical size (and any other key stats) without having to load them fully. It probably makes sense to inline this into the per-tenant object that lists the timelines.
- Our runtime state in `Tenant` will need to store only active timelines in `Tenant::timelines`, and keep hibernated timelines in a separate map.
- APIs that list timelines will need either to change their semantics to report only active timelines, to avoid unreasonably large responses when users have many thousands of branches, or to become paginated/queryable. Bu
- An external API that lets the control plane tell us when a timeline should be hibernated or woken. We could also auto-hibernate after some period of inactivity, but that might duplicate the externally driven mechanism.
- A cache-warming routine that loads enough layers to serve reads at the tip of the branch, so that when we activate a timeline, the user doesn't hit a long slow period while data is promoted to local storage.
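A minimal Rust sketch of the manifest and runtime-state points above, assuming a single manifest per tenant that inlines each hibernated timeline's logical size. All names here (`TenantManifest`, `OffloadedTimelineEntry`, `ActiveTimeline`, `offload`, `unoffload`) are hypothetical and not the pageserver's real API; serialization and the upload to remote storage are omitted. The parent/child checks are an assumption modeled on the "archived timeline must have all of its children archived" invariant from the task list.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for the real timeline ID type.
type TimelineId = u64;

/// Hypothetical manifest entry for a hibernated timeline: enough stats to
/// report its size without loading its index or any layers.
#[derive(Clone, Debug, PartialEq)]
struct OffloadedTimelineEntry {
    timeline_id: TimelineId,
    ancestor: Option<TimelineId>,
    logical_size: u64,
}

/// Hypothetical per-tenant manifest, read on tenant startup.
#[derive(Clone, Debug, Default, PartialEq)]
struct TenantManifest {
    offloaded: Vec<OffloadedTimelineEntry>,
}

/// Stand-in for a fully loaded, active timeline.
#[derive(Debug)]
struct ActiveTimeline {
    ancestor: Option<TimelineId>,
    logical_size: u64,
}

#[derive(Default)]
struct Tenant {
    // Only active timelines live here, mirroring `Tenant::timelines`.
    timelines: BTreeMap<TimelineId, ActiveTimeline>,
    // Separate map for hibernated timelines.
    offloaded: BTreeMap<TimelineId, OffloadedTimelineEntry>,
}

impl Tenant {
    /// Build the manifest from runtime state. In a real system it would be
    /// uploaded after every change to `offloaded`, before acking the caller.
    fn manifest(&self) -> TenantManifest {
        TenantManifest { offloaded: self.offloaded.values().cloned().collect() }
    }

    /// Move an active timeline into the hibernated map, keeping its stats.
    fn offload(&mut self, id: TimelineId) -> Result<TenantManifest, String> {
        if !self.timelines.contains_key(&id) {
            return Err(format!("timeline {id} is not active"));
        }
        // Refuse while any active child still depends on this timeline.
        if self.timelines.values().any(|t| t.ancestor == Some(id)) {
            return Err(format!("timeline {id} has active children"));
        }
        let tl = self.timelines.remove(&id).expect("checked above");
        self.offloaded.insert(
            id,
            OffloadedTimelineEntry { timeline_id: id, ancestor: tl.ancestor, logical_size: tl.logical_size },
        );
        Ok(self.manifest())
    }

    /// Wake a hibernated timeline back into the active map.
    fn unoffload(&mut self, id: TimelineId) -> Result<TenantManifest, String> {
        let entry = self.offloaded.get(&id).ok_or_else(|| format!("timeline {id} is not offloaded"))?;
        // A child cannot be woken while its ancestor stays hibernated.
        if let Some(parent) = entry.ancestor {
            if self.offloaded.contains_key(&parent) {
                return Err(format!("ancestor {parent} of timeline {id} is offloaded"));
            }
        }
        let e = self.offloaded.remove(&id).expect("checked above");
        self.timelines.insert(id, ActiveTimeline { ancestor: e.ancestor, logical_size: e.logical_size });
        Ok(self.manifest())
    }

    /// Total logical size, computed without loading any hibernated timeline.
    fn total_logical_size(&self) -> u64 {
        self.timelines.values().map(|t| t.logical_size).sum::<u64>()
            + self.offloaded.values().map(|e| e.logical_size).sum::<u64>()
    }
}
```

Keeping the two maps disjoint means listing active timelines never touches hibernated ones, while size reporting can still sum both from the inlined stats.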

### Tasks
- [ ] https://github.com/neondatabase/neon/issues/8218
- [ ] https://github.com/neondatabase/neon/pull/8131
- [ ] https://github.com/neondatabase/neon/pull/8414
- [ ] https://github.com/neondatabase/neon/pull/8458
- [x] https://github.com/neondatabase/neon/issues/8459 / https://github.com/neondatabase/neon/pull/8479
- [x] pageserver: implement visible layer housekeeping, for use in warm-ups
- [ ] https://github.com/neondatabase/neon/pull/8824
- [x] controller: add pass-through for `archival_config` API: #8680
- [ ] https://github.com/neondatabase/neon/pull/9122
- [ ] https://github.com/neondatabase/neon/pull/8907
- [ ] #9289
- [ ] https://github.com/neondatabase/neon/pull/9308
- [ ] https://github.com/neondatabase/neon/pull/9399
- [ ] https://github.com/neondatabase/neon/issues/9384
- [ ] #9421
- [ ] https://github.com/neondatabase/neon/issues/9386
- [ ] offloaded timeline query API
- [ ] audit code to ensure that gc_info children being removed on offload is fine
- [ ] test retain_lsn functionality for offloaded branches
- [ ] test for deletion of offloaded timeline
- [ ] test for many timelines depending on each other
- [ ] test that offloaded timelines are excluded from heatmaps and never downloaded to secondaries
- [ ] pytest for archival/unarchival together with storage controller and old generations
- [ ] controller: ensure that timeline passthrough operations (incl. archival) land on shards with the latest generation (check generation is still current after they ack)
- [ ] --- Milestone: archived branches are cheap locally -- (no index load on startup, no layers on disk, no Timeline at runtime)
- [ ] pageserver: implement warm-up API
- [ ] tests: after warming up, a read workload should not result in any on-demand downloads
- [ ] pageserver: expose billing metrics for active size vs. archived size
- [ ] add timeline flattening (including some way to block offload for it)
- [ ] --- Milestone: archived branches are cheap in remote storage -- eventually written as compressed image layers at a single LSN
- [ ] make scrubber check S3 invariants: a) timeline that is offloaded must be archived, b) timeline that is archived must have all of its children archived as well
- [ ] unified lock for offloaded/timelines/loading timelines: eliminates some race conditions and inconsistent states
- [ ] test: offload but pageserver crashes somewhere in delete_local_timeline_directory: can the pageserver deal with remnants after a restart?
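A minimal sketch of the two scrubber invariants from the task list, assuming the scrubber has already assembled a per-timeline summary (ancestor, archived flag, offloaded flag) from remote storage. `TimelineSummary` and `check_archival_invariants` are hypothetical names, and how the summary is read from S3 is left out.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the real timeline ID type.
type TimelineId = u64;

/// Hypothetical per-timeline summary a scrubber could build from remote storage.
#[derive(Clone, Debug)]
struct TimelineSummary {
    id: TimelineId,
    ancestor: Option<TimelineId>,
    archived: bool,
    offloaded: bool,
}

/// Returns one message per violated invariant, in input order.
fn check_archival_invariants(timelines: &[TimelineSummary]) -> Vec<String> {
    let by_id: HashMap<TimelineId, &TimelineSummary> =
        timelines.iter().map(|t| (t.id, t)).collect();
    let mut errors = Vec::new();
    for t in timelines {
        // (a) a timeline that is offloaded must be archived.
        if t.offloaded && !t.archived {
            errors.push(format!("timeline {} is offloaded but not archived", t.id));
        }
        // (b) an archived timeline must have all of its children archived,
        // checked from the child's side on each parent/child edge.
        if let Some(parent_id) = t.ancestor {
            if let Some(parent) = by_id.get(&parent_id) {
                if parent.archived && !t.archived {
                    errors.push(format!(
                        "timeline {} is archived but its child {} is not",
                        parent_id, t.id
                    ));
                }
            }
        }
    }
    errors
}
```

Checking every parent/child edge once covers all children without walking subtrees; a parent missing from the input is skipped rather than reported.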

arpad-m commented 2 months ago

This week:

get storage controller PR merged (tests missing): https://github.com/neondatabase/neon/pull/8680
make offload MVP PR (ideally also reviewed and merged)

arpad-m commented 1 week ago

This week:

get #8907 merged (almost there)
get #9289 merged.
make PR for "still regard ancestor_lsn of non-flattened offloaded timelines as retain_lsn"
make PR for "Synthetic size should exclude archived timelines"
maybe also PR for "persistence for offloaded state"

arpad-m commented 4 days ago

This week:

get #9308 merged (almost there)
get #9289 merged (got a review last week, no time yet to make it merge-worthy)
see if more changes to the synthetic size calculation are needed beyond #9308.
make PR for persistence for offloaded state
