Open koivunej opened 8 months ago
pageserver_timeline_ephemeral_bytes shows this rise in frozen layers.
Better logs link.
Timeline::layers is held for read by compaction while it does on-demand downloads.
Latest metrics indicate eviction layer collection is ~100ms consistently.
Goal for the week:
Didn't get to this last week, so this week it's the same goal.
Impact
Ingestion can be delayed because it cannot acquire the layers lock until compaction releases it. Compaction itself will be slow whenever it has to do on-demand layer downloads.
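To illustrate the contention described above, here is a minimal sketch, not the pageserver's actual code. It assumes a plain `std::sync::RwLock` in place of whatever lock type guards `Timeline::layers`, and the names `LayerMap`, `download`, `compact_holding_lock` and `compact_releasing_lock` are all hypothetical. Each function probes with `try_write` after every simulated download to check whether a writer (standing in for ingestion) could have taken the lock at that point.

```rust
use std::sync::RwLock;

/// Hypothetical stand-in for the layer map guarded by `Timeline::layers`.
struct LayerMap {
    layers: Vec<String>,
}

/// Hypothetical stand-in for an on-demand download.
/// The real operation would do slow network I/O.
fn download(name: &str) -> usize {
    name.len()
}

/// Pattern from the issue: the read guard stays alive across every download.
/// Returns true only if a writer could have acquired the lock after each download.
fn compact_holding_lock(map: &RwLock<LayerMap>) -> bool {
    let guard = map.read().unwrap();
    let mut writer_could_proceed = true;
    for name in &guard.layers {
        download(name);
        // The read guard is still held here, so a write attempt cannot succeed.
        writer_could_proceed &= map.try_write().is_ok();
    }
    writer_could_proceed
}

/// Alternative: copy what is needed under the lock, drop the guard,
/// and only then do the slow work.
fn compact_releasing_lock(map: &RwLock<LayerMap>) -> bool {
    let names: Vec<String> = {
        let guard = map.read().unwrap();
        guard.layers.clone()
    }; // The read guard is dropped at the end of this block.
    let mut writer_could_proceed = true;
    for name in &names {
        download(name);
        // No guard is held here, so a writer can take the lock.
        writer_could_proceed &= map.try_write().is_ok();
    }
    writer_could_proceed
}
```

With a non-empty layer list, the first function reports that a writer is shut out for the whole loop, while the second leaves the lock free during downloads. The trade-off in the second variant is that the copied list can go stale while the downloads run, so real code would have to re-validate it afterwards.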
Original issue text
I found a surprisingly long disk usage-based eviction collection time from the logs on a single-tenant pageserver (ps-3.eu-central-1). Here we see the long collection (logs):
Normally the time to collect is around 35ms on this system.
No direct clues around it, only two flushes but no upload schedulings:
Something was stuck as there are also 15 layer rollings before the first one is scheduled to be uploaded for timeline 07..: ~RemoteTimelineClient was probably very busy at uploading 7k new image layers.~ No, it was no longer busy at 2024-04-02T07:44:00Z as the uploads had been completed.