neondatabase / neon

pageserver: compaction holds layers read lock too long #7298

Open koivunej opened 8 months ago

koivunej commented 8 months ago

Impact

Ingestion can be delayed because it cannot acquire the layers lock while compaction holds it. Compaction itself can be slow when it has to perform on-demand layer downloads, so the lock may stay held for a long time.
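For illustration only (this is not Neon's code and the names are made up): a minimal tokio sketch of that contention shape, where a task holding the layers read lock across a slow download blocks a writer that wants to insert a new layer.

```rust
// Requires tokio with the "full" feature. Everything here is illustrative.
use std::sync::Arc;
use std::time::{Duration, Instant};
use tokio::sync::RwLock;

#[tokio::main]
async fn main() {
    let layers = Arc::new(RwLock::new(vec!["layer-1".to_string()]));

    // "Compaction": takes the read lock, then holds it across a slow
    // operation (standing in for an on-demand layer download).
    let compaction = {
        let layers = Arc::clone(&layers);
        tokio::spawn(async move {
            let guard = layers.read().await;
            tokio::time::sleep(Duration::from_secs(2)).await; // guard still held
            println!("compaction saw {} layers", guard.len());
        })
    };

    // "Ingestion": needs the write lock to insert a new layer, but has to
    // wait until the reader above drops its guard.
    tokio::time::sleep(Duration::from_millis(100)).await;
    let started = Instant::now();
    layers.write().await.push("layer-2".to_string());
    println!("write lock acquired after {:?}", started.elapsed());

    compaction.await.unwrap();
}
```

If the lock is fair, as tokio::sync::RwLock is, later readers (for example the eviction collection pass) also queue behind the blocked writer, which would explain the long collection times reported below.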

Original issue text

I found a surprisingly long disk-usage-based eviction collection time in the logs of a single-tenant pageserver (ps-3.eu-central-1). Here we see the long collection (logs):

2024-04-02T07:45:12.406776Z  INFO disk_usage_eviction_task:iteration{iteration_no=50566}: collection took longer than threshold tenant_id=df254570a4f603805528b46b0d45a76c shard_id=0000 elapsed_ms=48890

Normally the time to collect is around 35ms on this system.
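As an aside, a "took longer than threshold" warning like the one above is typically produced by timing the collection phase against a configured bound; a minimal, hypothetical sketch (the threshold value and function names are assumptions, not the pageserver's):

```rust
use std::time::{Duration, Instant};

// Stand-in for walking tenants/timelines and listing their layers, i.e. the
// phase that took ~49s in the log above instead of the usual ~35ms.
fn collect_eviction_candidates() {
    std::thread::sleep(Duration::from_millis(50));
}

fn main() {
    let threshold = Duration::from_millis(100); // illustrative value
    let started = Instant::now();
    collect_eviction_candidates();
    let elapsed = started.elapsed();
    if elapsed > threshold {
        eprintln!(
            "collection took longer than threshold elapsed_ms={}",
            elapsed.as_millis()
        );
    }
}
```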

No direct clues around it; only two flushes, but no upload schedulings:

2024-04-02T07:44:23.516218Z  WARN disk_usage_eviction_task:iteration{iteration_no=50566}: running disk usage based eviction due to pressure ...
2024-04-02T07:44:37.646456Z  INFO wal_connection_manager{tenant_id=X shard_id=0000 timeline_id=Y}:connection{node_id=7}: Will roll layer at 452/40CC4C98 with layer size 268428025 due to layer size (268436275)
2024-04-02T07:44:38.039676Z  INFO request{method=GET path=/metrics request_id=e744b27e-d8d3-4e92-8290-8c9e074b3cd7}:blocking: responded /metrics bytes=314212 total_ms=31 spawning_ms=0 collection_ms=29 encoding_ms=1
2024-04-02T07:45:05.561749Z  INFO wal_connection_manager{tenant_id=X shard_id=0000 timeline_id=Y}:connection{node_id=7}: Will roll layer at 452/50B593F8 with layer size 268434820 due to layer size (268443070)
2024-04-02T07:45:12.406776Z  INFO disk_usage_eviction_task:iteration{iteration_no=50566}: collection took longer than threshold tenant_id=X shard_id=0000 elapsed_ms=48890
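The "Will roll layer ... due to layer size" lines are size-triggered rolls of the open in-memory layer. A hedged sketch of that decision, assuming a ~256 MiB threshold that matches the sizes in the log (the constant and names are illustrative, not the pageserver's actual configuration):

```rust
// Illustrative only: roll (freeze) the open in-memory layer once the next
// write would push it past a size threshold.
const TARGET_LAYER_SIZE: u64 = 256 * 1024 * 1024; // assumed threshold

struct OpenLayer {
    bytes_written: u64,
}

impl OpenLayer {
    fn should_roll(&self, incoming_record_len: u64) -> bool {
        self.bytes_written + incoming_record_len > TARGET_LAYER_SIZE
    }
}

fn main() {
    // Numbers taken from the first "Will roll layer" line above.
    let open = OpenLayer { bytes_written: 268_428_025 };
    assert!(open.should_roll(268_436_275 - 268_428_025));
    println!("roll due to layer size");
}
```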

Something was stuck, as there are also 15 layer rolls before the first one is scheduled to be uploaded for timeline 07..:

~~RemoteTimelineClient was probably very busy uploading 7k new image layers.~~ No, it was no longer busy at 2024-04-02T07:44:00Z, as the uploads had already completed.

koivunej commented 8 months ago

The pageserver_timeline_ephemeral_bytes metric shows the rise in frozen layers.

Better logs link.

jcsp commented 7 months ago

Timeline::layers is held for read by compaction while it does on-demand downloads.
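Assuming the usual remedy applies here (a sketch, not the actual fix): snapshot which layers need work while holding the read lock, drop the guard, and only then perform the on-demand downloads, so ingestion's write lock is not blocked for the duration of the downloads. Names and types are illustrative.

```rust
use std::sync::Arc;
use std::time::Duration;
use tokio::sync::RwLock;

// Stand-in for an on-demand download from remote storage.
async fn download_layer(name: &str) {
    tokio::time::sleep(Duration::from_millis(200)).await;
    println!("downloaded {name}");
}

async fn compact(layers: Arc<RwLock<Vec<String>>>) {
    // Hold the lock only long enough to decide which layers to work on...
    let to_download: Vec<String> = {
        let guard = layers.read().await;
        guard.iter().cloned().collect()
    }; // ...guard dropped here, so writers (ingestion) are no longer blocked.

    for name in &to_download {
        download_layer(name).await;
    }
}

#[tokio::main]
async fn main() {
    let layers = Arc::new(RwLock::new(vec!["a".to_string(), "b".to_string()]));
    compact(layers).await;
}
```

In a real system the snapshotted layer handles would have to stay valid (for example via reference counting) if another task can delete layers concurrently while the lock is not held.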

Latest metrics indicate eviction layer collection is ~100ms consistently.

koivunej commented 6 months ago

Goal for the week:

koivunej commented 6 months ago

Didn't get to this last week, so this week it's the same goal.

koivunej commented 6 months ago

Didn't get to this last week, so this week it's the same goal.