Tracing for getpage@lsn when it takes a long time

neondatabase / neon

If a getpage@lsn request takes more than X seconds, we should log a warning that describes why it took so long (where the time was spent). The breakdown could be high level, giving the total time spent on:

- locking the layer map
  - locking the in-memory layer being written to
- layer map search
- layer download + access
- number of loops required
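
The breakdown above could be carried through a request as a small timing struct. The following is only a sketch under stated assumptions: `GetPageTimings`, its field names, and `slow_request_warning` are hypothetical and are not taken from the pageserver code, and the "total" here is simply the sum of the measured phases rather than an end-to-end measurement.

```rust
use std::time::{Duration, Instant};

/// Hypothetical per-request timing breakdown. Field names mirror the list
/// in this issue and are illustrative, not the pageserver's actual types.
#[derive(Debug, Default, Clone)]
pub struct GetPageTimings {
    pub layer_map_lock: Duration,
    pub inmemory_layer_lock: Duration,
    pub layer_map_search: Duration,
    pub layer_download_and_access: Duration,
    /// Number of loop iterations the request needed.
    pub loops: u32,
}

impl GetPageTimings {
    /// Runs `f` and adds its wall-clock duration to `slot`.
    pub fn timed<T>(slot: &mut Duration, f: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let out = f();
        *slot += start.elapsed();
        out
    }

    /// Sum of all measured phases (not an end-to-end measurement).
    pub fn total(&self) -> Duration {
        self.layer_map_lock
            + self.inmemory_layer_lock
            + self.layer_map_search
            + self.layer_download_and_access
    }

    /// Returns a warning message only if the summed time exceeds `threshold`.
    pub fn slow_request_warning(&self, threshold: Duration) -> Option<String> {
        let total = self.total();
        if total <= threshold {
            return None;
        }
        Some(format!(
            "slow getpage@lsn: total={:?} threshold={:?} layer_map_lock={:?} \
             inmemory_layer_lock={:?} layer_map_search={:?} \
             layer_download_and_access={:?} loops={}",
            total,
            threshold,
            self.layer_map_lock,
            self.inmemory_layer_lock,
            self.layer_map_search,
            self.layer_download_and_access,
            self.loops,
        ))
    }
}
```

A caller would wrap each phase in `GetPageTimings::timed(&mut timings.layer_map_search, || ...)` and, at the end of the request, pass the result of `slow_request_warning(threshold)` to whatever warning-level logging facility is in use.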

The individual durations should also be exposed via global histograms of getpage execution, if such histograms don't already exist.
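
To illustrate what a per-phase histogram records, here is a minimal fixed-bucket sketch using only the standard library. `DurationHistogram` and its methods are hypothetical names; a real implementation would more likely use an existing metrics library, and a truly global instance would need atomics or a lock, which this single-threaded sketch omits.

```rust
use std::time::Duration;

/// Hypothetical fixed-bucket histogram for one getpage phase.
pub struct DurationHistogram {
    /// Inclusive upper bounds of the buckets, in ascending order.
    bounds: Vec<Duration>,
    /// One counter per bound, plus a final overflow bucket.
    counts: Vec<u64>,
    /// Sum of all observed durations.
    sum: Duration,
}

impl DurationHistogram {
    pub fn new(mut bounds: Vec<Duration>) -> Self {
        bounds.sort();
        let counts = vec![0; bounds.len() + 1];
        Self { bounds, counts, sum: Duration::ZERO }
    }

    /// Records `d` in the first bucket whose upper bound is >= `d`,
    /// or in the overflow bucket if `d` exceeds every bound.
    pub fn observe(&mut self, d: Duration) {
        let idx = self.bounds.partition_point(|b| *b < d);
        self.counts[idx] += 1;
        self.sum += d;
    }

    pub fn counts(&self) -> &[u64] {
        &self.counts
    }

    pub fn total_count(&self) -> u64 {
        self.counts.iter().sum()
    }

    pub fn sum(&self) -> Duration {
        self.sum
    }
}
```

One such histogram per phase (layer map lock, layer map search, layer download + access, and so on) would let the slow-request warning be compared against the normal distribution of each phase.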

The purpose of this logging is to let us understand right away why a request was slow.

Original slack thread: https://neondb.slack.com/archives/C05NXJFNRPA/p1696261625702299?thread_ts=1696250393.840899&cid=C05NXJFNRPA

Tracing for getpage@lsn when it takes a long time #5448