Open koivunej opened 11 months ago
If one does not aim for a generic solution (think of some Java hierarchical stopwatch), I think this could be implemented rather efficiently and still be mergeable, for example for all of the get page requests a basebackup needs to do while for example outliers.
For example, there is most likely a reasonable average number of layers accessed, so those can live on the "stack" (as part of the future, which is itself on the heap). If we get to really detrimental cases (>10s), we don't need to mind spilling to a new heap allocation. Regardless, it is important that we catch enough information about such cases to fix them.
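A minimal sketch of the inline-then-spill idea, assuming a per-request recorder of layer access durations. The names `LayerTimings` and `INLINE_CAP` and the capacity of 8 are hypothetical and not taken from the pageserver code; the real typical layer count would have to come from measurements.

```rust
use std::time::Duration;

/// Hypothetical inline capacity; this issue does not state the real
/// average number of layers accessed per request.
pub const INLINE_CAP: usize = 8;

/// Records per-layer durations inline (inside the owning future) and
/// spills to a separate heap allocation only for outlier requests.
pub struct LayerTimings {
    inline: [Duration; INLINE_CAP],
    len: usize,
    /// `Vec::new()` does not allocate until the first push.
    spill: Vec<Duration>,
}

impl LayerTimings {
    pub fn new() -> Self {
        Self {
            inline: [Duration::ZERO; INLINE_CAP],
            len: 0,
            spill: Vec::new(),
        }
    }

    /// Stores one layer access duration, using the heap only when the
    /// inline slots are exhausted.
    pub fn record(&mut self, d: Duration) {
        if self.len < INLINE_CAP {
            self.inline[self.len] = d;
        } else {
            self.spill.push(d);
        }
        self.len += 1;
    }

    /// True if this request touched more layers than fit inline.
    pub fn spilled(&self) -> bool {
        !self.spill.is_empty()
    }

    /// Number of recorded layer accesses.
    pub fn len(&self) -> usize {
        self.len
    }

    /// Sum of all recorded durations, inline and spilled.
    pub fn total(&self) -> Duration {
        let inline_n = self.len.min(INLINE_CAP);
        self.inline[..inline_n].iter().chain(self.spill.iter()).sum()
    }
}
```

The common case pays only for a fixed-size array write, while outliers still keep every sample, which is what matters for diagnosing them.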
If a getpage@lsn request runs long, that is, takes more than X seconds, we should log a warning describing why it took so long (where the time was spent). The breakdown could be high level, as total time spent for:
Individual durations should also be exposed via global histograms of getpage execution, if they don't exist already.
The purpose of this logging is to let us understand right away why something was slow.
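The slow-request warning described above could be sketched roughly as follows. This is an illustration under assumptions, not the pageserver's implementation: `GetPageBreakdown`, `add`, and `slow_warning` are hypothetical names, and the phase labels are left to the caller because the issue's own breakdown list is not reproduced here. The sketch only builds the message; passing it to the codebase's logging facility is left out to keep the example self-contained.

```rust
use std::time::Duration;

/// Accumulates time spent per phase of one getpage@lsn request.
/// Phase names are caller-supplied and purely illustrative.
#[derive(Default)]
pub struct GetPageBreakdown {
    phases: Vec<(&'static str, Duration)>,
}

impl GetPageBreakdown {
    /// Adds `d` to the running total for `phase`, creating the entry
    /// the first time the phase is seen.
    pub fn add(&mut self, phase: &'static str, d: Duration) {
        if let Some(entry) = self.phases.iter_mut().find(|e| e.0 == phase) {
            entry.1 += d;
        } else {
            self.phases.push((phase, d));
        }
    }

    /// Total time across all recorded phases.
    pub fn total(&self) -> Duration {
        self.phases.iter().map(|(_, d)| *d).sum()
    }

    /// Returns a warning message with the per-phase breakdown if the
    /// request exceeded `threshold` (the "X seconds" of this issue),
    /// and `None` otherwise.
    pub fn slow_warning(&self, threshold: Duration) -> Option<String> {
        let total = self.total();
        if total <= threshold {
            return None;
        }
        let parts: Vec<String> = self
            .phases
            .iter()
            .map(|(name, d)| format!("{name}={d:?}"))
            .collect();
        Some(format!(
            "slow getpage@lsn request: total={total:?} ({})",
            parts.join(", ")
        ))
    }
}
```

Formatting happens only after the threshold check, so requests that finish in time pay for the accumulation but not for building the message. The same per-phase durations could also be fed into the global histograms mentioned above.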
Original slack thread: https://neondb.slack.com/archives/C05NXJFNRPA/p1696261625702299?thread_ts=1696250393.840899&cid=C05NXJFNRPA