Closed problame closed 6 months ago
The new compaction code in https://github.com/neondatabase/neon/pull/6830/ no longer calls count_deltas. (It needs testing to see if it introduces other problems, of course.)
Yeah, aware, @arpad-m is going to work on compaction, but, it'll be many more weeks until it lands, I think.
@VladLazar just in case you didn't see it, my PR to avoid count_deltas() is here: https://github.com/neondatabase/neon/pull/6868
Feel free to take it over
Update:
Looks like https://github.com/neondatabase/neon/pull/7230 helped here. Generated another flamegraph this morning and it's not exhibiting the original issue:
(ask me if you want the svg - can't add it here for some reason)
Problem
original thread: https://neondb.slack.com/archives/C033RQ5SPDH/p1708513450565049
Now that the flamegraphs are fixed, I took one on ps-2 ap-southeast-1 to investigate the elevated CPU usage after enabling tokio-epoll-uring there. That investigation isn't the subject of this thread, though; the subject is the general finding of where that PS is spending its time.
LayerMap::count_deltas inside time_for_new_image_layer completely dominates the CPU usage there. AFAICT that is called for every tenant, even if the layer map hasn't changed. This is wasteful.
Solution
If the layer map and partitioning are the same as in an earlier call, early-exit in time_for_new_image_layer to avoid the call to count_deltas().
Tasks
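To make the early exit proposed under Solution concrete, here is a minimal sketch, not the actual pageserver code. Everything in it is a hypothetical stand-in: the simplified `LayerMap` with a `generation` counter bumped on every change, the `ImageLayerCheck` helper, the `fingerprint` function, and the use of plain `u64` key ranges and a fixed delta threshold. The real `time_for_new_image_layer` takes other inputs (for example an LSN) that would also have to be part of the cache key.

```rust
use std::cell::Cell;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::ops::Range;

/// Hypothetical stand-in for the real layer map. It tracks delta key ranges
/// and a generation counter that is bumped on every modification.
struct LayerMap {
    generation: u64,
    deltas: Vec<Range<u64>>,
    /// Counts how often the expensive path ran (for illustration only).
    count_deltas_calls: Cell<usize>,
}

impl LayerMap {
    fn new() -> Self {
        LayerMap { generation: 0, deltas: Vec::new(), count_deltas_calls: Cell::new(0) }
    }

    fn insert_delta(&mut self, key_range: Range<u64>) {
        self.deltas.push(key_range);
        self.generation += 1; // any change invalidates cached decisions
    }

    /// Stand-in for the expensive call: counts deltas overlapping `key_range`.
    fn count_deltas(&self, key_range: &Range<u64>) -> usize {
        self.count_deltas_calls.set(self.count_deltas_calls.get() + 1);
        self.deltas
            .iter()
            .filter(|d| d.start < key_range.end && key_range.start < d.end)
            .count()
    }
}

/// Hashes the partitioning so it can be compared cheaply between calls.
fn fingerprint(partitioning: &[Range<u64>]) -> u64 {
    let mut hasher = DefaultHasher::new();
    partitioning.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical helper that remembers the inputs of the last negative answer.
struct ImageLayerCheck {
    threshold: usize,
    last_negative: Option<(u64, u64)>, // (layer map generation, partitioning fingerprint)
}

impl ImageLayerCheck {
    fn new(threshold: usize) -> Self {
        ImageLayerCheck { threshold, last_negative: None }
    }

    fn time_for_new_image_layer(&mut self, map: &LayerMap, partitioning: &[Range<u64>]) -> bool {
        let key = (map.generation, fingerprint(partitioning));
        // Early exit: same layer map and same partitioning as the last call
        // that answered "no", so count_deltas cannot give a different result.
        if self.last_negative == Some(key) {
            return false;
        }
        let needed = partitioning
            .iter()
            .any(|part| map.count_deltas(part) >= self.threshold);
        // Only negative answers are cached; a positive answer is expected to
        // lead to new image layers, which changes the layer map anyway.
        self.last_negative = if needed { None } else { Some(key) };
        needed
    }
}
```

With this shape, repeated checks for an idle tenant cost one hash of the partitioning and one comparison instead of a `count_deltas` call per partition. A collision in the 64-bit fingerprint could cause a wrong early exit, so a real implementation might compare the partitioning directly or give it its own version counter.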