neondatabase / neon

Neon: Serverless Postgres. We separated storage and compute to offer autoscaling, code-like database branching, and scale to zero.
https://neon.tech
Apache License 2.0
OOM in production with high tenant count #3620

Closed koivunej closed 10 months ago

koivunej commented 1 year ago

OOMs have been observed in production. Related Slack threads:

My understanding of the root cause is that, while the pageserver itself uses a fairly conservative amount of RAM, every tenant that has been active gets its own postgres --wal-redo process, and each of those takes about 22MB RSS when idle.
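As a rough back-of-the-envelope illustration of that root cause, the idle walredo footprint grows linearly with the number of tenants that have been active. The sketch below uses the 22MB idle RSS figure from the comment above; the function name and the tenant counts are made-up examples, not production numbers.

```python
# Rough estimate only: assumes every been-active tenant holds exactly one
# idle walredo process at ~22 MB RSS (the figure quoted in the comment above).
IDLE_WALREDO_RSS_MB = 22


def idle_walredo_rss_mb(active_tenants: int) -> int:
    """Return the total idle RSS, in MB, across all walredo processes."""
    if active_tenants < 0:
        raise ValueError("active_tenants must be non-negative")
    return active_tenants * IDLE_WALREDO_RSS_MB


if __name__ == "__main__":
    # Hypothetical tenant counts, not taken from production.
    for tenants in (100, 1000, 5000):
        print(f"{tenants} tenants -> ~{idle_walredo_rss_mb(tenants)} MB idle RSS")
```

This ignores the pageserver's own memory and any spikes while a walredo process is busy, so it is a lower bound at best.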

Possible solutions iterated in threads:

koivunej commented 1 year ago

Increasing oom_score_adj for each postgres --wal-redo process would probably have a negative effect: if the memory usage of one of them spiked, it would be a very likely candidate to be killed, and we would just retry.
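For reference, on Linux the adjustment discussed here is a per-process value in the range -1000 to 1000, exposed as /proc/<pid>/oom_score_adj. A minimal sketch of setting it for a child process follows; the helper name and the proc_root parameter are assumptions added for illustration and testing, not anything from the pageserver code.

```python
from pathlib import Path

# Valid range for oom_score_adj on Linux.
OOM_SCORE_ADJ_MIN = -1000
OOM_SCORE_ADJ_MAX = 1000


def set_oom_score_adj(pid: int, value: int, proc_root: str = "/proc") -> None:
    """Write `value` to <proc_root>/<pid>/oom_score_adj.

    A higher value makes the process a more likely OOM-killer victim.
    `proc_root` is a hypothetical parameter so the helper can be pointed
    at a temporary directory instead of the real /proc.
    """
    if not OOM_SCORE_ADJ_MIN <= value <= OOM_SCORE_ADJ_MAX:
        raise ValueError(f"oom_score_adj out of range: {value}")
    Path(proc_root, str(pid), "oom_score_adj").write_text(f"{value}\n")
```

Raising the value for a process you own normally needs no extra privileges, while lowering it generally does.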

shanyp commented 1 year ago

Adding relevant discussion from Slack: https://neondb.slack.com/archives/C03H1K0PGKH/p1678107912052379

koivunej commented 1 year ago

Because #3739 was merged, all that remains is dying on a read timeout, if and only if we have completely handled sending the page. In other words, the process must not die between start...walrecords before getpage.
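One way to read that remaining rule is as a small state check: a read timeout may end the process only when no request is in flight. The sketch below is a hypothetical model, not the walredo implementation; the class, its methods, and the message names are assumptions loosely based on the start, walrecords, and getpage steps mentioned above.

```python
class RedoSession:
    """Hypothetical tracker for whether a redo request is partially handled."""

    def __init__(self) -> None:
        # True from the first message of a request until its page is sent.
        self.in_request = False

    def on_message(self, kind: str) -> None:
        """Record an incoming message; names here are illustrative only."""
        if kind == "start":
            self.in_request = True
        elif kind in ("walrecord", "getpage"):
            if not self.in_request:
                raise RuntimeError(f"{kind} received outside a request")
        else:
            raise ValueError(f"unknown message kind: {kind}")

    def on_page_sent(self) -> None:
        """The reply page has been fully written back to the requester."""
        self.in_request = False

    def may_exit_on_read_timeout(self) -> bool:
        """Exit on timeout only when idle, never in the middle of a request."""
        return not self.in_request
```

With this model, a timeout that fires after start or after a walrecord is ignored, and the process becomes eligible to exit again only once on_page_sent has run.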

shanyp commented 1 year ago

@koivunej any objections to closing this one? Or do we have follow-ups? (I think this is a dup of #3687)

koivunej commented 11 months ago

this is dup of #3687

Well, it cannot duplicate a later issue now can it :)

This became relevant for the choom parts. Looking around, I still see a Chromium bug open about this, so I am unsure whether it is doable: https://bugs.chromium.org/p/chromium/issues/detail?id=333617. That is especially so given the RSS differences between the pageserver and the walredo processes, though there might be spikes we do not know about.

koivunej commented 11 months ago

Discussed in the 2023-11-06 meeting; this is not going to be worked on in the near future.

koivunej commented 10 months ago

Noted #5877 in the issue description. I don't think we need the choom route, at least currently, and it is the only part left unimplemented.