neondatabase / neon

Neon: Serverless Postgres. We separated storage and compute to offer autoscaling, code-like database branching, and scale to zero.
https://neon.tech
Apache License 2.0

Memory leak in sql_exporter in compute node? #7966

Open Bodobolero opened 3 months ago

Bodobolero commented 3 months ago

Steps to reproduce

I ran multiple tests (pgvector indexing, building an index from a subselect, etc.); see https://neondb.slack.com/archives/C0732L0A4AH/p1717575836689089?thread_ts=1717499447.241659&cid=C0732L0A4AH

Eventually the compute node ran out of memory (OOM), and the error log showed:

2024-06-05 11:36:07.198 [12358.306313] Out of memory: Killed process 168 (sql_exporter) total-vm:1268868kB, anon-rss:12712kB, file-rss:8672kB, shmem-rss:0kB, UID:65534 pgtables:200kB oom_score_adj:0

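About 1.2 GB of virtual memory (total-vm) for a component that only runs some SQL queries to collect metrics seems expensive, even though its resident set at the time of the kill was only about 21 MB (anon-rss plus file-rss). Possibly sql_exporter has a memory leak, or it cannot release collected metrics quickly enough.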
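The OOM-killer line reports virtual size (total-vm) separately from the resident pages (anon-rss, file-rss, shmem-rss), and those two numbers differ by roughly a factor of 60 here. A minimal Python sketch that extracts the kB fields from the log line above and converts them to MiB; the helper names `parse_oom_fields` and `summarize` are hypothetical and not part of neon or sql_exporter. It assumes resident size is the sum of the three `*-rss` fields, as the Linux kernel counts it.

```python
import re

# Kernel OOM-killer line copied verbatim from this issue.
OOM_LINE = (
    "2024-06-05 11:36:07.198 [12358.306313] Out of memory: Killed process 168 "
    "(sql_exporter) total-vm:1268868kB, anon-rss:12712kB, file-rss:8672kB, "
    "shmem-rss:0kB, UID:65534 pgtables:200kB oom_score_adj:0"
)

# Matches "name:<digits>kB" pairs such as "total-vm:1268868kB".
_KB_FIELD = re.compile(r"([a-z-]+):(\d+)kB")


def parse_oom_fields(line: str) -> dict[str, int]:
    """Return every 'name:NkB' field in an OOM-killer line, in kB."""
    return {name: int(value) for name, value in _KB_FIELD.findall(line)}


def summarize(line: str) -> dict[str, float]:
    """Compare virtual size with resident size, both in MiB."""
    fields = parse_oom_fields(line)
    # Resident set = anonymous + file-backed + shared-memory pages.
    rss_kb = fields["anon-rss"] + fields["file-rss"] + fields["shmem-rss"]
    return {
        "virtual_mib": fields["total-vm"] / 1024,
        "resident_mib": rss_kb / 1024,
    }


if __name__ == "__main__":
    s = summarize(OOM_LINE)
    print(f"virtual:  {s['virtual_mib']:.1f} MiB")
    print(f"resident: {s['resident_mib']:.1f} MiB")
```

Run against the line from this issue, it prints `virtual:  1239.1 MiB` and `resident: 20.9 MiB`, so the open question is whether the large total-vm reflects memory the process was actually holding or mostly reserved address space.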

Logs, links

https://neonprod.grafana.net/goto/gT_JXPyIR?orgId=1

bayandin commented 3 months ago

Currently, we use sql-exporter 0.13. https://github.com/neondatabase/neon/blob/83ab14e27119ffdfef6ce0f5cd883b847de8c24a/vm-image-spec.yaml#L420

We could try updating it to 0.14.3 to see whether that helps.