timescale / promscale

[DEPRECATED] Promscale is a unified metric and trace observability backend for Prometheus, Jaeger and OpenTelemetry built on PostgreSQL and TimescaleDB.
https://www.timescale.com/promscale
Apache License 2.0

Alert when PostgreSQL shared_buffers is smaller than open chunks #1792

Closed niksajakovljevic closed 1 year ago

niksajakovljevic commented 1 year ago

If the currently open chunks can't fit into shared buffers, I/O pressure and latency increase significantly, which degrades overall performance. So when the sum of relation and index sizes becomes greater than PostgreSQL shared_buffers, we trigger a warning alert. This should help users understand whether they need to increase the amount of memory allocated to shared buffers or the total memory allocated to the database.
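The condition described above can be sketched roughly as follows. This is only an illustration of the comparison, not the code from this PR: `ChunkSize` and `open_chunks_exceed_shared_buffers` are hypothetical names, and the sizes are assumed to have already been collected in bytes (for example from PostgreSQL size functions and the `shared_buffers` setting).

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ChunkSize:
    """Sizes in bytes for one open chunk (hypothetical structure)."""
    table_bytes: int
    index_bytes: int


def open_chunks_exceed_shared_buffers(
    chunks: Iterable[ChunkSize], shared_buffers_bytes: int
) -> bool:
    """Return True when the warning condition from the description holds.

    The condition is: sum of relation sizes plus index sizes of the open
    chunks is strictly greater than shared_buffers.
    """
    total_bytes = sum(c.table_bytes + c.index_bytes for c in chunks)
    return total_bytes > shared_buffers_bytes


if __name__ == "__main__":
    # Made-up sizes purely for illustration.
    open_chunks = [ChunkSize(600, 200), ChunkSize(300, 100)]
    print(open_chunks_exceed_shared_buffers(open_chunks, 1000))
```

The comparison is strict, so a working set that exactly fills shared buffers does not raise the warning; whether the real alert uses a strict comparison or a safety margin is an assumption here.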

niksajakovljevic commented 1 year ago

@cevian I rebased on slow query support. Please have a look.

paulfantom commented 1 year ago

Is it possible that, due to a small number of open chunks, we will also have low database connection latency which would trigger the PromscaleStorageHighLatency alert?

paulfantom commented 1 year ago

I am slowly starting to think that maybe we should create a higher-level alert like PromscaleIngestSlow and group things like PromscaleStorageHighLatency or the alert from this PR under it. This would give us a simpler, more symptomatic alert for which we can just improve the Diagnosis and Mitigation sections of a runbook instead of chasing more and more cause-based alerts.