Open Bodobolero opened 1 day ago
Previous thread re this problem https://neondb.slack.com/archives/C04DGM6SMTM/p1731526874214679
Ultimately, on each scrape sql_exporter runs every SQL query specified in the metrics config. So if compute is loaded, the queries become slower and we see these gaps.
So what are the options we have?
Moved to backlog because we don't have any good ideas for how to fix it, other than exploring another tool such as Telegraf
@tristan957 suggests that we can bump the sql_exporter version
Thread about timeout issues: it looks like we currently scrape every 10s, so we cannot bump the timeout significantly
Another piece of info from Tristan: sql_exporter seems to have its own metrics:
Only metrics defined by collectors are exported on the /metrics endpoint. SQL Exporter process metrics are exported at /sql_exporter_metrics.
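A rough sketch of how we could look at those process metrics during a run, assuming the endpoint serves the usual Prometheus text format without trailing timestamps. The base URL is a placeholder and the helper names are hypothetical. Which metric names sql_exporter actually exposes there would need to be checked against a real response.

```python
import urllib.request
from typing import Dict


def parse_exposition(text: str) -> Dict[str, float]:
    """Parse Prometheus text format into {"name{labels}": value}.

    Assumption: each sample line is "<name>[{labels}] <value>" with no
    trailing timestamp. Comment lines (# HELP / # TYPE) are skipped.
    """
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the last space so label values containing spaces survive.
        name, _, value = line.rpartition(" ")
        try:
            samples[name] = float(value)
        except ValueError:
            continue  # not a sample line we understand
    return samples


def fetch_metrics(
    base_url: str,                         # placeholder, e.g. the exporter address on the compute
    path: str = "/sql_exporter_metrics",   # path quoted in this thread
    timeout_s: float = 5.0,
) -> Dict[str, float]:
    """Fetch and parse one of the exporter's endpoints."""
    with urllib.request.urlopen(base_url + path, timeout=timeout_s) as resp:
        return parse_exposition(resp.read().decode("utf-8"))
```

Polling this next to /metrics during the ingest benchmark might show whether the exporter process itself is still responsive when the collector metrics go missing.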
Steps to reproduce
run ingest benchmark doc
Expected result
We see metrics collected by sql_exporter for the complete run
Actual result
We are losing metrics, most likely because sql_exporter is exceeding its scrape_timeout.
We observe this especially when there is a large amount of backpressure from PS to compute.
Environment
staging
Logs, links
https://neonprod.grafana.net/d/de3mupf4g68e8e/perf-test3a-ingest-benchmark?orgId=1&from[…]ge_tenant_endpoint_id=ep-misty-river-w2vdg495&viewPanel=19
first reported here
another observation of this - probably related
https://neondb.slack.com/archives/C04DGM6SMTM/p1731526874214679