Closed Bodobolero closed 5 months ago
We need to investigate the effect on customers. For now, it is a P1.
Anna, as discussed, please find out if this is still a problem.
Please re-open should this still be an issue for pgbench. Closing tentatively.
Observation:
Important: we also see 276 cases of
could not receive data from client: Connection reset by peer
in a few hours in production.
Staging/dev: about 5 % of tenants running pgbench fail with "fatal: Run was aborted; the above results are incomplete."
Analysis of the proxy and compute logs suggests that the root cause lies in the proxy<->compute connection, because the logs show:
could not receive data from client: Connection reset by peer
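To put numbers on the two symptoms above, a small helper along the following lines could be used. This is only a sketch: the function name `summarize` and the input shapes (one captured output string per pgbench run, one string per log line) are assumptions for illustration and are not part of cloudbench. Only the two message substrings are taken from this report.

```python
# Message substrings copied from this report; everything else is hypothetical.
PGBENCH_ABORT = "Run was aborted; the above results are incomplete"
CONN_RESET = "Connection reset by peer"


def summarize(pgbench_outputs, log_lines):
    """Return (failed_runs, failure_rate, reset_count).

    pgbench_outputs: one captured output string per pgbench run (assumed shape).
    log_lines: individual log lines from proxy or compute (assumed shape).
    """
    # A run counts as failed if pgbench printed its abort message.
    failed = sum(1 for out in pgbench_outputs if PGBENCH_ABORT in out)
    rate = failed / len(pgbench_outputs) if pgbench_outputs else 0.0
    # Count log lines that mention a connection reset.
    resets = sum(1 for line in log_lines if CONN_RESET in line)
    return failed, rate, resets


if __name__ == "__main__":
    # Made-up sample data, not real measurements.
    outputs = ["tps = 100.0"] * 19 + [
        "fatal: Run was aborted; the above results are incomplete."
    ]
    logs = [
        "could not receive data from client: Connection reset by peer",
        "LOG: checkpoint starting",
    ]
    print(summarize(outputs, logs))
```

Comparing the failure rate against the reset count per time window would help check whether every aborted run lines up with a reset in the logs.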
Steps to reproduce
With 500 Postgres compute instances active associated with a single page server (default suspension timeout):
Note that this happens in staging during test of
./cloudbench productionlike_bench init --config productionlike_warmup.yaml --apikey <secret>
, see https://github.com/neondatabase/cloud/blob/1da98ec0e262fb49c7b85127b1e447c45bd64499/bench/internal/controllers/productionlikecontroller/bench/productionlikebenchinit.go
Expected result
Each pgbench run completes, as the other 95 % do.
Actual result
Approximately 5 % of pgbench runs fail with connection reset by peer.
Environment
Staging and Prod
Logs, links
client side (cloud bench):
https://neonprod.grafana.net/explore?schemaVersion=1&panes=%7B%22dab%22:%7B%22datasourc[…]968000000%22,%22to%22:%221706140799000%22%7D%7D%7D&orgId=1
https://neonprod.grafana.net/explore?schemaVersion=1&panes=%7B%22pnf%22:%7B%22datasourc[…]22from%22:%22now-6h%22,%22to%22:%22now%22%7D%7D%7D&orgId=1
2024-01-24T14:04:11.213550Z ERROR per-client task finished with an error: Connection reset by peer (os error 104) session_id=a52efa22-75a6-48d0-bf29-cf6baee67a83
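To correlate such errors with the other side of the connection, the `session_id` can be pulled out of each line. The sketch below assumes that all relevant lines follow the layout of the single sample above (timestamp, level, message, trailing `session_id=<uuid>`). That layout is inferred from this one line only, and the names `parse_line` and `reset_session_ids` are hypothetical.

```python
import re

# Layout inferred from the one sample line in this report (an assumption):
# <timestamp> <LEVEL> <message> session_id=<uuid>
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+(?P<msg>.*?)\s+"
    r"session_id=(?P<session_id>[0-9a-f-]{36})\s*$"
)


def parse_line(line):
    """Return a dict with ts, level, msg, session_id, or None if no match."""
    m = LINE_RE.match(line)
    return m.groupdict() if m else None


def reset_session_ids(lines):
    """Collect session ids of lines reporting 'Connection reset by peer'."""
    ids = []
    for line in lines:
        parsed = parse_line(line)
        if parsed and "Connection reset by peer" in parsed["msg"]:
            ids.append(parsed["session_id"])
    return ids


if __name__ == "__main__":
    sample = (
        "2024-01-24T14:04:11.213550Z ERROR per-client task finished with an "
        "error: Connection reset by peer (os error 104) "
        "session_id=a52efa22-75a6-48d0-bf29-cf6baee67a83"
    )
    print(reset_session_ids([sample]))
```

The resulting ids can then be used as search terms in the log explorer to find the matching entries for the same session.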
Internal discussion
https://neondb.slack.com/archives/C039YKBRZB4/p1706104412540999?thread_ts=1704534532.501369&cid=C039YKBRZB4
https://neondb.slack.com/archives/C060N3SEF9D/p1706107025142509