Exchanges deletion cronjob exhausts DB connections

Describe the bug

On Monday, June 10th, all calls to the Kingdom’s DB halted with the error:

DEADLINE_EXCEEDED ClientCall was cancelled at or after deadline. [closed=[CANCELLED], committed=[remote_addr=/10.X.X.X:8443]]

This impacted all calls to the Kingdom, including the reporting server, panel exchange, herald, and direct requisitions.

The deployment didn’t show any memory or CPU constraints.

However, analysis of the Kingdom DB (Spanner) shows that something occurred around 7:40 AM UK time: processing stopped and latency went up.

At that time (7:40 AM), the exchanges-deletion-cronjob triggered on its usual schedule. All other cron jobs failed because they could not connect to the DB.

The issue persisted until the Data Server Deployment Pods were deleted and recreated; afterwards, all calls worked fine.

Note: the Kingdom was only receiving PX traffic at this time, and this has only happened once, so the issue seems to be sporadic.
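
To make the suspected failure mode concrete, here is a toy model of connection exhaustion. It is only a sketch and not the Kingdom's actual code: the pool size, the deadline, and every name in it (`ConnectionPool`, `call_kingdom`, `run_stuck_cleanup`) are hypothetical, and this report does not confirm that the cron job held connections in this way. It shows how one job that takes every connection in a fixed-size pool and never returns them would make every other caller wait until its deadline expires, and why starting over with a fresh pool (analogous to recreating the pods) would clear the symptom.

```python
import threading

POOL_SIZE = 4           # hypothetical; the real pool size is not stated in this report
DEADLINE_SECONDS = 0.2  # hypothetical client deadline


class DeadlineExceeded(Exception):
    """Stand-in for the DEADLINE_EXCEEDED status seen by callers."""


class ConnectionPool:
    """Toy fixed-size pool: a semaphore with one permit per connection."""

    def __init__(self, size):
        self._permits = threading.BoundedSemaphore(size)

    def acquire(self, timeout):
        # Block until a connection is free or the caller's deadline passes.
        if not self._permits.acquire(timeout=timeout):
            raise DeadlineExceeded("no free connection before deadline")

    def release(self):
        self._permits.release()


def call_kingdom(pool):
    """Simulates one ordinary request: borrow a connection, then return it."""
    pool.acquire(timeout=DEADLINE_SECONDS)
    try:
        return "OK"
    finally:
        pool.release()


def run_stuck_cleanup(pool):
    """Simulates a cleanup job that takes every connection and never returns them."""
    for _ in range(POOL_SIZE):
        pool.acquire(timeout=DEADLINE_SECONDS)


def outcome(pool):
    """Runs one request and reports its status as a string."""
    try:
        return call_kingdom(pool)
    except DeadlineExceeded:
        return "DEADLINE_EXCEEDED"


pool = ConnectionPool(POOL_SIZE)
before = outcome(pool)            # pool is idle, so the call succeeds
run_stuck_cleanup(pool)           # all connections are now held
during = outcome(pool)            # the call waits, then hits its deadline
pool = ConnectionPool(POOL_SIZE)  # fresh pool, analogous to recreating the pods
after = outcome(pool)
print(before, during, after)
```

In this model, CPU and memory stay idle while callers time out, which would be consistent with the deployment showing no resource constraints.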

Steps to reproduce

Trigger the PX clean-up cronjob.

Component(s) affected: Kingdom

Version: 0.4.4

Environment: Origin PRD

Additional context

Spanner config is set to 500 PUs (processing units).

Various PX-related queries can be seen scanning 11K rows and returning 0 rows.

world-federation-of-advertisers / cross-media-measurement

Exchanges deletion cronjob exhausts DB connections #1659