Closed Mark-Simulacrum closed 2 years ago
@bors r+
:pushpin: Commit 229283e5fb0fcd99cc98a2a59347d403df9db138 has been approved by Mark-Simulacrum
:hourglass: Testing commit 229283e5fb0fcd99cc98a2a59347d403df9db138 with merge f9baf09a37760a572bafa63fb762181c608f14e7...
:sunny: Test successful - checks-actions
Approved by: Mark-Simulacrum
Pushing f9baf09a37760a572bafa63fb762181c608f14e7 to master...
This moves record-progress processing onto a separate thread so that we can prioritize work on other requests over work on record-progress, thereby avoiding some of the timeouts and "database is locked" errors we would otherwise see when the record-progress requests happen to take priority.
This separate thread is designed to run only when the server has no requests in flight (other than a short, bounded queue of record-progress requests). If that queue fills up, we tell workers to slow down, which causes them to retry their requests. Retries currently happen at fixed intervals and per worker thread, but a future commit might clean that up a little to use a more intentional delay.
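As a rough illustration of the queueing behaviour described above, here is a minimal sketch using only the Rust standard library. It is not the code in this PR: `ProgressUpdate`, `Response`, `ProgressQueue`, `in_flight_counter`, and `drain_when_idle` are hypothetical names chosen for the example, and the in-flight counter is assumed to be incremented and decremented by the other request handlers.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical payload; a real record-progress request carries more data.
#[derive(Debug, PartialEq)]
pub struct ProgressUpdate(pub u32);

/// What the server answers to a worker (illustrative names).
#[derive(Debug, PartialEq)]
pub enum Response {
    Accepted,
    /// The queue is full: the worker should back off and retry later.
    SlowDown,
}

pub struct ProgressQueue {
    sender: SyncSender<ProgressUpdate>,
    /// Assumed to count non-record-progress requests currently being handled.
    in_flight: Arc<AtomicUsize>,
}

impl ProgressQueue {
    /// Creates a bounded queue holding at most `capacity` pending updates.
    pub fn new(capacity: usize) -> (Self, Receiver<ProgressUpdate>) {
        let (sender, receiver) = sync_channel(capacity);
        let queue = ProgressQueue {
            sender,
            in_flight: Arc::new(AtomicUsize::new(0)),
        };
        (queue, receiver)
    }

    /// Shared counter that other request handlers would update.
    pub fn in_flight_counter(&self) -> Arc<AtomicUsize> {
        Arc::clone(&self.in_flight)
    }

    /// Enqueues without blocking; a full queue becomes a "slow down" answer.
    pub fn record_progress(&self, update: ProgressUpdate) -> Response {
        match self.sender.try_send(update) {
            Ok(()) => Response::Accepted,
            // Simplification for the sketch: a stopped background thread is
            // reported the same way as a full queue.
            Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => {
                Response::SlowDown
            }
        }
    }
}

/// Body of the background thread: applies queued updates one at a time,
/// waiting while any other request is in flight.
pub fn drain_when_idle(
    receiver: Receiver<ProgressUpdate>,
    in_flight: Arc<AtomicUsize>,
    mut apply: impl FnMut(ProgressUpdate),
) {
    // recv() returns Err once every sender has been dropped, ending the loop.
    while let Ok(update) = receiver.recv() {
        while in_flight.load(Ordering::SeqCst) > 0 {
            thread::sleep(Duration::from_millis(1));
        }
        apply(update);
    }
}
```

In a server, `drain_when_idle` would presumably run inside `thread::spawn` with a closure that writes to the database, while request handlers call `record_progress` and return immediately. The polling sleep is only a placeholder for whatever idle signal the server really uses.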
In general, this should hopefully decrease the error rate, since human-initiated requests in particular should never have to wait for more than one record-progress event to complete before getting largely uncontended access to the database. (Other requests still happen concurrently, but they are typically very rare compared to record-progress requests, which arrive multiple times a second and are effectively processed constantly.)
Errors like https://github.com/rust-lang/rust/pull/94775#issuecomment-1064223941 are the primary motivation here, and I hope this is enough to largely clear them up.