GNMoseke closed this issue 1 year ago
This sounds very much like an issue with the connection pool that should be raised against PostgresKit/AsyncKit. cc @gwynne
After some investigation, I'm reasonably sure this is a race condition in AsyncKit - transferring there.
@GNMoseke If you have a chance, would you be able to see if vapor/async-kit#102 solves this issue? I was unable to reproduce it locally despite having a fairly good idea of the cause.
Describe the bug
We are seeing a failure where a large number of concurrent requests (relative to the size of the server) can permanently deadlock the event loop for any future requests coming in to the server. As expected, the connection pool is exhausted and some requests are added to the waitlist, but those requests then time out even though the query is very simple. After that, the waitlist appears never to be cleared and connections never seem to be released back to the pool, so every future request times out.
The reproducer below contains a more detailed rundown of the cloud environment we were first able to identify the issue in.
To Reproduce
I have a small reproducer here, a minimal server with a single Fluent model that can be slammed with requests: https://github.com/GNMoseke/PostgresNIODeadlockRecreator
Expected behavior
Once the waiting requests have timed out and the pool has capacity again, it should resume handling new requests normally.
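To make the expected behavior concrete, here is a minimal toy model of a fixed-size pool with a waitlist, written in Python with asyncio purely for illustration. It is not AsyncKit's implementation; `ToyPool`, `query`, and `demo` are hypothetical names, and the pool size, hold times, and timeouts are assumed values. The point it sketches is that a waiter that times out is removed from the waitlist, so a released connection is never handed to a request that has already given up.

```python
import asyncio
from collections import deque


class ToyPool:
    """Toy fixed-size pool with a FIFO waitlist (illustrative only)."""

    def __init__(self, size):
        self.available = size   # connections not currently lent out
        self.waiters = deque()  # futures for requests waiting on a connection

    async def acquire(self, timeout):
        if self.available > 0:
            self.available -= 1
            return
        waiter = asyncio.get_running_loop().create_future()
        self.waiters.append(waiter)
        try:
            await asyncio.wait_for(waiter, timeout)
        except asyncio.TimeoutError:
            # Drop the timed-out waiter so it cannot swallow a connection later.
            if waiter in self.waiters:
                self.waiters.remove(waiter)
            raise

    def release(self):
        # Hand the connection to the first waiter that is still pending;
        # if nobody is waiting, return it to the pool.
        while self.waiters:
            waiter = self.waiters.popleft()
            if not waiter.done():
                waiter.set_result(None)
                return
        self.available += 1


async def query(pool, hold, timeout):
    """Simulate one request: wait for a connection, hold it, release it."""
    try:
        await pool.acquire(timeout)
    except asyncio.TimeoutError:
        return "timeout"
    try:
        await asyncio.sleep(hold)  # stands in for the actual database query
        return "ok"
    finally:
        pool.release()


async def demo():
    pool = ToyPool(size=2)
    # Burst: six requests against two connections, each willing to wait 0.05 s.
    burst = await asyncio.gather(*(query(pool, 0.2, 0.05) for _ in range(6)))
    # Later traffic, after the burst has drained.
    later = await asyncio.gather(*(query(pool, 0.01, 0.05) for _ in range(2)))
    return burst, later, pool.available, len(pool.waiters)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this toy model the burst produces two successes and four timeouts, and the later requests succeed because the waitlist is empty and both connections are back in the pool. The behavior reported above corresponds to the later requests also timing out.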
Environment