Permanent connection deadlock on many concurrent requests

GNMoseke commented 1 year ago

Describe the bug

We are seeing a failure where a large number of concurrent requests (relative to the size of the server) can permanently deadlock the event loop for all future requests to the server. As expected, the connection pool is exhausted and some requests are added to the waitlist, but those requests then time out even for a very simple query. After that, the waitlist appears never to be cleared or freed back to the pool, and every future request times out.

The reproducer below contains a more detailed rundown of the cloud environment in which we first identified the issue.

To Reproduce

I have a small reproducer here with a minimal server and a single Fluent model that can be slammed with requests: https://github.com/GNMoseke/PostgresNIODeadlockRecreator

Expected behavior

Once the waiting requests have timed out and connections are available again, the pool should resume handling new requests normally.
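
To make the expected behavior concrete, here is a minimal, language-neutral toy model in Python. It is not AsyncKit's implementation and does not reproduce the race; the class `ToyPool` and its methods are hypothetical names chosen for illustration. It only shows the invariant described above: a waiter that times out is dropped from the waitlist, so once connections are released the pool can serve new requests again.

```python
from collections import deque


class ToyPool:
    """Toy fixed-size pool with a FIFO waitlist (hypothetical, not AsyncKit)."""

    def __init__(self, max_connections):
        self.max_connections = max_connections
        self.leased = 0
        self.waitlist = deque()  # ids of requests waiting for a connection

    def request(self, request_id):
        """Lease a connection if one is free; otherwise queue the request."""
        if self.leased < self.max_connections:
            self.leased += 1
            return True
        self.waitlist.append(request_id)
        return False

    def timeout(self, request_id):
        """A waiter gave up: remove it so it cannot hold a slot forever."""
        if request_id in self.waitlist:
            self.waitlist.remove(request_id)

    def release(self):
        """Return a connection, handing it to the oldest waiter if there is one.

        Returns the id of the waiter that received the connection, or None.
        """
        if self.waitlist:
            # The lease transfers directly, so the leased count is unchanged.
            return self.waitlist.popleft()
        self.leased -= 1
        return None


pool = ToyPool(max_connections=2)

# Five concurrent requests against two connections: three end up waiting.
results = [pool.request(i) for i in range(5)]

# The three waiters time out, then the two active requests finish.
for waiter in (2, 3, 4):
    pool.timeout(waiter)
pool.release()
pool.release()

# A later request should get a connection immediately.
recovered = pool.request(99)
```

In this model `recovered` is true because the timed-out waiters no longer occupy the waitlist. The behavior reported in this issue corresponds to the opposite outcome, where later requests keep timing out.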

Environment

Vapor Framework version: 4.67.4
OS version: Ubuntu 20.05
See reproducer for full cloud environment details.

fabianfett commented 1 year ago

This sounds very much like an issue with the connection pool that should be raised against PostgresKit/AsyncKit. cc @gwynne

gwynne commented 1 year ago

After some investigation, I'm reasonably sure this is a race condition in AsyncKit - transferring there.

gwynne commented 1 year ago

@GNMoseke If you have a chance, would you be able to see if vapor/async-kit#102 solves this issue? I was unable to reproduce it locally despite having a fairly good idea of the cause.
