r2dbc / r2dbc-pool

Connection Pooling for Reactive Relational Database Connectivity
https://r2dbc.io
Apache License 2.0

Application stops releasing/reusing connections when the pool size is exhausted and ends up in a deadlock #213

Open bmagrys opened 1 month ago

bmagrys commented 1 month ago

Bug Report

Versions

Current Behaviour

I have a Spring Boot service that uses R2DBC with connection pooling. Several endpoints call transactional code. At some point, some operations had to run outside the current transaction, as a separate operation, so that they would not be tied to the current transaction and would not be rolled back with it. When the load increases and the service starts using the full throughput capacity of Netty, it gets stuck forever at some random database operation. With DEBUG logging enabled for r2dbc, we can see that acquiring a connection or suspending the current transaction hangs at seemingly random points. Without a timeout, the app stops accepting new requests. With a timeout, it is only slightly better: the operation fails with an exception (which, IMO, it shouldn't; it should just take longer than usual). The last log line is usually "Suspending current transaction, creating new transaction with name [...]", but not always.

Steps to reproduce

The easiest scenario is as follows. I make more simultaneous requests (or direct transactional service invocations) than the size of the Reactor/Netty thread pool. In my case, due to the number of cores, that was 10 by default, which is also the default size of the r2dbc pool. I used 20 requests/invocations as an example, but in my experience even 11 should be enough. If the transactional service invokes another transactional service with propagation REQUIRES_NEW, it gets stuck while suspending the transaction:

05-07-2024 10:32:04.717 [DefaultDispatcher-worker-7 @coroutine#30] DEBUG o.s.r.c.R2dbcTransactionManager.handleExistingTransaction:207 - Suspending current transaction, creating new transaction with name [com.bmagrys.r2dbc.locked.Demo2Service.test]
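One plausible mechanism for this hang (a hypothesis for illustration, not a confirmed root cause) is nested acquisition from a bounded pool: each outer transaction keeps its connection while REQUIRES_NEW suspends it, then asks the same pool for a second connection. If every connection is already held by an outer transaction, no inner transaction can start and no outer transaction can finish. The Java sketch below models only this circular wait, using a java.util.concurrent.Semaphore as a stand-in for the pool; it does not use the r2dbc-pool or Spring APIs, and the names NestedAcquireDeadlock and stuckInnerAcquires are hypothetical. A barrier forces the state in which every caller holds its outer connection, which makes the outcome deterministic, and a timeout turns the otherwise infinite wait into a countable failure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical simulation: a Semaphore stands in for the connection pool (not the r2dbc-pool API).
public class NestedAcquireDeadlock {

    // Returns how many inner (REQUIRES_NEW-style) acquisitions timed out while
    // every caller was still holding its outer connection.
    static int stuckInnerAcquires(int poolSize, int callers, long timeoutMillis) throws Exception {
        if (callers > poolSize) {
            // The barrier below needs every caller to obtain an outer connection first.
            throw new IllegalArgumentException("sketch requires callers <= poolSize");
        }
        Semaphore pool = new Semaphore(poolSize, true);           // each permit models one pooled connection
        CyclicBarrier allHoldOuter = new CyclicBarrier(callers);  // all callers hold their outer connection
        CyclicBarrier allTriedInner = new CyclicBarrier(callers); // all callers finished their inner attempt
        ExecutorService workers = Executors.newFixedThreadPool(callers);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int i = 0; i < callers; i++) {
                Callable<Boolean> tx = () -> {
                    pool.acquire();                   // outer transaction takes a connection
                    try {
                        allHoldOuter.await();
                        // Inner transaction asks for a second connection; the outer one is suspended, not released.
                        boolean gotInner = pool.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
                        if (gotInner) {
                            pool.release();           // inner transaction completes and returns its connection
                        }
                        allTriedInner.await();        // keep outer connections held until every inner attempt ends
                        return !gotInner;
                    } finally {
                        pool.release();               // outer transaction completes
                    }
                };
                results.add(workers.submit(tx));
            }
            int stuck = 0;
            for (Future<Boolean> r : results) {
                if (r.get()) {
                    stuck++;
                }
            }
            return stuck;
        } finally {
            workers.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Pool of 10 with 10 concurrent outer transactions: no spare connection left.
        System.out.println("callers=10: " + stuckInnerAcquires(10, 10, 200) + " stuck"); // expected: 10 stuck
        // One spare connection lets the inner transactions run one after another.
        System.out.println("callers=9: " + stuckInnerAcquires(10, 9, 2000) + " stuck");  // expected: 0 stuck
    }
}
```

With 10 callers and a pool of 10, every inner acquisition times out; with 9 callers, the single spare connection is enough for all inner transactions to complete, just serialized. The sketch does not model the Reactor/Netty event-loop threads, so it only illustrates the pool-side circular wait, not the full behaviour seen in the reproducer.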

Reproducer: https://github.com/bmagrys/r2dbc-pool-issue-deadlock
Executing DemoApplicationTests takes forever.

Expected behaviour

Even when the pool is under high load and fully used, it shouldn't hang the app forever. With a timeout configured, a transaction shouldn't fail either; it should just complete more slowly. Exhausting a pool that is not particularly small, and whose size equals the shared Reactor thread pool size, shouldn't make the app unresponsive or break transactions.

I tried the exact same scenario on the non-reactive stack with HikariCP, and the problem does not occur there. The same code executes just fine.