[Another] Unable to acquire JDBC connection

DorianMaliszewski commented 4 years ago

Describe the bug Hi everyone, I followed the guide on quarkus jdbc, quarkus hibernate orm and quarkus hibernate panache. My environnement :

Adoptopenjdk 8
Quarkus 1.4.2 Final
Dockerfile.jvm for quarkus
Docker mariadb and postgres for development

What's the problem ?

I launched the application with one datasource (same problem on postgres and mariadb).
I put some configuration for timeout (extremely low because of my bug) : See below my conf
After 1000/2000 requests, I get the error : Unable to acquire JDBC connection (Detail below on Scrrenshot section)
The idle_in_transaction_session_timeout is set to 0 on Postgres
The statement_timeout is set to 0 on Postgres too.
This behaviour is the same on Mariadb.
Sometimes it happens at the 50th request sometimes at the 200th, sometimes more but it happens everytime.

Expected behavior

No more Unable to acquire JDBC connection :sob: .

Actual behavior (Describe the actual behavior clearly and concisely.)

To Reproduce Steps to reproduce the behavior:

Make a JDBC connection on Mariadb or Postgres
Create a jax-rs get path
Create a service class annoted @RequestScoped
Make a findAll on your panache entity in the service class
Return the list in your jax-rs get path
Launch the app
Make many request in a short time (Bot or something else)
In a 1000/2000 requests you can get this error

Configuration

# Add your application.properties here, if applicable.
quarkus.datasource.url = jdbc:postgresql://localhost:3306/eztalk
quarkus.datasource.jdbc.url=jdbc:postgresql://localhost:3306/eztalk
quarkus.datasource.driver=org.postgresql.Driver
quarkus.datasource.jdbc.driver=org.postgresql.Driver
quarkus.datasource.username=USERNAME
quarkus.datasource.password=PASSWORD
quarkus.datasource.jdbc.min-size=4
quarkus.datasource.jdbc.max-size=50
quarkus.datasource.jdbc.max-lifetime=150
quarkus.datasource.jdbc.idle-removal-interval=50
quarkus.datasource.jdbc.background-validation-interval=10
quarkus.datasource.jdbc.leak-detection-interval=30
quarkus.datasource.jdbc.acquisition-timeout=10
quarkus.datasource.db-kind=postgresql

Screenshots I can give you the error

eztalk-service_1        | Caused by: org.hibernate.exception.GenericJDBCException: Unable to acquire JDBC Connection
eztalk-service_1        |   at org.hibernate.exception.internal.StandardSQLExceptionConverter.convert(StandardSQLExceptionConverter.java:47)
eztalk-service_1        |   at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:113)
eztalk-service_1        |   at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:99)
eztalk-service_1        |   at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.acquireConnectionIfNeeded(LogicalConnectionManagedImpl.java:107)
eztalk-service_1        |   at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.getPhysicalConnection(LogicalConnectionManagedImpl.java:134)
eztalk-service_1        |   at org.hibernate.engine.jdbc.internal.StatementPreparerImpl.connection(StatementPreparerImpl.java:50)
eztalk-service_1        |   at org.hibernate.engine.jdbc.internal.StatementPreparerImpl$5.doPrepare(StatementPreparerImpl.java:149)
eztalk-service_1        |   at org.hibernate.engine.jdbc.internal.StatementPreparerImpl$StatementPreparationTemplate.prepareStatement(StatementPreparerImpl.java:176)
eztalk-service_1        |   at org.hibernate.engine.jdbc.internal.StatementPreparerImpl.prepareQueryStatement(StatementPreparerImpl.java:151)
eztalk-service_1        |   at org.hibernate.loader.Loader.prepareQueryStatement(Loader.java:2104)
eztalk-service_1        |   at org.hibernate.loader.Loader.executeQueryStatement(Loader.java:2034)
eztalk-service_1        |   at org.hibernate.loader.Loader.executeQueryStatement(Loader.java:2012)
eztalk-service_1        |   at org.hibernate.loader.Loader.doQuery(Loader.java:953)
eztalk-service_1        |   at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:354)
eztalk-service_1        |   at org.hibernate.loader.Loader.doList(Loader.java:2838)
eztalk-service_1        |   at org.hibernate.loader.Loader.doList(Loader.java:2820)
eztalk-service_1        |   at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2652)
eztalk-service_1        |   at org.hibernate.loader.Loader.list(Loader.java:2647)
eztalk-service_1        |   at org.hibernate.loader.hql.QueryLoader.list(QueryLoader.java:506)
eztalk-service_1        |   at org.hibernate.hql.internal.ast.QueryTranslatorImpl.list(QueryTranslatorImpl.java:396)
eztalk-service_1        |   at org.hibernate.engine.query.spi.HQLQueryPlan.performList(HQLQueryPlan.java:219)
eztalk-service_1        |   at org.hibernate.internal.SessionImpl.list(SessionImpl.java:1404)
eztalk-service_1        |   at org.hibernate.query.internal.AbstractProducedQuery.doList(AbstractProducedQuery.java:1565)
eztalk-service_1        |   at org.hibernate.query.internal.AbstractProducedQuery.list(AbstractProducedQuery.java:1533)
eztalk-service_1        |   ... 285 more
eztalk-service_1        | Caused by: java.sql.SQLException: Sorry, acquisition timeout!
eztalk-service_1        |   at io.agroal.pool.ConnectionPool.handlerFromSharedCache(ConnectionPool.java:244)
eztalk-service_1        |   at io.agroal.pool.ConnectionPool.getConnection(ConnectionPool.java:173)
eztalk-service_1        |   at io.agroal.pool.DataSource.getConnection(DataSource.java:66)
eztalk-service_1        |   at io.quarkus.hibernate.orm.runtime.customized.QuarkusConnectionProvider.getConnection(QuarkusConnectionProvider.java:23)
eztalk-service_1        |   at org.hibernate.internal.NonContextualJdbcConnectionAccess.obtainConnection(NonContextualJdbcConnectionAccess.java:38)
eztalk-service_1        |   at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.acquireConnectionIfNeeded(LogicalConnectionManagedImpl.java:104)
eztalk-service_1        |   ... 305 more

Environment (please complete the following information):

Output of uname -a or ver: Linux NOPLACELIKEHOME 4.19.107-1-MANJARO #1 SMP Fri Feb 28 21:14:27 UTC 2020 x86_64 GNU/Linux

Output of java -version:

openjdk version "1.8.0_242"
OpenJDK Runtime Environment (build 1.8.0_242-b08)
OpenJDK 64-Bit Server VM (build 25.242-b08, mixed mode)

GraalVM version (if different from Java): Not using
Quarkus version or git rev: 1.4.2 Final

Build tool (ie. output of mvnw --version or gradlew --version):

Apache Maven 3.6.3 (NON-CANONICAL_2019-11-27T20:26:29Z_root)
Maven home: /opt/maven
Java version: 1.8.0_242, vendor: Oracle Corporation, runtime: /usr/lib/jvm/java-8-openjdk/jre
Default locale: fr_FR, platform encoding: UTF-8
OS name: "linux", version: "4.19.107-1-manjaro", arch: "amd64", family: "unix"

Additional context

I tried to put acquisition timeout at 30s or more but it's the same and I don't think it can be possible to an user waiting more than 30s because it can be difficult to get a JDBC connection. I tried to follow advices I saw on Google but none did the job.

I'm available by this issue, email, phone, or others to talk more about this.

Thank you in advance for the time you will take to read this,

cpmoore commented 4 years ago

Enable datasource metrics using

 <dependency>
            <groupId>io.quarkus</groupId>
            <artifactId>quarkus-smallrye-metrics</artifactId>
        </dependency>

and

quarkus.datasource.metrics.enabled=true
quarkus.datasource.jdbc.enable-metrics=true

Then check the vendor_agroal_active_count metric on the /metrics endpoint. Is it constantly increasing?
I had this issue where it eventually reached the max value and wouldn't create any more connections. Turned out to be an issue where my code wasn't closing the connection if the sql statement threw an exception.

DorianMaliszewski commented 4 years ago

Ok so my vendor_agroal_active_count is increasing over the requests, so what did you do ? The only error I took is Jackson mapping statement error ? Is this error which doesn't close the statement ? Sometimes I get jackson error : Could not prepare statement for id .....

Like this one com.fasterxml.jackson.databind.JsonMappingException: could not prepare statementError And in my console :

eztalk-service_1        | 2020-05-07 01:05:27,304 WARN  [org.hib.eng.jdb.spi.SqlExceptionHelper] (executor-thread-1) SQL Error: 0, SQLState: null
eztalk-service_1        | 2020-05-07 01:05:27,304 ERROR [org.hib.eng.jdb.spi.SqlExceptionHelper] (executor-thread-1) Connection is closed
eztalk-service_1        | 2020-05-07 01:05:27,305 ERROR [io.und.req.io] (executor-thread-1) Exception handling request 575959f8-456c-497e-ae80-cf7c913b9a92-6 to /eztalk/patient-requests/: org.jboss.resteasy.spi.UnhandledException: com.fasterxml.jackson.databind.JsonMappingException: could not prepare statement
....
eztalk-service_1        | Caused by: com.fasterxml.jackson.databind.JsonMappingException: could not prepare statement
...
eztalk-service_1        | Caused by: org.hibernate.exception.GenericJDBCException: could not prepare statement
...
eztalk-service_1        | Caused by: java.sql.SQLException: Connection is closed
...
eztalk-service_1        | 2020-05-07 01:05:27,307 WARN  [org.hib.eng.jdb.spi.SqlExceptionHelper] (ForkJoinPool.commonPool-worker-2) SQL Error: 0, SQLState: null
eztalk-service_1        | 2020-05-07 01:05:27,308 ERROR [org.hib.eng.jdb.spi.SqlExceptionHelper] (ForkJoinPool.commonPool-worker-2) CallableStatement is closed
eztalk-service_1        | 2020-05-07 01:05:27,307 WARN  [io.agr.pool] (ForkJoinPool.commonPool-worker-0) Datasource '<default>': JDBC resources leaked: 1 ResultSet(s) and 1 Statement(s)

Do you think is this jackson error which take my app out ?

DorianMaliszewski commented 4 years ago

The vendor_agroal_active_count only increase when i send multiple requests, not only when i get the error, but it doesn't increase at every request...

DorianMaliszewski commented 4 years ago

I tested :

1 request error
1 request no error

active count + 2

this is my method :

JAX-RS Get on /blablabla/ : BlablaResource.java
BlablaResource.java call the Request Scoped BlablaService.java method (findAll)
findAll in BlablaService.java only do Blabla.findAll().list();
BlablaResource.java get the list and do
Response.ok(list.parallelStream().map(BlablaDTO:new).collect(Collectors.toList())).build();
What is wrong ? I didn't put any @Transactional on readOnly methods
In my DTO constructor I access the relation of the persisted entity (readOnly) to put in my DTO

DorianMaliszewski commented 4 years ago

Problem found

Ok, so after multiple tests, I find where is the problem :

Hibernate Panache

I don't know why but when I call Blabla.list("service = ?1", id) it works but doesn't close the statement, and in my metrics the agroal_active_* is increasing.

To resolve that

I just injected the entityManager in my service class and make the request with DTO projection in my request

Detail of my process

May be the statement is not close because :

I get the request in my BlablaResource.java (findAll) with @Path and @Get annotations
I use my BlablaService.java which I injected with @Inject
This service class is @RequestScoped
SO In my findAll method I call the findAllFromServiceClass method
In findAllFromServiceClass method I just do a return Blabla.list("service = ?1", id)
I get the list in findAll method and do .parallelStream().map(BlablaDTO::new).collect(Collectors.toList())
In my DTO I try to access relation (@OneToMany, @OneToOne, ...)
I think the problem come from this : when I try to access the relation in my DTO constructor the requests are made but the statement are never closed.

So, if someone can help me on that, or confirm that there are a problem with the extension, it would be nice.

I'm available to talk more about this, and may be help to solve this bug.

EDIT

I confirm the problem is when I do the transformation of Blabla to BlablaDTO, it's the same when I put it in BlablaResource or BlablaService. To solve this I need to put the DTO constructor in the query directly

nandorholozsnyak commented 3 years ago

I have just bumped into this problem, and I understand the problem, but should not it be fixed somehow?

I was running stress tests on a Quarkus based application with JMeter, the thread number was 300 and the loop count was 100 and it started to throw this exception around the half of the test, so it was really annoying that it was creating exceptions, the error rate was really small, 0.01-0.02% percentage. The application is a simple CRUD based app, with some e-commerce based, list items, add items to cart, and start order and I also used Panache. I'm on 1.9.2.Final. The http thread pool size was 300 and the datasource pool size was 50.

gsmet commented 3 years ago

@nandorholozsnyak if you can get us a reproducer, that would help.

nandorholozsnyak commented 3 years ago

Hey @gsmet , I will create one as soon as I can.

gsmet commented 3 years ago

/cc @Sanne FYI, there might be some odd issue lurking here.

DorianMaliszewski commented 3 years ago

@gsmet @Sanne Hi, There are sometimes since I create this issue. I really think that the problem due to the different scope. Another possibility is that I found that sometimes Jackson throw me an error.. may be this doesnt close the transaction. On the other hand I found that the @Path which is normally RequestScoped by default doesn't update request variable (Like Claim for oidc) So may be there are an error in the lifecycle of the transaction.

I gave an exemple on the 8 May response. We can talk more about this error.

Sincerely,

Sanne commented 3 years ago

yes this looks tricky, I should investigate. I need to finish some releases, but hope to switch to this on Wednesday.

Sanne commented 3 years ago

Unless @barreiro maybe has some time to look into this? Please let me know :)

tgallin commented 3 years ago

Hi, I've witnessed the "acquisition timeout" issue too with an as400 database, using the jt400 driver. A problem that someone else also mentionned in this article https://stackoverflow.com/questions/62307245/quarkus-as400-datasource-doesnt-use-all-available-connections

I can provide details if needed. I'd be happy to help. I've been using Quarkus a lot since version 1.4.2 and really enjoy it. Thanks for the great work !

rlfaber commented 3 years ago

I just recently had this issue and figured out the problems and cleaned up all cases of connections that were opened during transactions... it is not just forgetting to close connections or prepared statements or regular statements or results/row-sets... it is quite difficult if the code is complex... but end of the day... the "acquisition timeout" issue resolved when all the code was fixed. It would be nice if the Agroal connection pool had a mode that was picky... and one that was less picky... since I had doubts for a bit that I would find all the cases... then I would just want the transactional code to work as it does without connection pooling.... or without a picky connection pooler. It would also be nice to find all the issues upfront... like failing to close connections... but the ones that it failed to find were the cases of re-opening (or creating a new connection) in the middle of a transaction... a copied connection object... something like that... caused this "acquisition timeout" after 5 or 10 or so hits... not right away. I'm only posting to assure any readers... that the issue really is in the code using this connection pool, and once the bugs are found, the pooler works great.

barreiro commented 3 years ago

@typelean thank you so much for the feedback.

Some "picky' modes we added in Agroal 1.10 to deal with the issues you describe. These are not yet exposed as a configuration on Quarkus at the moment, but are expected to be in a future version.

lentgrew commented 2 years ago

@tgallin Hi, were you able to solve this? Can you provide more details? Thanks.

tgallin commented 2 years ago

@tgallin Hi, were you able to solve this? Can you provide more details? Thanks.

Hi, unfortunately, I never solved this issue. As @typelean suggested it was a problem in the code of connections not properly closed I had a second look at our code but couldn't find places where I forgot to close the connections. I ended up disabling the pooling. I know this workaround is not great but it is the only one that has worked for me so far. I haven't tried the latest quarkus version though.

quarkus.hibernate-orm.dialect=org.hibernate.dialect.DB2400Dialect quarkus.datasource.jdbc.driver=com.ibm.as400.access.AS400JDBCDriver quarkus.datasource.db-kind=other quarkus.datasource.jdbc.url=jdbc:as400:xxxxx quarkus.datasource.username=xxx quarkus.datasource.password=xxx quarkus.datasource.jdbc.pooling-enabled=false

rlfaber commented 2 years ago

I did spend a lot of time fiddling with my custom Jdbc library (jdbc 4.0 with postgres) and a lot of testing all kinds of database operations... to get it working well. So I am really hoping something could be changed like mentioned (above) ("picky" modes vs easy-simple modes)... because for a while, I felt the pain of having to support production with the "picky" Agroal, yet still needing connection pooling.

Now, it has been working reliably in production for months. Currently on Quarkus 1.11.2.Final and 1.11.7.Final, with corretto jdk 11.
quarkus.datasource.jdbc.initial-size=5 quarkus.datasource.jdbc.max-size=10 quarkus.datasource.jdbc.min-size=4 quarkus.datasource.db-kind=other quarkus.datasource.jdbc.driver=org.postgresql.Driver

Sanne commented 2 years ago

@typelean could you share some more details please?

To be fair I really do like Agroal being very precise in its demands, as otherwise people seem to have many ways to shoot themselves in the foot without even realizing how critical it is to get this right.

But of course details matter, especially if you know what you're doing we could use your feedback to make it better and/or more flexible.

rlfaber commented 2 years ago

To be honest, I am not confident of being able to help much... I am sorry... Yes, I have my own jdbc library because I don't like the bloat of JPA and I don't even like the complexity of weird things like Ibatis, when all it really takes is a thin layer over Jdbc (I am basing the queries on RowSet JDCB 4.0) and my close statements are like (below). And these close statements are called at various points in my use of preparedStatement queries, updates, and Statement queries and updates, with or without connection (for transactions or not). But it is company code and not highly reviewed, so I can't share it all of course...

The thing I noticed was moving from DBCP was that it was very tolerant of not properly closing connections.... yet still performed connection pooling and was reliable in production. As I moved to Agroal, it got more picky, but in every case I could see that it was correct, and my code was imperfect... I just used isolation and a lot of testing to fix it. But if Agroal had a not so picky mode that was safe enough (like DBCP was), though not as proper, then I could have had less stress with my production problem... when I did not have enough time to figure out the picky fixes.

So these are my close statements... that are called all over the place in my use of PreparedStatement, Statement, for queries and updates.

public void close() { // rs = ResultSet ps = Prepared Statement st = Statement c = Connection if (rs != null) { try { rs.close(); } catch (Exception e) { error = "The result set cannot be closed."; } rs = null; } if (ps != null) { try { ps.close(); } catch (Exception e) { error = "The prepared statement cannot be closed."; } ps = null; } if (st != null) { try { st.close(); } catch (Exception e) { error = "The statement cannot be closed."; } st = null; } if (c != null) { try { c.close(); } catch (Exception e) { error = "The data source connection cannot be closed."; } c = null; }
} private void cClose() {
if (c != null) { try { c.close(); } catch (Exception e) { error = "The connection cannot be closed."; } c = null; }
} private void psClose() {
if (ps != null) { try { ps.close(); } catch (Exception e) { error = "The prepared statement cannot be closed."; } ps = null; } } private void rsClose() {
if (rs != null) { try { rs.close(); } catch (Exception e) { error = "The result set cannot be closed."; } rs = null; } }

Sanne commented 2 years ago

To be honest, I am not confident of being able to help much... I am sorry...

Don't worry :) Just trying to understand your perspective.

Yes, I have my own jdbc library because I don't like the bloat of JPA and I don't even like the complexity of weird things like Ibatis, when all it really takes is a thin layer over Jdbc (I am basing the queries on RowSet JDCB 4.0) and my close statements are like (below). And these close statements are called at various points in my use of preparedStatement queries, updates, and Statement queries and updates, with or without connection (for transactions or not). But it is company code and not highly reviewed, so I can't share it all of course...

Ok you don't like JPA - fair enough, it's not a fit for all use cases. But this doesn't imply you have to use your own JDBC library - what do you mean by that, not a custom JDBC implementation but rather you're suggesting you have your own thin layer above the JDBC datasource pool, right?

The thing I noticed was moving from DBCP was that it was very tolerant of not properly closing connections.... yet still performed connection pooling and was reliable in production. As I moved to Agroal, it got more picky, but in every case I could see that it was correct, and my code was imperfect... I just used isolation and a lot of testing to fix it. But if Agroal had a not so picky mode that was safe enough (like DBCP was), though not as proper, then I could have had less stress with my production problem... when I did not have enough time to figure out the picky fixes.

Interesting point there; I'll think about it. cc/ @barreiro WDYT about this specifically? I normally would expect people to have solid coverage by integration tests, but indeed while we should fail-fast in staging/dev mode it might be fair to try being more lenient in production mode.

rlfaber commented 2 years ago

it might be fair to try being more lenient in production mode.

At this point I am fine with the way Agroal works and I don't need a "lenient" mode (like DBCP) because I have fixed my "thin layer" over JDBC and all is working fine in Dev and Production. I would not want to run a different mode in production if I could get the "picky" mode working... but while I was in a state where I had not resolved all the "picky" issues... I could not ignore production that needed to keep working... So I would have a appreciated the "lenient" mode to get by... but I was forced to resolve all the "picky" issues. But it was a bit stressful, since there was no way I was going to change all my code... No it was can I use Agroal or DBCP to get connection pooling working... and Agroal was working much better in Quarkus. I'm the only developer at our place of 5 developers who like Quarkus... and so I have to fight a bit of a battle to get the good stuff integrated in production code... Just care about others who may need connection pooling and are not willing to buy into JPA... and don't quite have all their "thin layer" JDBC quite perfect... DBCP shielded my from it with its "lenient" behavior.

barreiro commented 2 years ago

A few features were added to Agroal and exposed on Quarkus to help developers with the correctness of their applications.

Transaction Requirement - this setting ensures that all connections are acquired on the context of a transaction.
Extended leak report - this setting, used along with Leak detection, collects and displays info about connection acquisition and usage that we hope will be useful for debugging connection leaks and pool exhaustion.

One other thing that would be helpful as well, but I never got a chance to work on it, would be the possibility to manage the pool from the Quarkus developer console (list connections in use and flush them manually if necessary).

Going lenient sounds like a good idea, until it's not. I would rather stay away from going down that route.

marcogutto commented 2 years ago

I'm getting the same problem.

I'm running Quarkus on 2.4.0.Final.

I'm using a AWS RDS Postgres Database.

This is my configuration:

quarkus.datasource.db-kind = postgresql quarkus.datasource.username = xxxxx quarkus.datasource.password = xxxxx quarkus.datasource.jdbc.url = jdbc:postgresql://xxx.xxx.xxx.xxx:5432/xxxxxx

I tried this configuration too:

quarkus.datasource.jdbc.max-size = 1000 quarkus.datasource.jdbc.idle-removal-interval = 10

But it didn't worked.

@tgallin this configuration (quarkus.datasource.jdbc.pooling-enabled=false) 'solved' the problem?

@barreiro I didn't understand how this configuration (quarkus.datasource.jdbc.transaction-requirement) can fix the problem or what I need set to fix (off, warn, strict).

Thanks in advance!

tgallin commented 2 years ago

I have the "acquisition timeout" issue with as400 databases only, using the jt400 driver. I never had any problem with postgresql or sqlserver databases.

@marcogutto yes, to work around the issue I have with as400 databases, I use this configuration (quarkus.datasource.jdbc.pooling-enabled=false) and it works. The performance is terrible though but that was to be expected not using a pool of connections. I'd rather use a better alternative but so far I haven't found any.

barreiro commented 2 years ago

@barreiro I didn't understand how this configuration (quarkus.datasource.jdbc.transaction-requirement) can fix the problem or what I need set to fix (off, warn, strict).

Acquiring with a transaction ensures that it will eventually complete and the connection is released.

Getting back to the issue, it's hard to tell what is the cause for pool exhaustion without a reproducer. There are many reports of Agroal working fine with the same or similar settings. Disable pooling is indeed an horrible workaround in terms of performance.

m-stramel commented 2 years ago

I am experiencing the same exception when using smallrye reactive messaging as well. When a process I have created is kicked off to send ~65K messages to the kafka consumer the "Unable to acquire JDBC connection" exception is intermittently thrown. During one run of the process I received 20 exceptions, but the next run of 65K messages produced 0 exceptions.

My implementation is simple in the fact that I have a service with one query retrieving data. The @Transactional annotation on the getLinkId() method should be closing the connection to release it back to the pool. I have also tried to set the LinkService to @ApplicationScoped and using the @Transactional default of REQUIRED.

Java version: 17 GraalVM: 21.3.0 Quarkus: 2.5.1 Maven: 3.8.3

quarkus.datasource.db-kind=postgresql
quarkus.datasource.username=
quarkus.datasource.password=
quarkus.datasource.jdbc.url=jdbc://jdbc url here
quarkus.hibernate-orm.database.generation=none
quarkus.hibernate-orm.dialect=org.hibernate.dialect.PostgreSQL10Dialect
quarkus.datasource.jdbc.extended-leak-report=true
quarkus.datasource.jdbc.transaction-requirement=strict

@RequestScoped
public class LinkService {

    @Inject
    EntityManager em;

    @Transactional(Transactional.TxType.REQUIRES_NEW)
    public String getLinkId(String customerId) {
        String query = "my actual database query"
        LinkedObject result = em.createNativeQuery(query, LinkedObject.class)
                            .setParameter("customerId", customerId)
                            .getSingleResult();
        return result.getid();
    }

}

@ApplicationScoped
public class KafkaConsumer {

    @Inject
    LinkSerivce linkService;

    @Incoming("kafka-in")
    @Outgoing("kafka-out")
    @Blocking(ordered = false)
    public Customer incoming(Customer customer) {
     customer.setExternalLinkId(linkService.getLinkId(customer.getId));

     return customer;
    }
}

kostacasa commented 2 years ago

We are observing this issue during database maintenances. After the database server DNS switch, I am expecting quarkus to detect all of the old "dead" connections and re-establish them automatically after the first acquisition timeout.

Instead, all the quarkus based services remain permanently dead, continuously throwing acquisition timeout errors unless I restart them one by one.

Building some kind of an automated system to restart all services after a DB server upgrade seems like the wrong way to approach it, there should be some resiliency built in into the connection pool.

(running Quarkus 1.12)

rkorver commented 1 year ago

I have a quarkus (2.13.1.Final) application with quartz for scheduled jobs (over multiple pods). I'm using a postgres as a service DB from Azure and I'm regularly, but randomly getting Acquisition timeouts. I've activated db metrics and I seem to have plenty of available connections, nor are all my connections 'active'.

I have JSON/REST endpoinst where we query something from the database (and then return it).

Usually everything is fine, until at random intervals the following happens: These are some of the errors I'm seeing, but they all seem to be referring to the same thing.

org.hibernate.exception.GenericJDBCException: Unable to acquire JDBC Connection

ClusterManager: Error managing cluster: Failed to obtain DB connection from data source 'QUARKUS_QUARTZ_DEFAULT_DATASOURCE': java.sql.SQLException: Acquisition timeout while waiting for new connection: org.quartz.JobPersistenceException

Followed by:

Caused by: java.sql.SQLException: Acquisition timeout while waiting for new connection
...
...
Caused by: java.util.concurrent.TimeoutException

Increasing the pool size doesn't seem to help.
Adding @Transactional to the "Repository" method(s) doesn't either.

Does anyone have some experience with this kind of problem? @barreiro @gsmet what would you investigate first?

kostacasa commented 1 year ago

Not sure if this is helpful/useful but all the services we migrated to the reactive (postgres) driver stopped experiencing these issues. Not discounting the possibility that same issues are occurring but being silently swallowed, but if anything it seems to me like the Vert.x driver has a better "dead connection" detection/restarting approach than the Agroal DS.

P.S. @rkorver it's possible to increase the acquisition timeout property. You can look into that possibly, and decrease the time that idle connections are kept in the pool.

tcollignon commented 1 year ago

Hi,

I have same kind of error with quarkus 2.13.4.Final and Jdk 17

I can give other informations if needed


2022-12-22T16:08:04.540Z: Dec 22, 2022 4:08:04 PM org.hibernate.engine.jdbc.spi.SqlExceptionHelper logExceptions
2022-12-22T16:08:04.540Z: ERROR: Acquisition timeout while waiting for new connection
2022-12-22T16:08:04.863Z: at org.hibernate.loader.Loader.prepareQueryStatement(Loader.java:2122)
2022-12-22T16:08:04.863Z: at org.hibernate.loader.Loader.executeQueryStatement(Loader.java:2059)
2022-12-22T16:08:04.863Z: at org.hibernate.loader.Loader.executeQueryStatement(Loader.java:2037)
2022-12-22T16:08:04.863Z: at org.hibernate.loader.Loader.doQuery(Loader.java:956)
2022-12-22T16:08:04.864Z: at org.hibernate.internal.SessionImpl.list(SessionImpl.java:1459)
2022-12-22T16:08:04.864Z: at org.hibernate.query.internal.AbstractProducedQuery.doList(AbstractProducedQuery.java:1649)
2022-12-22T16:08:04.864Z: at org.hibernate.query.internal.AbstractProducedQuery.list(AbstractProducedQuery.java:1617)
2022-12-22T16:08:04.861Z: Dec 22, 2022 4:08:04 PM io.quarkus.vertx.http.runtime.QuarkusErrorHandler handle
2022-12-22T16:08:04.861Z: ERROR: HTTP Request to /replayers/exercices/search?rangeStartIndex=0&rangeEndIndex=9& failed, error id: 12afb52a-4c91-443f-8113-4bc6f622eb27-1
2022-12-22T16:08:04.863Z: at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:357)
2022-12-22T16:08:04.863Z: at org.hibernate.loader.Loader.doList(Loader.java:2868)
2022-12-22T16:08:04.864Z: ... 33 more
2022-12-22T16:08:04.864Z: Caused by: java.util.concurrent.TimeoutException
2022-12-22T16:08:04.864Z: at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:204)
2022-12-22T16:08:04.864Z: at io.agroal.pool.ConnectionPool.handlerFromSharedCache(ConnectionPool.java:296)
2022-12-22T16:08:04.863Z: at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.getPhysicalConnection(LogicalConnectionManagedImpl.java:138)
2022-12-22T16:08:04.863Z: at org.hibernate.engine.jdbc.internal.StatementPreparerImpl.connection(StatementPreparerImpl.java:50)
2022-12-22T16:08:04.863Z: at org.hibernate.engine.jdbc.internal.StatementPreparerImpl$5.doPrepare(StatementPreparerImpl.java:149)
2022-12-22T16:08:04.863Z: at org.hibernate.engine.jdbc.internal.StatementPreparerImpl$StatementPreparationTemplate.prepareStatement(StatementPreparerImpl.java:176)
2022-12-22T16:08:04.863Z: at org.hibernate.engine.jdbc.internal.StatementPreparerImpl.prepareQueryStatement(StatementPreparerImpl.java:151)
2022-12-22T16:08:04.864Z: at io.agroal.pool.DataSource.getConnection(DataSource.java:86)
2022-12-22T16:08:04.864Z: at io.quarkus.hibernate.orm.runtime.customized.QuarkusConnectionProvider.getConnection(QuarkusConnectionProvider.java:23)
2022-12-22T16:08:04.864Z: at org.hibernate.internal.NonContextualJdbcConnectionAccess.obtainConnection(NonContextualJdbcConnectionAccess.java:38)
2022-12-22T16:08:04.864Z: at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.acquireConnectionIfNeeded(LogicalConnectionManagedImpl.java:108)
2022-12-22T16:08:04.862Z: at org.tcollignon.user.object.User__JpaIdentityProviderImpl.authenticate(Unknown Source)
2022-12-22T16:08:04.862Z: at io.quarkus.security.jpa.runtime.JpaIdentityProvider$1.get(JpaIdentityProvider.java:42)
2022-12-22T16:08:04.862Z: at io.quarkus.security.jpa.runtime.JpaIdentityProvider$1.get(JpaIdentityProvider.java:35)
2022-12-22T16:08:04.862Z: at io.quarkus.security.runtime.QuarkusIdentityProviderManagerImpl$1$1$1$1.run(QuarkusIdentityProviderManagerImpl.java:58)
2022-12-22T16:08:04.862Z: at io.quarkus.vertx.core.runtime.VertxCoreRecorder$14.runWith(VertxCoreRecorder.java:564)
2022-12-22T16:08:04.863Z: at org.hibernate.exception.internal.StandardSQLExceptionConverter.convert(StandardSQLExceptionConverter.java:42)
2022-12-22T16:08:04.863Z: at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:113)
2022-12-22T16:08:04.863Z: at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:99)
2022-12-22T16:08:04.863Z: at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.acquireConnectionIfNeeded(LogicalConnectionManagedImpl.java:111)
2022-12-22T16:08:04.864Z: ... 13 more
2022-12-22T16:08:04.864Z: Caused by: java.sql.SQLException: Acquisition timeout while waiting for new connection
2022-12-22T16:08:04.864Z: at io.agroal.pool.ConnectionPool.handlerFromSharedCache(ConnectionPool.java:320)
2022-12-22T16:08:04.864Z: at io.agroal.pool.ConnectionPool.getConnection(ConnectionPool.java:248)
2022-12-22T16:08:04.865Z: ... 38 more
2022-12-22T16:08:04.863Z: at org.hibernate.loader.Loader.doList(Loader.java:2850)
2022-12-22T16:08:04.863Z: at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2682)
2022-12-22T16:08:04.863Z: at org.hibernate.loader.Loader.list(Loader.java:2677)
2022-12-22T16:08:04.863Z: at org.hibernate.loader.hql.QueryLoader.list(QueryLoader.java:540)
2022-12-22T16:08:04.863Z: at org.hibernate.hql.internal.ast.QueryTranslatorImpl.list(QueryTranslatorImpl.java:400)
2022-12-22T16:08:04.863Z: at org.hibernate.engine.query.spi.HQLQueryPlan.performList(HQLQueryPlan.java:219)
2022-12-22T16:08:04.862Z: javax.persistence.PersistenceException: org.hibernate.exception.GenericJDBCException: Unable to acquire JDBC Connection
2022-12-22T16:08:04.862Z: at org.hibernate.internal.ExceptionConverterImpl.convert(ExceptionConverterImpl.java:154)
2022-12-22T16:08:04.862Z: at org.hibernate.query.internal.AbstractProducedQuery.list(AbstractProducedQuery.java:1626)
2022-12-22T16:08:04.862Z: at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2449)
2022-12-22T16:08:04.862Z: at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1452)
2022-12-22T16:08:04.862Z: at org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:29)
2022-12-22T16:08:04.862Z: at org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:29)
2022-12-22T16:08:04.862Z: at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
2022-12-22T16:08:04.862Z: at java.base/java.lang.Thread.run(Thread.java:833)
2022-12-22T16:08:04.862Z: Caused by: org.hibernate.exception.GenericJDBCException: Unable to acquire JDBC Connection
2022-12-22T16:08:04.862Z: at org.hibernate.query.Query.getResultList(Query.java:165)
2022-12-22T16:08:04.862Z: at io.quarkus.security.jpa.runtime.AbstractJpaIdentityProvider.getSingleUser(AbstractJpaIdentityProvider.java:53)
2022-12-22T16:16:37.108Z: Dec 22, 2022 4:16:36 PM io.quarkus.agroal.runtime.AgroalEventLoggingListener onWarning
2022-12-22T16:16:37.108Z: WARN: Datasource '<default>': Closing connection in incorrect state VALIDATION
2022-12-22T16:16:37.108Z: Dec 22, 2022 4:16:36 PM io.quarkus.agroal.runtime.AgroalEventLoggingListener onWarning
2022-12-22T16:16:37.108Z: WARN: Datasource '<default>': Closing connection in incorrect state VALIDATION```

rkorver commented 1 year ago

Hi all,

The issue in my case was our (Azure) firewall shutting down the connections randomly.

One way to circumvent this is to activate a public endpoint on your azure databases, and connect directly through that (it won't actually be accessible from outside your network, don't worry) , but I'm sure there are other solutions to this as well. For example I've heard that an alternative solution is to switch to "flexible server" instead of single server.

Another issue is that agroal keeps the connections alive too long for the database so I solved that by setting:

quarkus.datasource.jdbc.max-lifetime=10m

Hope this helps cc: @tcollignon @kostacasa

kostacasa commented 1 year ago

Thanks @rkorver ! Another possible (brute force) approach that pretty much resolved this for us was setting the quarkus.datasource.jdbc.max-size to match the container max concurrency setting. But forcing faster cycling of connections could be a more scalable approach that (for most workloads) should have no meaningful performance impact.

Note: increasing datasource connection pool limits is dangerous due to the combination of typically low connection limits that relational databases like postgres have and autoscaling behaviors of K8s-like runtimes. So if you go down that path make sure to do some performance testing to ensure you don't exhaust the DB.

s-seidel commented 1 year ago

Same problem here with MSSQL JDBC connector. The worst thing about it is that even though the application won't run anymore, the smallrye-health check reports that everything is fine. Quarkus 2.16.4.Final

spaykit commented 1 year ago

I experience this issue at Azure too after upgrading from Quarkus 2.4.1.Final to 2.16.4.Final. Connecting to Azure's managed PostgreSQL. Doesn't seem like running out of resources or exceeding max amount of connections.

kraeftbraeu commented 1 year ago

Same problem here at Azure with quarkus 2.16.7 and postgres. Unfortunately neither quarkus.datasource.jdbc.max-size nor migrating to pg flexible server helped.

rkorver commented 1 year ago

@kraeftbraeu maybe I'm mixing things up with the flexible server. For us the issue was the azure firewall shutting down the connections. Activating a public endpoint on the postgres instance and connecting through that allowed us to circumvent the internal firewall between AKS and postgres, or so I understand.

jarve commented 1 year ago

And problem still exist Quarkus 3.2.2.Final and with MariaDB.

hookenful commented 9 months ago

Problem exist Quarkus 2.7.6.Final in Docker Swarm and PostgreSQL

quarkusio / quarkus

[Another] Unable to acquire JDBC connection #9123