Open schwaiger opened 4 years ago
You might customize the connection test (use preferredTestQuery or a custom ConnectionTester) to understand why validation seems to succeed when your real logic fails. You might, for example, hit one of your real tables in the Connection test.
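A minimal sketch of that suggestion, assuming the settings are supplied as c3p0.properties-style keys. The table name "monitor", the query text, and the class and method names are hypothetical and chosen only for illustration; the 120-second period mirrors the 2-minute checks mentioned in the report.

```java
import java.util.Properties;

public class PoolTestQuerySketch {

    // Builds c3p0.properties-style settings whose test query touches a real
    // table, so validation exercises the same path as the application's writes.
    static Properties build(String testQuery) {
        Properties p = new Properties();
        // Query c3p0 runs whenever it tests a Connection.
        p.setProperty("c3p0.preferredTestQuery", testQuery);
        // Test idle Connections every 120 seconds (value taken from the report).
        p.setProperty("c3p0.idleConnectionTestPeriod", "120");
        return p;
    }

    public static void main(String[] args) {
        // "monitor" is a hypothetical table name, not taken from the report.
        Properties p = build("SELECT 1 FROM monitor LIMIT 1");
        p.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

If the test against a real table fails while a trivial test query succeeds, that would narrow down why validation passes on a Connection the application cannot use.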
unreturnedConnectionTimeout doesn't seem so relevant here.
c3p0 is definitely noticing problems: Connection errors are signalled during your sessions. That ought to have provoked c3p0 to exclude those bad Connections from the pool and then replace them.
We have a special setup: we use a pool size of 1, as we have 100 JVM instances writing to a single MariaDB, each with only a single thread processing the queue of DB updates for all tables.
When the DB is briefly down due to a failover, most instances happily resume once the DB is back, but a few (last time 4 out of 104) remain indefinitely in a state where the app is handed an invalid connection while the validation checks every 2 min (visible in debug log mode) succeed. The JMX interface then actually shows 2 connections for this pool, and the logs contain e.g. "total size: 28; checked out: 1; num connections: 2;".
Adding unreturnedConnectionTimeout didn't change anything (it looks to me as if it doesn't kick in). A window for recovery seems to exist only when the DB is down while a validation is scheduled. Last time the DB was taken down twice. After the first invalidation of the connection (destroy connection) the problem continued. The second time (exactly the same log entries, as far as I could see) the stale connection was gone and the app got the good one again.
Symptom seen by the app:
last impact to App logged: 17 Oct 2019 09:51:47,570 ERROR [VenusInformer] (MonitorDAO.java:53) - failed to update monitor in DB: org.hibernate.TransactionException: JDBC begin transaction failed:
DB log (2 times failover: ~40s downtime each)
validation results of the respective pool (1st downtime not noticed, trigger of last destruction unclear, but from then on app could write again)
Once an error is detected, the app throws away writes for 30 seconds before it tries the next DB update. A test on borrow should be avoided for performance reasons (some instances do 100 updates per second). Is the described scenario a bug, or could the configuration settings be improved?
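For reference, one configuration direction that avoids a test on borrow is to test on check-in and test idle Connections periodically instead. The sketch below only assembles such settings as c3p0.properties-style keys; the class and method names and the 30-second period are assumptions for illustration, and nothing here is a confirmed fix for the stuck state described above.

```java
import java.util.Properties;

public class CheckinTestConfigSketch {

    // Assembles settings that validate Connections on check-in and while idle,
    // leaving check-out (borrow) untested to keep the write path cheap.
    static Properties build(int idleTestPeriodSeconds) {
        Properties p = new Properties();
        p.setProperty("c3p0.minPoolSize", "1");  // pool size of 1, as in the report
        p.setProperty("c3p0.maxPoolSize", "1");
        p.setProperty("c3p0.testConnectionOnCheckout", "false"); // no test on borrow
        p.setProperty("c3p0.testConnectionOnCheckin", "true");   // test when returned
        p.setProperty("c3p0.idleConnectionTestPeriod",
                Integer.toString(idleTestPeriodSeconds));        // periodic idle test
        return p;
    }

    public static void main(String[] args) {
        // 30 seconds is an assumed value, picked to match the app's retry pause.
        Properties p = build(30);
        p.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

With these settings each Connection is tested after use rather than before it, so a broken Connection should be noticed when it is returned or on the next idle test instead of at the next borrow.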