swaldman / c3p0

a mature, highly concurrent JDBC Connection pooling library, with support for caching and reuse of PreparedStatements.
http://www.mchange.com/projects/c3p0

c3p0 does not recover after MySQLNonTransientConnectionException #4

Closed: mzapletal closed this issue 11 years ago

mzapletal commented 11 years ago

We are running an ActiveMQ (JMS) broker with MySQL as the backend. For connection pooling we use c3p0 with the following configuration:

https://gist.github.com/marco4712/cc68422c50ca1473efd0

When the MySQL DB reboots, or in case of a failover, the connection to the database is lost and the broker stops. Since acquireRetryAttempts and acquireRetryDelay are both set to values greater than zero, I would expect c3p0 to make the corresponding reconnect attempts. Instead, a MySQLNonTransientConnectionException ("No operations allowed after connection closed") is thrown and no reconnect attempts happen.

Please see the full stack trace below:

https://gist.github.com/marco4712/4026ddd138e970724bd1

Is this a bug in c3p0, or is it a configuration problem on our side?

swaldman commented 11 years ago

hi,

you aren't testing Connections, so c3p0 has no way to know that your database has gone down. (you do have idle Connection tests configured, but only every 3600 secs, and once-per-hour tests aren't going to give you prompt notification of db recycles.)

given that you are using an efficient preferredTestQuery, I'd start by setting testConnectionOnCheckout to true. that will incur a small performance penalty inside clients' codepath, but with SELECT 1, it may well be negligibly small.

after you get things working with this simplest approach to testing, if you decide you want to eliminate the small performance penalty, you can switch to testConnectionOnCheckout=false, testConnectionOnCheckin=true, idleConnectionTestPeriod=30 [where the exact value here is a tradeoff you get to make between resource usage and prompt detection of a problem]. but i'd try testConnectionOnCheckout=true first; it's the simplest and most reliable approach.
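As a rough sketch of the two setups described above, using c3p0's ComboPooledDataSource API (the class name, driver class, and JDBC URL here are illustrative placeholders, not taken from the thread):

```java
import com.mchange.v2.c3p0.ComboPooledDataSource;

public class C3p0TestingConfig {
    public static ComboPooledDataSource build() throws Exception {
        ComboPooledDataSource cpds = new ComboPooledDataSource();
        cpds.setDriverClass("com.mysql.jdbc.Driver");            // placeholder driver
        cpds.setJdbcUrl("jdbc:mysql://localhost:3306/activemq"); // placeholder URL
        cpds.setPreferredTestQuery("SELECT 1");                  // cheap test query for MySQL

        // Simplest, most reliable: test every Connection on checkout.
        cpds.setTestConnectionOnCheckout(true);

        // Alternative, if the per-checkout cost ever matters:
        //   cpds.setTestConnectionOnCheckout(false);
        //   cpds.setTestConnectionOnCheckin(true);
        //   cpds.setIdleConnectionTestPeriod(30); // secs; resource use vs. prompt detection
        return cpds;
    }
}
```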

mzapletal commented 11 years ago

Hi,

thanks for the prompt reply. My bad: I had already tested with testConnectionOnCheckout/testConnectionOnCheckin as well, and also with a short idleConnectionTestPeriod, but it didn't work either.

I have now set testConnectionOnCheckin to true again and got a slightly different error (with c3p0 logging set to DEBUG):

https://gist.github.com/marco4712/0e5ce2746c532b8a975a

swaldman commented 11 years ago

Hi,

So much for promptness! I'm really sorry about that.

In the log you sent, what I see is a lot of Connection tests failing (and a bit of c3p0 close()ing the dead Connections). All of this SHOULD happen when the database dies; c3p0 will go through a period of mourning, trying the dead Connections and observing that they are dead. With DEBUG logging, you'll see all of that go on.

The question is, 1) when testConnectionOnCheckout is true, clients should never see a bad Connection, all this stuff should be internal to the pool; and 2) when the database recovers, so too should the pool. Are those things true? If so, things are fine, you can ignore the stuff logged at DEBUG. If not, things are less fine.

Sorry again for the delay.

mzapletal commented 11 years ago

Hi,

no problem at all. I know this is an open source project ;)

> The question is, 1) when testConnectionOnCheckout is true, clients should never see a bad Connection, all this stuff should be internal to the pool; and 2) when the database recovers, so too should the pool. Are those things true?

No, the pool does not recover. I have tested two scenarios: i) rebooting the same instance, and ii) failing over to another (redundant) instance. In both cases, the database comes back within the maximum timeframe (acquireRetryAttempts x acquireRetryDelay), but the pool does not recover and the application (i.e., the ActiveMQ broker) stops.

swaldman commented 11 years ago

hi,

(acquireRetryAttempts x acquireRetryDelay) isn't the maximum time period for recovery; that's indefinite unless you set breakAfterAcquireFailure to true. i don't remember how you had acquireRetryAttempts and acquireRetryDelay configured, and it looks like your original gist with the configuration is gone. can you repost? (or try with these variables left at their default values?)
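For reference, a minimal sketch with these settings at their documented defaults (the class name is a placeholder; the default values are the ones given in the c3p0 docs):

```java
import com.mchange.v2.c3p0.ComboPooledDataSource;

public class RetryDefaults {
    public static void main(String[] args) {
        ComboPooledDataSource cpds = new ComboPooledDataSource();
        // Documented c3p0 defaults, set explicitly only for illustration:
        cpds.setAcquireRetryAttempts(30);        // attempts per acquisition round
        cpds.setAcquireRetryDelay(1000);         // ms between attempts
        cpds.setBreakAfterAcquireFailure(false); // a failed round does NOT disable the pool
        // With breakAfterAcquireFailure=false, each new client request triggers a
        // fresh acquisition round, so recovery attempts continue indefinitely.
    }
}
```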

thanks!

mzapletal commented 11 years ago

Hi,

please find my configuration in the gist below.

https://gist.github.com/marco4712/5859ed401b8fae146cab

Thanks for the clarification regarding acquireRetryAttempts and breakAfterAcquireFailure; I hadn't read that part of the docs carefully enough.

swaldman commented 11 years ago

hi,

can you try leaving acquireRetryAttempts, acquireRetryDelay, and breakAfterAcquireFailure at their default values? the database should come back (not automatically, but in response to the first client request once the DB is available again). the values you have set are long, but they shouldn't really have broken recovery after DB restart. i'm not sure what's going on. it might be helpful to see the logs at INFO level (without all the failures) to try to understand why it doesn't recover. alternatively, while the app is hung after DB restart, you might try a Thread dump (or, better yet, capture the Thread pool status via JMX).
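If JMX isn't convenient, a hypothetical helper like the one below can snapshot the pool's state directly; the getNum*DefaultUser() methods are part of c3p0's PooledDataSource interface, which ComboPooledDataSource implements:

```java
import java.sql.SQLException;

import com.mchange.v2.c3p0.ComboPooledDataSource;

public class PoolStatus {
    // Snapshot of pool state for the default user; worth logging while the
    // app appears hung after a DB restart.
    static void print(ComboPooledDataSource cpds) throws SQLException {
        System.out.println("total connections: " + cpds.getNumConnectionsDefaultUser());
        System.out.println("busy connections:  " + cpds.getNumBusyConnectionsDefaultUser());
        System.out.println("idle connections:  " + cpds.getNumIdleConnectionsDefaultUser());
    }
}
```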

do be sure there has been a client request after the DB recycle. c3p0 won't try to reconnect without provocation by an external client.
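For example, a "provocation" can be as simple as a checkout plus a trivial query; the helper below is hypothetical, but any ordinary client checkout has the same effect:

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import javax.sql.DataSource;

public class PoolProbe {
    // A checkout (plus a trivial query) prompts the pool to test and replace
    // dead Connections once the database is reachable again.
    static void probe(DataSource ds) throws SQLException {
        try (Connection c = ds.getConnection();
             Statement st = c.createStatement()) {
            st.execute("SELECT 1");
        }
    }
}
```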

smiles, steve

mzapletal commented 11 years ago

hi steve,

sorry for my very late reply. based on your response:

> the database should come back (not automatically, but in response to the first client request once the DB is available again). the values you have set are long, but they shouldn't really have broken recovery after DB restart.

I learned that this is not a c3p0 problem at all; in fact, I was not familiar enough with c3p0's internals. I had assumed that the database connection "comes back automatically", which is not the case: a new client request is needed (which makes total sense). However, the ActiveMQ broker (which guarantees transactional persistence of the message queues) stops immediately after noticing a problem with the DB connection. Thus, no fresh client requests are made after a DB outage and eventual DB recycle.

Sorry for making a fuss here; as I said, this is not a c3p0 issue. Many thanks for your support.

marco