swaldman / c3p0

a mature, highly concurrent JDBC Connection pooling library, with support for caching and reuse of PreparedStatements.
http://www.mchange.com/projects/c3p0
Other
1.29k stars 340 forks source link

C3P0 and ReplicationDriver causing unserved read traffic when master goes down #116

Open bhardwajvikas opened 6 years ago

bhardwajvikas commented 6 years ago

Hi,

We are facing an issue with C3P0 and Mysql ReplicationDriver (master & multiple slaves). As per the public documentation of MySQL https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-master-slave-replication-connection.html, if we use allowMasterDownConnections=true, it should work for read traffic. I tried it without C3P0, it works.

When I tried with C3P0 and my master goes down, application is not able to serve the read traffic itself. WE got the following error (truncated). Error shows C3P0 is trying to fetch the connection (physical connection) and get timeout.

2018-05-22 21:57:22,797 INFO  [1ac417b0-5e0b-11e8-92f9-218e2bda190d] utility.interceptors.DomainToServiceExceptionHandlerInterceptor: Exception happened in invocation
org.springframework.transaction.CannotCreateTransactionException: Could not open JPA EntityManager for transaction; nested exception is org.hibernate.exception.GenericJDBCException:
 Cannot open connection
        at org.springframework.orm.jpa.JpaTransactionManager.doBegin(JpaTransactionManager.java:430)
        at org.springframework.transaction.support.AbstractPlatformTransactionManager.getTransaction(AbstractPlatformTransactionManager.java:373)
        at org.springframework.transaction.interceptor.TransactionAspectSupport.createTransactionIfNecessary(TransactionAspectSupport.java:420)
        at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:257)
        at org.springframework.transaction.aspectj.AbstractTransactionAspect.ajc$around$org_springframework_transaction_aspectj_AbstractTransactionAspect$1$2a73e96c(AbstractTransact
ionAspect.aj:63)

Caused by: java.sql.SQLException: An attempt by a client to checkout a Connection has timed out.
        at com.mchange.v2.sql.SqlUtils.toSQLException(SqlUtils.java:118)
        at com.mchange.v2.sql.SqlUtils.toSQLException(SqlUtils.java:77)
        at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool.checkoutPooledConnection(C3P0PooledConnectionPool.java:690)
        at com.mchange.v2.c3p0.impl.AbstractPoolBackedDataSource.getConnection(AbstractPoolBackedDataSource.java:140)
        at org.hibernate.ejb.connection.InjectedDataSourceConnectionProvider.getConnection(InjectedDataSourceConnectionProvider.java:71)
        at org.hibernate.jdbc.ConnectionManager.openConnection(ConnectionManager.java:446)
        ... 46 more
Caused by: com.mchange.v2.resourcepool.TimeoutException: A client timed out while waiting to acquire a resource from com.mchange.v2.resourcepool.BasicResourcePool@575cfec1 -- timeout at awaitAvailable()
        at com.mchange.v2.resourcepool.BasicResourcePool.awaitAvailable(BasicResourcePool.java:1467)
        at com.mchange.v2.resourcepool.BasicResourcePool.prelimCheckoutResource(BasicResourcePool.java:644)
        at com.mchange.v2.resourcepool.BasicResourcePool.checkoutResource(BasicResourcePool.java:554)
        at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool.checkoutAndMarkConnectionInUse(C3P0PooledConnectionPool.java:758)
        at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool.checkoutPooledConnection(C3P0PooledConnectionPool.java:685)
        ... 49 more
swaldman commented 6 years ago

Hi.

It looks like you have c3p0's checkoutTimeout set, and with your master down, acquiring a Connection sometimes takes longer and c3p0 clients time out.

Try unsetting that config property (or making the timeout longer), and see if that helps.

bhardwajvikas commented 6 years ago

Previously we had checkoutTimeout=500 ms, now I change it to 5000ms (we can not support latency more than 5 seconds) and still it is failing with the same error.

Why C3P0 is not giving me slave db connection for readonly traffic. I am still confused, why master is required for slave connections.

swaldman commented 6 years ago

Hi.

c3p0 doesn't know anything about masters or slaves or anything like that. From a JDBC perspective, those are DBMS/Driver implementation details. c3p0 makes standard requests for Connections, using a JDBC URL and Connection properties you supply, and expects Connections in return. That's it.

To debug your issue, you need to figure out why your application is having a hard time connecting to your DBMS via your Driver, as you have configured the DBMS. The error you are seeing is not very informative. To debug, I suggest unsetting checkoutTimeout entirely, to see what happens. Even now, besides these TimeoutExceptions, there may be more informative errors in your logs. (You should check.)

Note that the TimeoutExceptions that you are seeing are not c3p0 timing out while trying to acquire Connections from your DBMS. It is clients to your application timing out, as c3p0 is not able to serve them Connections promptly.

One issue may be that when your master is down, Connections fail tests, and so they are never provided to clients (or are quickly disposed of, leaving the pool often exhausted). You have not provided any information about your configuration or what version of c3p0 that you are using, but you might try turning all testing off and seeing if things "work". If they do, then you'll need to come up with a Connection test that succeeds even when your master is down. If you are using a recent c3p0 and a recent version of Java, then the Connection.isValid() method is used by default. If you are using an older version of c3p0, or if you have defined preferredTestQuery or automaticTestTable, then something more complicated will be going on. In either case, you need tests that don't invalidate your Connections when your master is down. But first, turn eliminate all testing parameters from your configuration (testConnectionOnCheckin, testConnectionOnCheckout, idleConnectionTestPeriod, preferredTestQuery, automaticTestTable, hibernate's hibernate.c3p0.validate, hibernate.c3p0.idle_test_period) and see if the failures still occur. If they don't, if things "work", gently try to find a new testing regime that doesn't fail when your master is down (probably just testConnectionOnCheckout, with or without a preferredTestQuery depending on what the default test does).

Good luck!

bhardwajvikas commented 6 years ago

Thanks @swaldman, I tried with eliminating all the testing parameters and it worked as expected. (Connection moves to slave). We are using C3P0 = 0.9.5.1. We are using ComboPooledDataSource. Following is the testing configuration

        ComboPooledDataSource dataSource = new ComboPooledDataSource();        
        dataSource.setDriverClass(ReplicationDriver.class.getCanonicalName());
       //ConnectionTesting Configuration  
 dataSource.setConnectionTesterClassName(ReplicationAwareConnectionTester.class.getCanonicalName());
         dataSource.setTestConnectionOnCheckin(true);
        dataSource.setTestConnectionOnCheckout(false);
        dataSource.setPreferredTestQuery("/* PING */ SELECT 1");
        dataSource.setIdleConnectionTestPeriod(60); 

Here it is my ReplicationAwareConnectionTester

public class ReplicationAwareConnectionTester extends AbstractConnectionTester {

@Autowired
 private AbstractConnectionTester baseTester;

    @Override
    public int activeCheckConnection(Connection connection, String preferredTestQuery, Throwable[] rootCauseOutParamHolder) {
        // If we succeed, the connection should remain in readOnly mode. So we check the master first.
        final boolean readOnlyValues[] = { false, true };
        for (boolean isReadOnly : readOnlyValues) {
            try {
                connection.setReadOnly(isReadOnly);
                int result = baseTester.activeCheckConnection(connection, preferredTestQuery, rootCauseOutParamHolder);
                if (result != CONNECTION_IS_OKAY) {
                    return result;
                }
            } catch (Throwable e) {
                // Per ConnectionTester contract, we return e in throwables
                if (rootCauseOutParamHolder != null && rootCauseOutParamHolder.length >= 1) {
                    rootCauseOutParamHolder[0] = e;
                }
                return CONNECTION_IS_INVALID;
            }
        }
        return CONNECTION_IS_OKAY;
    }

    @Override
    public int statusOnException(Connection connection, Throwable throwable, String s, Throwable[] throwables) {
        return baseTester.statusOnException(connection, throwable, s, throwables);
    }
}

Could you please suggest me what should be the my connection testing strategy in case of ReplicationAware Connection.

swaldman commented 6 years ago

I guess I'd say, first, turn all Connection testing off and make sure the problem goes away.

Next, I'd say your ReplicationAwareConnectionTester may well be your problem. If the database is read-only only when the master is down, then your test will fail when read-only goes to true in your for loop. If the DB requires read-only to be set when the master is down, I'd set read only to true always for your test. Tests typically only check reads anyway.

I guess I'd be a lot simpler about a ConnectionTester. Something like...

import com.mchange.v2.c3p0.implDefaultConnectionTester;

public final class ReadOnlyConnectionTester extends AbstractConnectionTester {
   ConnectionTester inner = new DefaultConnectionTester();

   @Override
   public int activeCheckConnection(Connection c, String preferredTestQuery, Throwable[] rootCauseOutParamHolder) {
      c.setReadOnly( true );
      return inner.activeCheckConnection( c, preferredTestQuery, rootCauseOutParamHolder);
   }

   @Override
   public abstract int statusOnException(Connection c, Throwable t, String preferredTestQuery, Throwable[] rootCauseOutParamHolder) {
      c.setReadOnly( true );
      return inner.statusOnException( c, t, preferredTestQuery, rootCauseOutParamHolder);
   }
}

I'm just typing into a web page. I'm sure I've made typos a compiler would catch. But you get the idea.

Please upgrade from 0.9.5.1 to 0.9.5.2. It'll put a lot less stress on your garbage collector.

bhardwajvikas commented 6 years ago

Thanks for the reply @swaldman . I tried the solution as suggested by you. From the last couple of days it was working until today when master database again goes down today. Again we got same error.

Changes done by me

   public int activeCheckConnection(Connection c, String preferredTestQuery, Throwable[] rootCauseOutParamHolder) {
      c.setReadOnly( true );
      return inner.activeCheckConnection( c, preferredTestQuery, rootCauseOutParamHolder);
   }

We haven't add c.readOnly(true); to statusOnException method in hope that we need to do only activeCheckConnection.

 @Override
   public abstract int statusOnException(Connection c, Throwable t, String preferredTestQuery, Throwable[] rootCauseOutParamHolder) {
      return inner.statusOnException( c, t, preferredTestQuery, rootCauseOutParamHolder);
   }
bhardwajvikas commented 6 years ago

We are facing continuous issue with ReplicationAware Connection & C3P0. When master down, we are not able to serve read traffic also. It looks like C3P0 is not supporting failover when master goes down on ReplicationAware Connection. And its not C3P0 only, other connection pools (HikariCP) also not support these type of failovers. https://github.com/brettwooldridge/HikariCP/issues/625

Please correct me if I am wrong.

swaldman commented 6 years ago

I am not sure why it wouldn't work, if (and only if) your application calls setReadOnly(true). If read-only Connections are the norm, you might consider implementing a ConnectionCustomizer and calling setReadOnly(true) in onAcquire(...). Then Connections will be delivered by the pool in read-only mode by default. Your application would have to call setReadOnly(false) to try to write (and it would only work when the master is available).

But I may well be missing something that Brett is not missing!

bhardwajvikas commented 6 years ago

We have already created ConnectionCustomizer which set readOnly flag to true on OnAcquire and OnCheckout. But the problem is , before reaching to this code path, C3P0 throws connection timeout exception.

Caused by: java.sql.SQLException: An attempt by a client to checkout a Connection has timed out.
        at com.mchange.v2.sql.SqlUtils.toSQLException(SqlUtils.java:118)
        at com.mchange.v2.sql.SqlUtils.toSQLException(SqlUtils.java:77)
        at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool.checkoutPooledConnection(C3P0PooledConnectionPool.java:690)
        at com.mchange.v2.c3p0.impl.AbstractPoolBackedDataSource.getConnection(AbstractPoolBackedDataSource.java:140)
        at org.hibernate.ejb.connection.InjectedDataSourceConnectionProvider.getConnection(InjectedDataSourceConnectionProvider.java:71)
        at org.hibernate.jdbc.ConnectionManager.openConnection(ConnectionManager.java:446)
        ... 45 more
Caused by: com.mchange.v2.resourcepool.TimeoutException: A client timed out while waiting to acquire a resource from com.mchange.v2.resourcepool.BasicResourcePool@2ce3d95a -- timeout at awaitAvailable()
        at com.mchange.v2.resourcepool.BasicResourcePool.awaitAvailable(BasicResourcePool.java:1467)
        at com.mchange.v2.resourcepool.BasicResourcePool.prelimCheckoutResource(BasicResourcePool.java:644)
        at com.mchange.v2.resourcepool.BasicResourcePool.checkoutResource(BasicResourcePool.java:554)
        at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool.checkoutAndMarkConnectionInUse(C3P0PooledConnectionPool.java:758)
        at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool.checkoutPooledConnection(C3P0PooledConnectionPool.java:685)
        ... 48 more
swaldman commented 6 years ago

What happens if you unset or set to zero c3p0.checkoutTimeout?