seancorfield / next-jdbc

A modern low-level Clojure wrapper for JDBC-based access to databases.
https://cljdoc.org/d/com.github.seancorfield/next.jdbc/
Eclipse Public License 1.0

Connection failure with Hikari and c3p0 since migration from Korma. #281

Open · jeans11 opened this issue 1 week ago

jeans11 commented 1 week ago

Describe the bug

Sometimes one of our services gets a connection error, like Communications link failure with Hikari, or Connections could not be acquired from the underlying database! with c3p0.

This has been happening since we moved from Korma to next.jdbc. We have the latest version of the new JDBC driver.

Additional context

This behaviour occurs mainly when we deploy a new version of the application. I really don't understand why. We haven't touched any of the MySQL configuration. The impacted service is not exposed to end users; it executes selects/inserts at a fixed interval.

The actual connection pool is c3p0 with the following invocation:

;; assumes (:require [next.jdbc.connection :as connection])
;; and (:import (com.mchange.v2.c3p0 ComboPooledDataSource)) in the ns form
(defn ->c3p0-pool
  [config]
  (let [jdbc-params (select-keys config [:useSSL :dbname :dbtype :host :user :password :allowLoadLocalInfile])
        pool-params (select-keys config [:maxIdleTimeExcessConnections :maxIdleTime])]
    (connection/->pool
     ComboPooledDataSource
     (assoc pool-params
            :jdbcUrl (connection/jdbc-url jdbc-params)))))
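
(Editor's sketch of a hypothetical call to the function above; the config keys here are made up to match the two select-keys calls, and the returned ComboPooledDataSource is a javax.sql.DataSource that next.jdbc can use directly:)

(require '[next.jdbc :as jdbc])

;; hypothetical config values, for illustration only
(def ds (->c3p0-pool {:dbtype "mysql" :dbname "app" :host "127.0.0.1"
                      :user "app" :password "secret"
                      :maxIdleTime 10800
                      :maxIdleTimeExcessConnections 1800}))

(jdbc/execute-one! ds ["SELECT 1"])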

We kept the same c3p0 as we had with Korma.

We are completely lost.

seancorfield commented 1 week ago

Korma explicitly sets a lot of properties for c3p0: https://github.com/korma/Korma/blob/master/src/korma/db.clj#L54-L95

You'll want to double-check all of those are c3p0's defaults (since they are defaulted in Korma's code rather than omitting the properties).
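
(Editor's sketch, not from the thread: one way to rule this out is to pin those Korma-era settings explicitly when building the pool, so nothing depends on c3p0's own defaults. The camelCase keys are standard c3p0 bean properties, and the values mirror Korma's hard-coded defaults, which also match the REPL output later in this thread:)

(require '[next.jdbc.connection :as connection])
(import '(com.mchange.v2.c3p0 ComboPooledDataSource))

;; sketch: set the Korma-era c3p0 properties explicitly instead of
;; relying on c3p0's defaults
(defn ->c3p0-pool-pinned
  [jdbc-params]
  (connection/->pool
   ComboPooledDataSource
   {:jdbcUrl (connection/jdbc-url jdbc-params)
    :maxIdleTimeExcessConnections (* 30 60) ; 1800s, Korma's :excess-timeout
    :maxIdleTime (* 3 60 60)                ; 10800s, Korma's :idle-timeout
    :initialPoolSize 3
    :minPoolSize 3
    :maxPoolSize 15}))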

If you are certain all the properties match, then I can't think of any reason why the behavior would be different.

You say it seems to happen mostly as you're rolling out a new app version -- does it all settle down and work without errors after that (as the pools stabilize)?

I assume that your application has changed in several ways if you've dropped Korma and replaced all those calls with your own code and calls to next.jdbc? Is it possible you're reusing a stale connection, or perhaps overrunning the connection pool, or some other new behavior that wasn't there in your old code?

The only time we see Communications Link Failure errors at work (where we use HikariCP and both next.jdbc and clojure.java.jdbc) is when there's genuinely a network glitch and we lose connectivity with the database (so the active connections used by our code will break, independent of the pool itself).
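
(Editor's aside, not a fix confirmed in this thread: a common mitigation for stale pooled connections with c3p0 is to enable its connection testing, which the REPL output in the next comment shows is currently disabled. A sketch, reusing the conn datasource from that comment:)

;; sketch: have c3p0 test connections so dead ones are replaced
;; instead of being handed to the application
(doto ^com.mchange.v2.c3p0.ComboPooledDataSource conn
  (.setPreferredTestQuery "SELECT 1")
  (.setIdleConnectionTestPeriod 300)    ; test idle connections every 5 minutes
  (.setTestConnectionOnCheckout true))  ; strictest (and most expensive) option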

jeans11 commented 1 week ago

Hi Sean!

Thank you for answering me!

You'll want to double-check all of those are c3p0's defaults (since they are defaulted in Korma's code rather than omitting the properties).

I checked multiple times:

(type conn)
;; => #<Class@408cfbc3 com.mchange.v2.c3p0.ComboPooledDataSource>
(.getMaxIdleTimeExcessConnections conn)
;; => 1800
(.getMaxIdleTime conn)
;; => 10800
(.getInitialPoolSize conn)
;; => 3
(.getMinPoolSize conn)
;; => 3
(.getMaxPoolSize conn)
;; => 15
(.getIdleConnectionTestPeriod conn)
;; => 0
(.isTestConnectionOnCheckin conn)
;; => false
(.isTestConnectionOnCheckout conn)
;; => false
(.getPreferredTestQuery conn)
;; => nil

You say it seems to happen mostly as you're rolling out a new app version -- does it all settle down and work without errors after that (as the pools stabilize)?

When I deploy a new version, or when the app scales up, I sometimes get this error. I have to force a restart of the app to clear the error and get it serving requests again.

I assume that your application has changed in several ways if you've dropped Korma and replaced all those calls with your own code and calls to next.jdbc? Is it possible you're reusing a stale connection, or perhaps overrunning the connection pool, or some other new behavior that wasn't there in your old code?

In fact, the part of the application that fails has not changed, because it uses HugSQL. I have just switched it to the next.jdbc adapter.
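
(Editor's note: for readers unfamiliar with that setup, the switch is typically a one-line adapter change along these lines; a sketch assuming the com.layerware/hugsql-adapter-next-jdbc dependency is on the classpath:)

;; sketch: point HugSQL at next.jdbc instead of clojure.java.jdbc
(require '[hugsql.core :as hugsql]
         '[hugsql.adapter.next-jdbc :as next-adapter])

(hugsql/set-adapter! (next-adapter/hugsql-adapter-next-jdbc))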

seancorfield commented 1 week ago

Then I have no idea, sorry.

Like I say, we use HikariCP/next.jdbc/MySQL (Percona 5.7) very heavily at work, with the 8.4.0 driver, and the only time we see those errors is when there is genuinely a database connection outage -- and the system usually recovers pretty quickly from that.

Given how thin a wrapper next.jdbc is around JDBC, I don't see how anything in next.jdbc would cause this. I will note that we do not use HugSQL so maybe the next.jdbc adapter there is something for you to investigate?

jeans11 commented 1 week ago

Thanks Sean!

What is your Hikari configuration? (Notably the timeout params.)

seancorfield commented 1 week ago

This is everything we set:

  ;; assumes (:import (com.zaxxer.hikari HikariDataSource)) in the ns form
  (let [jdbc-url (str "jdbc:mysql://"
                      host ":" port "/" db-name
                      "?useSSL=false&characterEncoding=UTF-8"
                      "&connectionTimeZone=SERVER"
                      (when (= 6033 port) ; not in dev/CI
                        "&serverTimezone=UTC"))
        ;; the HikariCP docs suggest using default settings as much
        ;; as possible and letting it manage liveness/connection testing:
        cpds    (doto (HikariDataSource.)
                  (.setPoolName dsn-name)
                  (.setJdbcUrl jdbc-url)
                  (.setDriverClassName "com.mysql.cj.jdbc.Driver")
                  (.setUsername user)
                  (.setPassword password)
                  ;; from the docs:
                  ;; We strongly recommend setting this value, and it
                  ;; should be several seconds shorter than any database
                  ;; or infrastructure imposed connection time limit.
                  ;; 20 minutes (the default 30) -- since we seem to
                  ;; see communication link timeouts at 30 minutes:
                  (.setMaxLifetime (* 20 60 1000)))]
    (when max-pool (.setMaximumPoolSize cpds max-pool))
    {:datasource cpds})

We use a localhost-based Proxy MySQL instance running on every server, hence the useSSL=false.
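
(Editor's sketch: the map returned above follows the {:datasource ...} convention, so with next.jdbc you use the pooled datasource directly; db here stands for that returned map:)

(require '[next.jdbc :as jdbc])

;; sketch: check out a connection from the Hikari pool and run a query
(with-open [conn (jdbc/get-connection (:datasource db))]
  (jdbc/execute! conn ["SELECT NOW()"]))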