zio / zio-jdbc

A small, idiomatic ZIO interface to JDBC.
Apache License 2.0
82 stars 62 forks source link

Infinite retries when unable to open a Connection #194

Open gaeljw opened 7 months ago

gaeljw commented 7 months ago

Context

I've noticed a strange behavior which I'm attributing to zio-jdbc but it could actually be caused by ZIO core. I don't know enough of the details to be affirmative here.

Versions

"dev.zio" %% "zio" % "2.0.21",
"dev.zio" %% "zio-jdbc" % "0.1.2",

Observed behaviour

When a Connection cannot be opened with the database due to a connection timeout for instance (or any other error blocking long enough, in my reproduction scenario it's 30s but on the real case I observed it was 2 minutes), the retry logic seems to be misbehaving and the application retries infinitely instead of crashing.

Reproduction scenario

Here's a reproduction test using Scalatest:

package com.kk.p.e.services

import org.scalatest.flatspec.AnyFlatSpec
import org.scalatest.matchers.should.Matchers
import org.slf4j.{Logger, LoggerFactory}
import zio.Unsafe.unsafe
import zio.jdbc.{ZConnectionPool, ZConnectionPoolConfig, stringToSql, transaction}
import zio.{Chunk, RIO, Task, ULayer, ZEnvironment, ZIO, ZLayer}

import java.sql.SQLException

object MyService {

  val live: ZLayer[ZConnectionPool, Nothing, MyService] = ZLayer.fromFunction { pool: ZConnectionPool => new MyService(pool) }

}

class MyService(pool: ZConnectionPool) {

  def getResults(): Task[Chunk[Long]] = {
    _getResults().provideEnvironment(ZEnvironment(pool))
  }

  private def _getResults(): RIO[ZConnectionPool, Chunk[Long]] = transaction[ZConnectionPool, Throwable, Chunk[Long]] {
    stringToSql("""SELECT 1""")
      .query[Long]
      .selectAll
      .orDie
  }

}

object MyConnectionPool {

  private val logger: Logger = LoggerFactory.getLogger(getClass)

  private val poolConfig: ULayer[ZConnectionPoolConfig] = {
    val retryPolicy = ZConnectionPoolConfig.defaultRetryPolicy
      .tapOutput { case (i, d) => ZIO.succeed(logger.info(s"Retrying pool #$i for duration $d")) }
    ZLayer.succeed(ZConnectionPoolConfig.default.copy(minConnections = 0, retryPolicy = retryPolicy))
  }

  private val connectionPool: ZLayer[ZConnectionPoolConfig, Throwable, ZConnectionPool] = {
    ZConnectionPool.make {
      ZIO
        .attempt {
          logger.info("Opening connection...")
          // DriverManager.getConnection(jdbcUrl)
          Thread.sleep(1000 * 30) // Simulate a long operation
          throw new SQLException("BIM")
        }
        .tapError { ex =>
          ZIO.attempt {
            logger.error(s"Failed to create connection: ${ex.getMessage}")
          }.ignore
        }
    }
  }

  val live: ZLayer[Any, Throwable, ZConnectionPool] = poolConfig >>> connectionPool

}

class BoumSpec extends AnyFlatSpec with Matchers {

  it should "work" in {
    val theProgram: ZIO[MyService, Throwable, Chunk[Long]] = ZIO.serviceWithZIO[MyService](_.getResults())

    val services: ZLayer[Any, Throwable, MyService] = ZLayer.make[MyService](MyService.live, MyConnectionPool.live)

    val theProgramWithDeps = theProgram.provide(services)

    val result: Chunk[_] =
      unsafe { implicit u =>
        zio.Runtime.default.unsafe.run(theProgramWithDeps).getOrThrow()
      }

    result.size shouldEqual 5 // should fail
  }
}

When running this test, here's the output:

2024-02-09 17:54:42,561 [ZScheduler-Worker-0] INFO  c.k.p.e.services.MyConnectionPool$ - Opening connection...
2024-02-09 17:55:12,605 [ZScheduler-Worker-0] ERROR c.k.p.e.services.MyConnectionPool$ - Failed to create connection: BIM
2024-02-09 17:55:12,622 [ZScheduler-Worker-0] INFO  c.k.p.e.services.MyConnectionPool$ - Retrying pool #0 for duration PT0.015S
2024-02-09 17:55:12,639 [ZScheduler-Worker-6] INFO  c.k.p.e.services.MyConnectionPool$ - Opening connection...
2024-02-09 17:55:42,640 [ZScheduler-Worker-6] ERROR c.k.p.e.services.MyConnectionPool$ - Failed to create connection: BIM
2024-02-09 17:55:42,653 [ZScheduler-Worker-6] INFO  c.k.p.e.services.MyConnectionPool$ - Retrying pool #1 for duration PT0.03S
2024-02-09 17:55:42,685 [ZScheduler-Worker-4] INFO  c.k.p.e.services.MyConnectionPool$ - Opening connection...
2024-02-09 17:56:12,686 [ZScheduler-Worker-4] ERROR c.k.p.e.services.MyConnectionPool$ - Failed to create connection: BIM
2024-02-09 17:56:12,687 [ZScheduler-Worker-4] INFO  c.k.p.e.services.MyConnectionPool$ - Retrying pool #2 for duration PT0.06S
2024-02-09 17:56:12,748 [ZScheduler-Worker-3] INFO  c.k.p.e.services.MyConnectionPool$ - Opening connection...
2024-02-09 17:56:42,750 [ZScheduler-Worker-3] ERROR c.k.p.e.services.MyConnectionPool$ - Failed to create connection: BIM
2024-02-09 17:56:42,763 [ZScheduler-Worker-3] INFO  c.k.p.e.services.MyConnectionPool$ - Retrying pool #3 for duration PT0.12S
2024-02-09 17:56:42,885 [ZScheduler-Worker-5] INFO  c.k.p.e.services.MyConnectionPool$ - Opening connection...
2024-02-09 17:57:12,887 [ZScheduler-Worker-5] ERROR c.k.p.e.services.MyConnectionPool$ - Failed to create connection: BIM
2024-02-09 17:57:12,889 [ZScheduler-Worker-5] INFO  c.k.p.e.services.MyConnectionPool$ - Retrying pool #4 for duration PT0.24S
2024-02-09 17:57:13,131 [ZScheduler-Worker-7] INFO  c.k.p.e.services.MyConnectionPool$ - Opening connection...
2024-02-09 17:57:43,133 [ZScheduler-Worker-7] ERROR c.k.p.e.services.MyConnectionPool$ - Failed to create connection: BIM
2024-02-09 17:57:43,137 [ZScheduler-Worker-7] INFO  c.k.p.e.services.MyConnectionPool$ - Retrying pool #5 for duration PT0.48S
2024-02-09 17:57:43,618 [ZScheduler-Worker-2] INFO  c.k.p.e.services.MyConnectionPool$ - Opening connection...
2024-02-09 17:58:13,620 [ZScheduler-Worker-2] ERROR c.k.p.e.services.MyConnectionPool$ - Failed to create connection: BIM
2024-02-09 17:58:13,634 [ZScheduler-Worker-2] INFO  c.k.p.e.services.MyConnectionPool$ - Retrying pool #6 for duration PT0.96S
2024-02-09 17:58:14,596 [ZScheduler-Worker-1] INFO  c.k.p.e.services.MyConnectionPool$ - Opening connection...
2024-02-09 17:58:44,598 [ZScheduler-Worker-1] ERROR c.k.p.e.services.MyConnectionPool$ - Failed to create connection: BIM
2024-02-09 17:58:44,611 [ZScheduler-Worker-1] INFO  c.k.p.e.services.MyConnectionPool$ - Retrying pool #7 for duration PT1.92S
2024-02-09 17:58:46,533 [ZScheduler-Worker-0] INFO  c.k.p.e.services.MyConnectionPool$ - Opening connection...
2024-02-09 17:59:16,535 [ZScheduler-Worker-0] ERROR c.k.p.e.services.MyConnectionPool$ - Failed to create connection: BIM
2024-02-09 17:59:16,536 [ZScheduler-Worker-0] INFO  c.k.p.e.services.MyConnectionPool$ - Retrying pool #8 for duration PT3.84S
2024-02-09 17:59:20,377 [ZScheduler-Worker-6] INFO  c.k.p.e.services.MyConnectionPool$ - Opening connection...
2024-02-09 17:59:50,380 [ZScheduler-Worker-6] ERROR c.k.p.e.services.MyConnectionPool$ - Failed to create connection: BIM
2024-02-09 17:59:50,382 [ZScheduler-Worker-6] INFO  c.k.p.e.services.MyConnectionPool$ - Retrying pool #9 for duration PT7.68S
2024-02-09 17:59:58,065 [ZScheduler-Worker-3] INFO  c.k.p.e.services.MyConnectionPool$ - Opening connection...
2024-02-09 18:00:28,067 [ZScheduler-Worker-3] ERROR c.k.p.e.services.MyConnectionPool$ - Failed to create connection: BIM
2024-02-09 18:00:28,069 [ZScheduler-Worker-3] INFO  c.k.p.e.services.MyConnectionPool$ - Retrying pool #10 for duration PT15.36S
2024-02-09 18:00:28,080 [ZScheduler-Worker-3] INFO  c.k.p.e.services.MyConnectionPool$ - Opening connection...
2024-02-09 18:00:58,082 [ZScheduler-Worker-3] ERROR c.k.p.e.services.MyConnectionPool$ - Failed to create connection: BIM
2024-02-09 18:00:58,084 [ZScheduler-Worker-3] INFO  c.k.p.e.services.MyConnectionPool$ - Retrying pool #0 for duration PT0.015S
2024-02-09 18:00:58,101 [ZScheduler-Worker-7] INFO  c.k.p.e.services.MyConnectionPool$ - Opening connection...
2024-02-09 18:01:28,103 [ZScheduler-Worker-7] ERROR c.k.p.e.services.MyConnectionPool$ - Failed to create connection: BIM
2024-02-09 18:01:28,109 [ZScheduler-Worker-7] INFO  c.k.p.e.services.MyConnectionPool$ - Retrying pool #1 for duration PT0.03S
2024-02-09 18:01:28,140 [ZScheduler-Worker-2] INFO  c.k.p.e.services.MyConnectionPool$ - Opening connection...
2024-02-09 18:01:58,141 [ZScheduler-Worker-2] ERROR c.k.p.e.services.MyConnectionPool$ - Failed to create connection: BIM
2024-02-09 18:01:58,142 [ZScheduler-Worker-2] INFO  c.k.p.e.services.MyConnectionPool$ - Retrying pool #2 for duration PT0.06S
2024-02-09 18:01:58,203 [ZScheduler-Worker-1] INFO  c.k.p.e.services.MyConnectionPool$ - Opening connection...
2024-02-09 18:02:28,203 [ZScheduler-Worker-1] ERROR c.k.p.e.services.MyConnectionPool$ - Failed to create connection: BIM
2024-02-09 18:02:28,204 [ZScheduler-Worker-1] INFO  c.k.p.e.services.MyConnectionPool$ - Retrying pool #3 for duration PT0.12S
2024-02-09 18:02:28,325 [ZScheduler-Worker-0] INFO  c.k.p.e.services.MyConnectionPool$ - Opening connection...
2024-02-09 18:02:58,326 [ZScheduler-Worker-0] ERROR c.k.p.e.services.MyConnectionPool$ - Failed to create connection: BIM
2024-02-09 18:02:58,329 [ZScheduler-Worker-0] INFO  c.k.p.e.services.MyConnectionPool$ - Retrying pool #4 for duration PT0.24S
2024-02-09 18:02:58,570 [ZScheduler-Worker-6] INFO  c.k.p.e.services.MyConnectionPool$ - Opening connection...

You can notice that after reaching the 10 retries defined in the retry policy, it starts again to retry back at iteration 0.

Other

If the "long operation" is not that long (1s instead of 30s for instance), the behaviour is not the same: the app crashes as expected after 10 retries.

gaeljw commented 7 months ago

Adding that using ZIO.attemptBlocking instead ofZIO.attempt when opening the connection does not change the behavior. I thought it could be related but it's not.

gaeljw commented 6 months ago

The piece of code that relates to this in zio-jdbc is https://github.com/zio/zio-jdbc/blob/cfd8a7e03491da207fe6cd81cd999cd96080cec2/core/src/main/scala/zio/jdbc/ZConnectionPool.scala#L115-L122

And the default retry config is supposed to be max 10 times: https://github.com/zio/zio-jdbc/blob/cfd8a7e03491da207fe6cd81cd999cd96080cec2/core/src/main/scala/zio/jdbc/ZConnectionPoolConfig.scala#L35-L36

maxstreese commented 1 month ago

For anyone stumbling across this, I would like to offer some more insights. From my understanding, what is actually happening are not actually infinite retries, but instead the retry policy of the connection pool is applied to every potentially available connection from the pool. To illustrate what I mean consider the below scala-cli script:

#!/usr/bin/env -S scala-cli shebang

//> using dep "dev.zio::zio-jdbc:0.1.2"
//> using dep "dev.zio::zio-streams:2.1.6"
//> using dep "dev.zio::zio:2.1.6"
//> using dep "org.postgresql:postgresql:42.7.3"

import zio.*
import zio.jdbc.*
import zio.stream.*

object App extends ZIOAppDefault:

  val retryPolicy = Schedule.stop.tapOutput(_ => ZIO.logInfo("failed"))

  val connectionPoolConfig =
    ZLayer.succeed:
      ZConnectionPoolConfig
        .default
        .copy(
          minConnections = 0,
          maxConnections = 1,
          retryPolicy = retryPolicy,
        )

  val connectionPool =
    val props = Map("user" -> "postgres", "password" -> "password")
    ZConnectionPool.postgres("postgres", 5432, "postgres", props)

  val query = sql"select count(*) from test".query[Int]

  val logic =
    for
      count <- transaction(query.selectOne).flatMap(ZIO.fromOption)
      _ <- ZIO.logInfo(s"count is $count")
    yield ()

  override def run = logic.provide(connectionPoolConfig, connectionPool)

App.main(args)

This script will log the failed message once. However, if you adjust minConnections to 3 and maxConnections to 4, the failed message will be logged 4 times.

guizmaii commented 1 month ago

I didn't read in details the issue but I fixed something similar some times ago here: https://github.com/zio/zio-jdbc/pull/174

gaeljw commented 1 month ago

I'm pretty sure #174 was already included when I reported the issue. I mean the issue is still there.