playframework / anorm

The Anorm database library
https://playframework.github.io/anorm/
Apache License 2.0

akka-streams module ends up loading the whole ResultSet into memory #516

Closed: AlexITC closed this issue 1 year ago

AlexITC commented 1 year ago

Are you looking for help?

No.

Anorm Version (2.5.x / etc)

2.6.10

Operating System (Ubuntu 15.10 / MacOS 10.10 / Windows 10)

Ubuntu 20.04 x86_64

JDK (Oracle 1.8.0_72, OpenJDK 1.8.x, Azul Zing)

openjdk version "11.0.16" 2022-07-19 LTS
OpenJDK Runtime Environment Corretto-11.0.16.8.1 (build 11.0.16+8-LTS)
OpenJDK 64-Bit Server VM Corretto-11.0.16.8.1 (build 11.0.16+8-LTS, mixed mode)

Library Dependencies

//> using scala "2.13"
//> using lib "org.playframework.anorm::anorm:2.6.10"
//> using lib "org.playframework.anorm::anorm-akka:2.6.10"
//> using lib "com.typesafe.akka::akka-stream:2.6.19"
//> using lib "com.typesafe.akka::akka-actor:2.6.19"
//> using lib "org.postgresql:postgresql:42.3.6"

Also, PostgreSQL 15.

Expected Behavior

There shouldn't be huge memory consumption when streaming the results from a huge table.

Actual Behavior

The whole ResultSet seems to be loaded into memory. I have already tried throttling the stream and setting a small buffer (sketched below); still, there is no improvement.
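
For illustration, this is the kind of throttling/buffering that was attempted; the operators and values here are assumptions rather than the exact original code, and source, SQL, and SqlParser are as in the repro below:

import akka.stream.OverflowStrategy
import scala.concurrent.duration.DurationInt

// Neither a small buffer nor throttling helps: the driver has already
// materialized the full ResultSet before the first row is emitted.
source(SQL"""select id from sample_table""", SqlParser.scalar[Long])
  .buffer(16, OverflowStrategy.backpressure)
  .throttle(1000, 1.second)
  .runFold(0) { (acc, _) => acc + 1 }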

Reproducible Test Case

The reproduction example can be executed with scala-cli AnormAkkaStreamsBroken.scala. A Postgres server needs to be running; the code includes instructions to launch one with Docker.

AnormAkkaStreamsBroken.scala

//> using scala "2.13"
//> using lib "org.playframework.anorm::anorm:2.6.10"
//> using lib "org.playframework.anorm::anorm-akka:2.6.10"
//> using lib "com.typesafe.akka::akka-stream:2.6.19"
//> using lib "com.typesafe.akka::akka-actor:2.6.19"
//> using lib "org.postgresql:postgresql:42.3.6"

import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, Materializer}
import anorm.AkkaStream._
import anorm._

import java.sql.{Connection, DriverManager}
import scala.concurrent.Await
import scala.concurrent.duration.DurationInt

/*
In order to run this example, you will need a postgres database running, docker is a way to get this easily with:
- docker run -p 5555:5432 -e POSTGRES_PASSWORD=postgres -e POSTGRES_DB=anorm_db -it postgres

Then, create the required table:
- docker run -it --network=host -e PGPASSWORD=postgres --rm postgres psql -h 127.0.0.1 -p 5555 -U postgres anorm_db -c "create table sample_table(id BIGINT NOT NULL PRIMARY KEY);"
 */
object AnormAkkaStreamsBroken extends App {

  val dbUser = "postgres"
  val dbPassword = "postgres"
  val dbName = "anorm_db"
  val dbHost = "localhost"
  val dbPort = 5555
  val jdbcUrl = s"jdbc:postgresql://$dbHost:$dbPort/$dbName?user=$dbUser&password=$dbPassword"

  implicit val system = ActorSystem("sample-system")
  implicit val mat: Materializer = ActorMaterializer()

  try {

    println("Getting database connection")
    implicit val conn: Connection = DriverManager.getConnection(jdbcUrl)

    println("Make sure table has a huge amount of data")
    val rowsToAdd: Long = 100000000
    val max: Long = SQL"""select max(id) from sample_table""".as(SqlParser.scalar[Long].singleOpt).getOrElse(0)
    if (max < rowsToAdd) {
      println("Adding lots of items to the table")
      SQL"""insert into sample_table
          select generate_series(${max + 1},${rowsToAdd}) AS id
       """.executeUpdate()
    }

    println("Process the stream, watch your memory!")
    val f = source(SQL"""select id from sample_table""", SqlParser.scalar[Long])
      .runFold(0) { (acc, _) => acc + 1 }
    val count = Await.result(f, 15.minutes)
    println(s"Count: $count")
  } finally {
    sys.exit(0)
  }
}

Demo:

Screencast from 16-11-22 07:04:27.webm

cchantep commented 1 year ago

Try reading using the ResultSet directly (the stage is not accumulating data, just wrapping the result). If the result is the same ...
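
For example, a minimal plain-JDBC check, as a sketch assuming the same database as the repro above; the counting loop is illustrative:

import java.sql.DriverManager

object PlainJdbcCheck extends App {
  // Same connection string as in the Anorm repro above.
  val jdbcUrl = "jdbc:postgresql://localhost:5555/anorm_db?user=postgres&password=postgres"
  val conn = DriverManager.getConnection(jdbcUrl)
  try {
    val rs = conn.createStatement().executeQuery("select id from sample_table")
    // With the PostgreSQL driver defaults (auto-commit on, fetch size 0),
    // executeQuery buffers the entire result set before this loop starts.
    var count = 0L
    while (rs.next()) count += 1
    println(s"Count: $count")
  } finally conn.close()
}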

AlexITC commented 1 year ago

Try reading using direct ResultSet

I'll try and update the ticket later.

the stage is not accumulating data, just wrapping Result

Yes, I figured this out while checking the source, which is why I mentioned that the problem seems to come from JDBC. I don't know if there is a way to prevent this.

cchantep commented 1 year ago

Check the JDBC driver options/workarounds; otherwise I don't see anything to change at the Anorm level.

AlexITC commented 1 year ago

After putting some time into this, I checked the official postgres-jdbc-docs, which specify that by default the whole result set is loaded into memory, but there is a simple way to avoid that (setFetchSize). Gladly, SimpleSql has a method available to set such a value (withFetchSize), while autoCommit must be false. After applying those changes, the stream behaves the way I expected.
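
For reference, a sketch of the fix applied to the repro above; the fetch size of 1000 is an illustrative value, while withFetchSize and setAutoCommit(false) are the pieces described above:

// The PostgreSQL driver only streams with a cursor when auto-commit is
// off, the statement is TYPE_FORWARD_ONLY (the JDBC default), and a
// positive fetch size is set; otherwise it buffers the whole result set.
conn.setAutoCommit(false)

val f = source(
  SQL"""select id from sample_table""".withFetchSize(Some(1000)),
  SqlParser.scalar[Long]
).runFold(0) { (acc, _) => acc + 1 }
val count = Await.result(f, 15.minutes)
println(s"Count: $count")

With these two changes the driver fetches rows in batches of the fetch size instead of materializing all 100 million ids up front.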

Thanks.

cchantep commented 1 year ago

Thanks for the feedback.