We're using a file DB with transactions enabled and call `commit` on a fixed-delay schedule. We see drastic slowdowns as the DB file gets big. To debug this further, I wrote a synthetic benchmark, which suggests that `commit` performance suddenly becomes much worse once the DB file grows beyond 2GB.
The micro benchmark DB setup is:
val db = DBMaker
.fileDB("test.db")
.fileMmapEnable()
.transactionEnable()
.make()
val tree = db
.treeMap("journals")
.keySerializer(new SerializerArrayTuple(Serializer.BYTE_ARRAY, Serializer.LONG))
.valueSerializer(Serializer.BYTE_ARRAY)
.createOrOpen()
What gets run:
- a call to `commit` scheduled every 5 seconds
- lots of concurrent reads/writes (about 4 concurrent reads and 4 concurrent writes at any given moment)
Complete code (it is a messy Ammonite Scala script, but I can convert it to Java and clean it up if that helps):
```scala
import $ivy.`org.mapdb:mapdb:3.0.8`
import $ivy.`com.typesafe.akka::akka-actor:2.6.11`
import $ivy.`com.typesafe.akka::akka-stream:2.6.11`
import org.mapdb.serializer.SerializerArrayTuple
import org.mapdb.{DB, DBMaker, Serializer}
import akka.actor._
import scala.util.Random
import java.util.UUID
import java.util.concurrent.atomic.LongAdder
import scala.concurrent.duration._
import java.io.PrintWriter
import akka.stream._
import akka.stream.scaladsl._
import scala.concurrent.Future
import java.nio.file.Files
import java.nio.file.Paths

implicit val system = ActorSystem()
implicit val ec = system.dispatcher

val stats = new PrintWriter("stats.csv")
val DbFileName = "test.db"

val db = DBMaker
  .fileDB("test.db")
  .fileMmapEnable()
  .fileChannelEnable()
  .transactionEnable()
  .make()

val tree = db
  .treeMap("journals")
  .keySerializer(
    new SerializerArrayTuple(
      Serializer.BYTE_ARRAY,
      Serializer.LONG
    )
  )
  .valueSerializer(Serializer.BYTE_ARRAY)
  .createOrOpen()

// Counters are reset on every scheduled commit
val totalPuts = new LongAdder()
val totalGets = new LongAdder()
val totalGetsFound = new LongAdder()
var putsCum = 0L

val WriteParallelism = 4
val ReadParallelism = 4

// Endless stream of puts with random keys
val writeFlow = Source
  .unfold(0L)(x => (Some(x -> (x+1))))
  .mapAsync(WriteParallelism) { writeIdx =>
    Future{
      tree.put(
        Array[AnyRef](
          UUID.randomUUID().toString.getBytes("UTF-8"),
          Long.box(System.nanoTime())
        ),
        Random.nextString(90).getBytes("UTF-8")
      )
      totalPuts.increment()
    }
  }
  .to(Sink.ignore)

// Endless stream of gets with random keys
val readFlow = Source
  .unfold(0L)(x => (Some(x -> (x+1))))
  .mapAsync(ReadParallelism) { readIdx =>
    Future{
      val found = tree.get(
        Array[AnyRef](
          UUID.randomUUID().toString.getBytes("UTF-8"),
          Long.box(System.nanoTime())
        )
      )
      totalGets.increment()
      if (found != null) {
        totalGetsFound.increment()
      }
    }
  }
  .to(Sink.ignore)

writeFlow.run()
readFlow.run()

var lastNanos = System.nanoTime()

// Commit every 5 seconds and record how long the commit took
system.scheduler.scheduleWithFixedDelay(5.seconds, 5.seconds) { () =>
  val before = System.nanoTime()
  db.commit()
  val commitNs = System.nanoTime() - before

  val puts = totalPuts.sumThenReset()
  val gets = totalGets.sumThenReset()
  val getsFound = totalGetsFound.sumThenReset()
  val newNanos = System.nanoTime()
  val batchNs = newNanos - lastNanos
  val dbSize = Files.size(Paths.get(DbFileName))
  lastNanos = newNanos
  putsCum += puts

  // CSV columns: puts,putsCum,gets,dbSize,commitNs,batchNs
  stats.println(s"$puts,$putsCum,$gets,$dbSize,$commitNs,$batchNs")
  stats.flush()
}
```
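
Each row the script writes to `stats.csv` has the columns `puts,putsCum,gets,dbSize,commitNs,batchNs`. For anyone who wants to compare commit times on either side of the 2GB mark without plotting, here is a minimal sketch. The names `Sample`, `TwoGiB`, `parseLine`, `meanCommitMs` and `summarize` are made up for this illustration and are not part of the benchmark, and it assumes "2GB" means 2^31 bytes.

```scala
import scala.util.Try

// Hypothetical helper type: only the two columns needed for the comparison.
final case class Sample(dbSizeBytes: Long, commitNs: Long)

// Assumption: the "2GB mark" is taken to be 2^31 bytes.
val TwoGiB: Long = 1L << 31

// Parse one CSV row (puts,putsCum,gets,dbSize,commitNs,batchNs).
// Malformed or short rows are skipped rather than failing the whole run.
def parseLine(line: String): Option[Sample] = {
  val cols = line.split(',')
  if (cols.length < 6) None
  else Try(Sample(cols(3).trim.toLong, cols(4).trim.toLong)).toOption
}

// Mean commit time in milliseconds, or None if there are no samples.
def meanCommitMs(samples: Seq[Sample]): Option[Double] =
  if (samples.isEmpty) None
  else Some(samples.map(_.commitNs).sum.toDouble / samples.size / 1e6)

// Returns (mean below 2^31 bytes, mean at or above 2^31 bytes).
def summarize(lines: Iterator[String]): (Option[Double], Option[Double]) = {
  val samples = lines.flatMap(l => parseLine(l).toList).toSeq
  val (below, above) = samples.partition(_.dbSizeBytes < TwoGiB)
  (meanCommitMs(below), meanCommitMs(above))
}

// Usage (reads the file produced by the benchmark):
//   val src = scala.io.Source.fromFile("stats.csv")
//   try println(summarize(src.getLines())) finally src.close()
```

A mean hides the shape of the curve, so this only gives a rough before/after number; the plot remains the better view of where the slowdown starts.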
Plotting the time `commit` takes against the total size of `test.db`, there seems to be a performance cliff at the 2GB mark: