jankotek / mapdb

MapDB provides concurrent Maps, Sets and Queues backed by disk storage or off-heap-memory. It is a fast and easy to use embedded Java database engine.
https://mapdb.org
Apache License 2.0

Performance of `commit` drops off abruptly for a file DB after 2GB #982

Open harpocrates opened 3 years ago

harpocrates commented 3 years ago

We're using a file DB with transactions enabled, scheduled to call `commit` at a fixed delay. We tend to experience pretty drastic slowdowns as the DB file gets large. To debug this further, I made a synthetic benchmark which suggests that the performance of `commit` suddenly becomes much worse once the DB file grows beyond 2GB.

The micro benchmark DB setup is:

```scala
val db = DBMaker
  .fileDB("test.db")
  .fileMmapEnable()
  .transactionEnable()
  .make()

val tree = db
  .treeMap("journals")
  .keySerializer(new SerializerArrayTuple(Serializer.BYTE_ARRAY, Serializer.LONG))
  .valueSerializer(Serializer.BYTE_ARRAY)
  .createOrOpen()
```
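
The commit itself is driven by a timer. In isolation, that part looks roughly like the sketch below (a simplified version using a plain `ScheduledExecutorService`; the full Ammonite script further down does the same thing with the Akka scheduler):

```scala
import java.util.concurrent.{Executors, TimeUnit}

// Sketch only: issue db.commit() on a fixed delay and time each call.
// `db` is the handle created above.
val scheduler = Executors.newSingleThreadScheduledExecutor()
scheduler.scheduleWithFixedDelay(
  () => {
    val before = System.nanoTime()
    db.commit()
    val commitNs = System.nanoTime() - before
    println(s"commit took ${commitNs / 1e6} ms")
  },
  5, 5, TimeUnit.SECONDS
)
```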

What gets run:

Complete code (it is a messy Ammonite Scala script, but I can convert it to Java and clean it up if that helps):

```scala
import $ivy.`org.mapdb:mapdb:3.0.8`
import $ivy.`com.typesafe.akka::akka-actor:2.6.11`
import $ivy.`com.typesafe.akka::akka-stream:2.6.11`

import org.mapdb.serializer.SerializerArrayTuple
import org.mapdb.{DB, DBMaker, Serializer}
import akka.actor._
import scala.util.Random
import java.util.UUID
import java.util.concurrent.atomic.LongAdder
import scala.concurrent.duration._
import java.io.PrintWriter
import akka.stream._
import akka.stream.scaladsl._
import scala.concurrent.Future
import java.nio.file.Files
import java.nio.file.Paths

implicit val system = ActorSystem()
implicit val ec = system.dispatcher

val stats = new PrintWriter("stats.csv")
val DbFileName = "test.db"

val db = DBMaker
  .fileDB("test.db")
  .fileMmapEnable()
  .fileChannelEnable()
  .transactionEnable()
  .make()

val tree = db
  .treeMap("journals")
  .keySerializer(
    new SerializerArrayTuple(
      Serializer.BYTE_ARRAY,
      Serializer.LONG
    )
  )
  .valueSerializer(Serializer.BYTE_ARRAY)
  .createOrOpen()

val totalPuts = new LongAdder()
val totalGets = new LongAdder()
val totalGetsFound = new LongAdder()
var putsCum = 0L

val WriteParallelism = 4
val ReadParallelism = 4

val writeFlow = Source
  .unfold(0L)(x => Some(x -> (x + 1)))
  .mapAsync(WriteParallelism) { writeIdx =>
    Future {
      tree.put(
        Array[AnyRef](
          UUID.randomUUID().toString.getBytes("UTF-8"),
          Long.box(System.nanoTime())
        ),
        Random.nextString(90).getBytes("UTF-8")
      )
      totalPuts.increment()
    }
  }
  .to(Sink.ignore)

val readFlow = Source
  .unfold(0L)(x => Some(x -> (x + 1)))
  .mapAsync(ReadParallelism) { readIdx =>
    Future {
      val found = tree.get(
        Array[AnyRef](
          UUID.randomUUID().toString.getBytes("UTF-8"),
          Long.box(System.nanoTime())
        )
      )
      totalGets.increment()
      if (found != null) {
        totalGetsFound.increment()
      }
    }
  }
  .to(Sink.ignore)

writeFlow.run()
readFlow.run()

var lastNanos = System.nanoTime()
system.scheduler.scheduleWithFixedDelay(5.seconds, 5.seconds) { () =>
  val before = System.nanoTime()
  db.commit()
  val commitNs = System.nanoTime() - before

  val puts = totalPuts.sumThenReset()
  val gets = totalGets.sumThenReset()
  val getsFound = totalGetsFound.sumThenReset()
  val newNanos = System.nanoTime()
  val batchNs = newNanos - lastNanos
  val dbSize = Files.size(Paths.get(DbFileName))
  lastNanos = newNanos
  putsCum += puts

  stats.println(s"$puts,$putsCum,$gets,$dbSize,$commitNs,$batchNs")
  stats.flush()
}
```

Plotting the time `commit` takes against the total size of test.db, there seems to be a performance cliff at the 2GB mark:

[Screenshot (2021-01-23): plot of `commit` duration vs. test.db size, showing the cliff at roughly 2GB]

tianyawenke commented 5 months ago

Is there any update on this issue?