jankotek / mapdb

MapDB provides concurrent Maps, Sets and Queues backed by disk storage or off-heap-memory. It is a fast and easy to use embedded Java database engine.
https://mapdb.org
Apache License 2.0

db.getStore().compact() hangs? #998

Open Karl483r opened 2 years ago

Karl483r commented 2 years ago

Hi,

thanks for the library!

I'm trying to add roughly 100 billion strings to a database. I create a database and tree set, and call compact() roughly every 300 million strings:

```java
db = DBMaker
        .fileDB(file)
        .fileMmapEnable()
        .allocateIncrement(512 * 1024 * 1024)
        .make();
NavigableSet<String> treeSet = (NavigableSet<String>) db
        .treeSet("treeSet_" + name + "_" + k)
        .serializer(new SerializerCompressionWrapper(Serializer.STRING))
        .createOrOpen();
```

This worked fine, but after inserting roughly 2^32 strings the next call to db.getStore().compact() seemed to "hang". By hanging I mean the CPU utilization is almost zero and there is also no disk activity.

After that I thought maybe I'm only allowed to add 2^32 (max int) strings, and changed my code to create 15 database files with 10 tree sets each. That worked for some time, but now that I have added roughly 70 billion strings, an OutOfMemoryError was thrown while compacting. After that I killed the program and started again from my backup, but added a 250 GB swap file to the 64 GB of RAM. This time the compact() call seems to hang again (no error/exception): for at least the last 4 hours there has been no disk or CPU activity. I'm using MapDB 3.0.8.
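For reference, the sharding scheme described above (15 files × 10 tree sets) can be sketched as a simple hash-based router. The class and method names here are illustrative only, not from the reporter's code or from MapDB:

```java
// Sketch: route each string to one of 15 database files x 10 tree sets,
// so no single tree set has to hold more than ~2^32 entries.
// FILE_COUNT, SETS_PER_FILE and all names below are hypothetical.
public class ShardRouter {
    static final int FILE_COUNT = 15;
    static final int SETS_PER_FILE = 10;

    // Spread a string uniformly over FILE_COUNT * SETS_PER_FILE buckets.
    static int bucket(String s) {
        int h = s.hashCode();
        // mask the sign bit so the modulo result is never negative
        return (h & 0x7fffffff) % (FILE_COUNT * SETS_PER_FILE);
    }

    // Which of the 15 database files the string belongs to.
    static int fileIndex(String s) {
        return bucket(s) / SETS_PER_FILE;
    }

    // Which of that file's 10 tree sets the string belongs to.
    static int treeSetIndex(String s) {
        return bucket(s) % SETS_PER_FILE;
    }
}
```

Each worker thread would then own one file index and insert only the strings routed to it.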

Is there a limit to how many strings I can add? Or am I doing something else wrong?

Edit: Each database file has its own thread. The call to compact() is guarded by a semaphore, so only one compact() call is executed at a time.
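The serialization described in the edit can be sketched with a plain java.util.concurrent.Semaphore. The exact wiring to db.getStore().compact() is not shown in the issue, so a Runnable stands in for the compaction call here; the class name is mine:

```java
import java.util.concurrent.Semaphore;

// Sketch: a single permit means at most one compact() runs at a time,
// even though each database file has its own worker thread.
public class CompactGate {
    private final Semaphore permit = new Semaphore(1);

    // 'compaction' stands in for a call like db.getStore().compact().
    public void runExclusively(Runnable compaction) {
        permit.acquireUninterruptibly(); // block until no other compact() is running
        try {
            compaction.run();
        } finally {
            permit.release();            // always free the permit, even on error
        }
    }
}
```

All worker threads would share one CompactGate instance, so a thread that reaches its 300-million-string threshold waits until any in-flight compaction finishes.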

Karl483r commented 2 years ago

Tried again and got a stack trace:

```
java.lang.OutOfMemoryError: Java heap space
    at org.mapdb.serializer.SerializerByteArrayNoSize.deserialize(SerializerByteArrayNoSize.java:23)
    at org.mapdb.serializer.SerializerByteArrayNoSize.deserialize(SerializerByteArrayNoSize.java:14)
    at org.mapdb.StoreDirectAbstract.deserialize(StoreDirectAbstract.kt:229)
    at org.mapdb.StoreDirect.get(StoreDirect.kt:546)
    at org.mapdb.StoreDirect.compact(StoreDirect.kt:771)
    at Utils.Compact.compact(Compact.java:56)
    at Utils.Compact.call(Compact.java:40)
    at Utils.Compact.call(Compact.java:16)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:830)
```

I forgot to update the Xmx value after adding the swap. Will update when I know more.

jankotek commented 2 years ago

Hi, compaction loads some data into memory; that is what causes the OOMEs. There is no workaround for that, sorry.

Karl483r commented 2 years ago

Hi, ok, thanks for the reply. With a high Xmx value I get the following exception:

```
Exception in thread "main" java.util.concurrent.ExecutionException: org.mapdb.DBException$PointerChecksumBroken: Broken bit parity
    at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
    at MainParallel4.add(MainParallel4.java:153)
    at MainParallel4.main(MainParallel4.java:67)
Caused by: org.mapdb.DBException$PointerChecksumBroken: Broken bit parity
    at org.mapdb.DataIO.parity1Get(DataIO.java:440)
    at org.mapdb.StoreDirect.getIndexVal(StoreDirect.kt:131)
    at org.mapdb.StoreDirect.get(StoreDirect.kt:523)
    at org.mapdb.BTreeMap.getNode(BTreeMap.kt:800)
    at org.mapdb.BTreeMap.access$getNode(BTreeMap.kt:72)
    at org.mapdb.BTreeMap$BTreeIterator.advanceFrom(BTreeMap.kt:1031)
    at org.mapdb.BTreeMap$BTreeIterator.advance(BTreeMap.kt:1050)
    at org.mapdb.BTreeMap$keyIterator$1.next(BTreeMap.kt:1205)
    at org.mapdb.BTreeMap.sizeLong(BTreeMap.kt:908)
    at org.mapdb.BTreeMap.getSize(BTreeMap.kt:898)
    at org.mapdb.BTreeMap.size(BTreeMap.kt:72)
    at org.mapdb.BTreeMapJava$KeySet.size(BTreeMapJava.java:407)
    at Utils.AddPWNew.call(AddPWNew.java:78)
    at Utils.AddPWNew.call(AddPWNew.java:16)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:830)
```

Is the "Broken bit parity" a consequence of me adding too much data or am I doing something else wrong?