jankotek / mapdb

MapDB provides concurrent Maps, Sets and Queues backed by disk storage or off-heap-memory. It is a fast and easy to use embedded Java database engine.
https://mapdb.org
Apache License 2.0

Issues with StoreWAL / TreeMap with 0.9.9 #278

Closed: flavor8 closed this issue 10 years ago

flavor8 commented 10 years ago

I keep getting errors like the following after upgrading to 0.9.9; they usually seem to happen after cycling the JVM a couple of times. I am running on Java 8. I'll try Java 7 for a while and see if they reoccur, although that's unlikely to be the cause.

My setup is quite simple:

db = DBMaker.newFileDB(dbFile)
        .closeOnJvmShutdown()
        .make();

! java.io.IOError: java.io.IOException: Zero Header, data corrupted
! at org.mapdb.SerializerBase.deserialize(SerializerBase.java:825)
! at org.mapdb.SerializerBase.deserialize(SerializerBase.java:811)
! at org.mapdb.BTreeMap$NodeSerializer.deserialize(BTreeMap.java:451)
! at org.mapdb.BTreeMap$NodeSerializer.deserialize(BTreeMap.java:288)
! at org.mapdb.Store.deserialize(Store.java:270)
! at org.mapdb.StoreDirect.get2(StoreDirect.java:468)
! at org.mapdb.StoreWAL.get2(StoreWAL.java:347)
! at org.mapdb.StoreWAL.get(StoreWAL.java:331)
! at org.mapdb.Caches$HashTable.get(Caches.java:230)
! at org.mapdb.BTreeMap.<init>(BTreeMap.java:542)
! at org.mapdb.DB.getTreeMap(DB.java:778)

jankotek commented 10 years ago

How do you handle commits?

flavor8 commented 10 years ago

I commit after every object modification; however, so far, I have not been doing so in a finally block. My understanding was that changes would happen in memory and commit would flush to disk. Is that incorrect?

jankotek commented 10 years ago

Changes are written to the Write-Ahead Log and some metadata is kept in memory. The WAL is basically a queue of instructions: write this data to this offset. On commit the WAL is replayed into the main file. This protects the store from corruption.
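
For illustration, a minimal sketch of the commit pattern described above, using the 0.9.x API; the file name, map name, and rollback-on-failure handling are placeholders, not taken from this thread:

import java.io.File;
import java.util.concurrent.ConcurrentNavigableMap;

import org.mapdb.DB;
import org.mapdb.DBMaker;

public class CommitSketch {
    public static void main(String[] args) {
        DB db = DBMaker.newFileDB(new File("example.db"))   // placeholder path
                .closeOnJvmShutdown()
                .make();
        ConcurrentNavigableMap<Long, String> users = db.getTreeMap("users");
        try {
            users.put(1L, "alice");   // change goes into the WAL / in-memory state
            db.commit();              // WAL is replayed into the main store file
        } catch (RuntimeException e) {
            db.rollback();            // discard uncommitted changes on failure
            throw e;
        } finally {
            db.close();
        }
    }
}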

Does this happen at runtime, or while the database is being opened?

Please try:

1) enable assertions with the JVM command-line switch -ea. This will catch some bugs early.

2) does DBMaker.mmapFileEnable() help?

3) try DBMaker.checksumEnable() (a combined configuration sketch follows below)
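
For reference, a sketch of what a configuration combining these suggestions might look like; the class and file names are placeholders, and -ea goes on the JVM command line, not into DBMaker:

import java.io.File;

import org.mapdb.DB;
import org.mapdb.DBMaker;

public class ConfigSketch {
    public static void main(String[] args) {
        // Run the JVM with -ea so MapDB's internal assertions are active.
        DB db = DBMaker.newFileDB(new File("example.db"))   // placeholder path
                .closeOnJvmShutdown()
                .mmapFileEnable()     // suggestion 2: memory-mapped files
                .checksumEnable()     // suggestion 3: verify record checksums on read
                .make();
        db.close();
    }
}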

I would also like to know more about your usage:

1) any advanced features (serializers, comparators...)?

2) what is the type of the keys and values? How big are they?

3) how many records?

4) how heavy are the writes between commits, and how many commits per session?

5) What operating system? 32-bit?

flavor8 commented 10 years ago

I'll update this as I try alternatives.

This happens the first time the DB is being used. It wasn't caused by Java 8; I switched to Java 7 and it still happens. I have currently turned on checksums.

1) No advanced features
2) Not sure which collection this was, but these are simple POJOs (User, Mailbox, Domain, Message). I do have two BigIntegers in Mailbox (RSA public key exponent/modulus)... but I'm not positive it was the Mailbox collection, so that might be a red herring. Will confirm next time this occurs.
3) Less than a dozen
4) Very lightweight, this is just prototyping currently
5) 64-bit, Linux

jankotek commented 10 years ago

Sounds like the store corrupts itself. This is usually hard to solve. I will write more stress tests; hopefully that will catch this.

flavor8 commented 10 years ago

This happened again. It didn't involve the collections with BigIntegers, so we can rule that out. Interestingly this time it happened on a TreeSet.

With checksums, the stacktrace is now:

! java.io.IOError: java.io.IOException: Checksum does not match, data broken
! at org.mapdb.StoreWAL.get(StoreWAL.java:333)
! at org.mapdb.Caches$HashTable.get(Caches.java:230)
! at org.mapdb.BTreeMap.<init>(BTreeMap.java:540)
! at org.mapdb.DB.getTreeSet(DB.java:915)

jankotek commented 10 years ago

I just fixed a similar (unrelated) issue in StoreDirect. Hopefully this one will crack soon.

flavor8 commented 10 years ago

Cool.

Just confirmed that DBMaker.mmapFileEnable() does not help either.

flavor8 commented 10 years ago

This might yield a clue. Today I got this error well after the DB had been used for the first time, i.e. while the app was running as opposed to first use.

! java.lang.ArrayIndexOutOfBoundsException: 521
! at org.mapdb.Volume$ByteBufferVol.getByte(Volume.java:362)
! at org.mapdb.Volume.getUnsignedShort(Volume.java:98)
! at org.mapdb.StoreWAL.longStackGetPage(StoreWAL.java:1054)
! at org.mapdb.StoreWAL.longStackTake(StoreWAL.java:921)
! at org.mapdb.StoreDirect.freePhysTake(StoreDirect.java:1035)
! at org.mapdb.StoreDirect.physAllocate(StoreDirect.java:645)
! at org.mapdb.StoreWAL.update(StoreWAL.java:416)
! at org.mapdb.Caches$HashTable.update(Caches.java:254)
! at org.mapdb.BTreeMap.put2(BTreeMap.java:742)
! at org.mapdb.BTreeMap.put(BTreeMap.java:640)

flavor8 commented 10 years ago

And another...is my filesystem dying or something? I have plenty of space available.

! java.io.IOError: java.io.IOException: File too large
! at org.mapdb.Volume$MappedFileVol.makeNewBuffer(Volume.java:575)
! at org.mapdb.Volume$ByteBufferVol.tryAvailable(Volume.java:300)
! at org.mapdb.Volume.ensureAvailable(Volume.java:58)
! at org.mapdb.StoreWAL.replayLogFile(StoreWAL.java:820)
! at org.mapdb.StoreWAL.commit(StoreWAL.java:620)
! at org.mapdb.EngineWrapper.commit(EngineWrapper.java:92)
! at org.mapdb.DB.commit(DB.java:1497)

jankotek commented 10 years ago

What is the underlying filesystem?

I am about halfway through an extensive code review of the critical path; these new stack traces could help.

flavor8 commented 10 years ago

It is ext4.

jankotek commented 10 years ago

I could use some help reproducing this issue. I tried various stress tests, without luck. Perhaps you could give me a few hints on how my code differs from yours.

This is my current version:

import java.io.File;
import java.util.Map;
import java.util.Random;

import org.mapdb.DB;
import org.mapdb.DBMaker;

public class WALStress {

    // characters used to build random keys and values
    final static String chars = "0123456789abcdefghijklmnopqrstuvwxyz !@#$%^&*()_+=-{}[]:\",./<>?|\\";

    public static String randomString(int size) {
        StringBuilder b = new StringBuilder(size);
        Random r = new Random();
        for(int i=0;i<size;i++){
            b.append(chars.charAt(r.nextInt(chars.length())));
        }
        return b.toString();
    }

    public static void main(String[] args) {
        Random r = new Random();
        File f = new File("/media/jan/db1/dbtest/aaa");

//        while(true){
            // open the store, insert a random-sized batch, commit occasionally, close
            DB db = DBMaker.newFileDB(f)
                    .mmapFileEnable()
                    .checksumEnable()
                    .syncOnCommitDisable()
                    .make();
            Map<String,String> map = db.getTreeMap("test");
            int batchSize = r.nextInt(10000);

            for(int i=0;i<batchSize;i++){
                String key = randomString(16);
                String value = randomString(100);
                map.put(key,value);
                if(r.nextInt(100000)<5)   // rarely wipe the whole map
                    map.clear();
                if(r.nextInt(1000)<5)     // occasionally commit mid-batch
                    db.commit();
            }
            db.close();
//        }

    }
}

It is called in a cycle by this shell script:

#!/usr/bin/fish

while true
    mvn exec:java -Dexec.mainClass="WALStress"
end

flavor8 commented 10 years ago

I'll give this a try and let you know once I have a reproducible case.

flavor8 commented 10 years ago

Ugh, I apologize. This was a wiring issue with my DI framework. I use a DBProvider which injects a DB into my DAOs. I'd forgotten to make it a singleton, so there were multiple DB instances pointing at the same file.
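
For anyone hitting the same symptom, a minimal sketch of a singleton-style provider, using a plain static holder rather than any particular DI framework; the DBProvider name comes from the comment above, everything else is illustrative:

import java.io.File;

import org.mapdb.DB;
import org.mapdb.DBMaker;

// Illustrative only: one shared DB instance per file, so DAOs never end up
// with two independent DB objects pointing at the same store.
public class DBProvider {
    private static volatile DB instance;

    public static DB get() {
        if (instance == null) {
            synchronized (DBProvider.class) {
                if (instance == null) {
                    instance = DBMaker.newFileDB(new File("example.db"))   // placeholder path
                            .closeOnJvmShutdown()
                            .make();
                }
            }
        }
        return instance;
    }
}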

jankotek commented 10 years ago

Hip Hip Hooray! :-)

On the other hand, file locking should prevent this and throw an exception. I will have to investigate that a bit later.
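
A hypothetical check of that expectation might look like the sketch below; whether 0.9.9 actually throws here, and which exception, is exactly what remains to be investigated:

import java.io.File;

import org.mapdb.DB;
import org.mapdb.DBMaker;

// Hypothetical reproduction of the expectation above: the second open of the
// same file should fail rather than silently create a second store instance.
public class DoubleOpenCheck {
    public static void main(String[] args) {
        File f = new File("example.db");   // placeholder path
        DB first = DBMaker.newFileDB(f).make();
        try {
            DB second = DBMaker.newFileDB(f).make();   // expected to fail if file locking works
            System.out.println("No error thrown, both instances opened: " + second);
        } catch (Throwable t) {
            System.out.println("Second open rejected: " + t);
        } finally {
            first.close();
        }
    }
}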