not an sstable (bad magic number)

We have found a scenario where we can create not an sstable (bad magic number) errors consistently on Linux while wrapping the library in the JNA api. We believe we are using the LevelDB api correctly, though if you see an error in this test code, we'd love to know that's what's causing it.

We have experienced errors on every Linux we have tried. It seems to happen faster on virtualized Linux, but we've also run into errors while running Linux directly on metal. The magic sauce to getting a faster time to failure is setting the -Xloggc option on the JVM to a location that does not exist. We believe this is creating enough latency in the JVM to exacerbate a problem that is always present.

The following will complete (though, as we said, given enough time we believe it will still see corruption errors):

java -server -jar build/libs/leveldbtest-all.jar

The following will fail with corruption without a change in the code:

java -server -Xloggc:/does/not/exist -jar build/libs/leveldbtest-all.jar

With this issue report we are providing a gradle project and scripts for a test harness that is able to recreate this error in seconds by running on Docker (by far the worst time to failure we have found). While running in other configurations the time to failure is measured in hours (running on Azure or AWS) or days (-Xloggc pointing to a location that exists), but we are confident in saying that it's not a matter of if, but when the error will occur.

We would be more than happy to test any patches or answer your questions. We've spent weeks working on narrowing down this problem to its simplest example as it is critical to us that it is fixed.

Versions used for the test

Docker: 1.12.0
Java: OpenJDK 1.8u102 (though we see this with Oracle's Java as well)
JNA: 4.1.0
LevelDB: 1.19
Test error log

As you can see in the log excerpt below, the time to failure for this test harness was less than a second (sometimes it takes several seconds).

2016-09-14 15:30:38.162 [main                          ] INFO  leveldb.test.LevelDbBlockStoreHarness: ---> Starting LevelDbBlockStoreHarness --->
2016-09-14 15:30:38.170 [main                          ] INFO  leveldb.test.BlockDatabase: opening database: data/648960378880486400
2016-09-14 15:30:38.275 [main                          ] INFO  leveldb.test.leveldb.LevelDBJNA: LevelDB version 1.19 loaded
2016-09-14 15:30:38.304 [LevelDbBlockStoreHarness-1    ] INFO  leveldb.test.LevelDbBlockStoreHarness: Starting thread 1 of 1
2016-09-14 15:30:38.313 [LevelDbBlockStoreHarness-1    ] INFO  leveldb.test.LevelDbBlockStoreHarness: Starting run 1 of 10 with database, BlockDatabase[dbDirectory = data/648960378880486400, totalBlockBytes = 0]
2016-09-14 15:30:39.034 [LevelDbBlockStoreHarness-1    ] ERROR leveldb.test.LevelDbBlockStoreHarness: Catching
 leveldb.test.leveldb.api.CorruptionException: Corruption: not an sstable (bad magic number)
    at leveldb.test.leveldb.api.Util.checkError(Util.java:20) ~[leveldbtest-all.jar:?]
    at leveldb.test.leveldb.api.DB.write(DB.java:42) ~[leveldbtest-all.jar:?]
    at leveldb.test.BlockDatabase.purgeIfFull(BlockDatabase.java:90) ~[leveldbtest-all.jar:?]
    at leveldb.test.BlockDatabase.saveBlocks(BlockDatabase.java:68) ~[leveldbtest-all.jar:?]
    at leveldb.test.LevelDbBlockStoreHarness$1.run(LevelDbBlockStoreHarness.java:74) [leveldbtest-all.jar:?]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_102]

Running the test

Install Docker
Unpack the provided zip file
Execute leveldbtest/bin/run_docker.sh
What the test does

This test simply creates and removes random byte blobs from the database. The blobs are randomly sized, and written in random length batches. When the total bytes in the database exceeds 10M the blobs are deleted and the process continues.

The main method is in leveldb.test.LevelDbBlockStoreHarness. At the top of the file are a number of knobs you can turn to modify the execution of the test harness if you feel like during your testing.

// Number of databases to use concurrently
private static final int NUMBER_OF_DATABASES = 1;
// Number of threads to run concurrently
private static final int NUMBER_OF_THREADS = 1;
// Number of runs per thread
private static final int NUMBER_OF_RUNS = 10;
// Number of packages per run
private static final int NUMBER_OF_PACKAGES = 100;
// Minimum number of blocks per package (never set below 1)
private static final int MINIMUM_NUMBER_OF_BLOCKS = 1;
// Maximum number of blocks per package
private static final int MAXIMUM_NUMBER_OF_BLOCKS = 100;
// Minimum number of bytes per block (never set below 1)
private static final int MINIMUM_BLOCK_SIZE = 64 * 1024;
// Maximum number of bytes per block
private static final int MAXIMUM_BLOCK_SIZE = 256 * 1024;

Notes

During our attempts to diagnose the error we recompiled LevelDB to run without using memory-mapped files. This had no effect on our ability to recreate this error.
We tested with multiple version of Ubuntu (12.04, 14.04 and 16.04) and with Oracle's Java and Open-JDK with no change in the ability to recreate this error beyond changing the mean time to failure.
Test Harness

sstable_corruption_test.zip

I performed a few tests using the supplied test harness, and have some additional information to provide:

I was able to reproduce the failure running a Linux guest within a VirtualBox VM (i.e. guest running Linux, but the host was running FreeBSD 10.3)
- The problem only happens while running on a Linux guest (Centos7, Mint18).
- CentOS used xfs filesystem, Mint used ext4
- The problem never happens when running a FreeBSD (FreeBSD running as a VM Guest, or running on bare iron).
- Running on FreeBSD does require building a custom version of leveldb
The failure appears to be affected by the number of cores, indicating a race condition:
- The problem never happened while running on a VM with just a single core assigned
- The problem never happened on a multi-core VM, when taskset was used to bind the JVM process to a single core
- The problem did happen on a multi-core VM, when taskset was used to bind the process to 2/4 available processor cores
This happens (on multi-core Linux) when running on either 32 and 64-bit JVMs
- 32-bit requires building a custom version of leveldb
This happens (on multi-core Linux) regardless of the GC strategy was used
- Parallel/Serial/G1
This happens (on multi-core Linux, never on FreeBSD) even after rebulding a custom leveldb library
- Always built tag v19 (commit id=3080a45)
- Building with g++ or clang++ compilers yields same result
- Building with -O2 vs -O0 optimization yields same result
- Disabling -DLEVELDB_ATOMIC_PRESENT has no effect

i have a problem similar like this. but my program only have one worker thread, you can see it as a single thread program, and sometimes the Put operator will fail, and the leveldb's log show:

2017/05/26-15:00:44.947831 7f62a505d7c0 Delete type=3 #1
2017/05/26-15:00:44.948561 7f62937ff700 Level-0 table #5: started
2017/05/26-15:00:44.948622 7f62937ff700 Level-0 table #5: 0 bytes OK
2017/05/26-15:00:44.990261 7f62937ff700 Delete type=0 #3
2017/05/26-15:00:44.990352 7f62937ff700 Manual compaction at level-0 from (begin) .. (end); will stop at (end)
2017/05/27-07:58:16.065847 7f62937ff700 Level-0 table #7: started
2017/05/27-07:58:16.116611 7f62937ff700 Level-0 table #7: 3081599 bytes OK
2017/05/27-07:58:16.122961 7f62937ff700 Delete type=0 #4
2017/05/27-14:02:05.646892 7f62937ff700 Level-0 table #9: started
2017/05/27-14:02:05.691227 7f62937ff700 Level-0 table #9: 3188392 bytes OK
2017/05/27-14:02:05.697950 7f62937ff700 Delete type=0 #6
2017/05/27-14:29:49.691848 7f62937ff700 Compacting 1@1 + 1@2 files
2017/05/27-14:29:49.707433 7f62937ff700 Generated table #10: 377 keys, 6152 bytes
2017/05/27-14:29:49.707472 7f62937ff700 Compacted 1@1 + 1@2 files => 6152 bytes
2017/05/27-14:29:49.712960 7f62937ff700 compacted to: files[ 0 0 1 0 0 0 0 ]
2017/05/27-14:29:49.713244 7f62937ff700 Delete type=2 #9
2017/05/27-14:29:49.713990 7f62937ff700 Delete type=2 #7
2017/05/27-19:03:13.651491 7f62937ff700 Level-0 table #12: started
2017/05/27-19:03:13.696402 7f62937ff700 Level-0 table #12: 3240067 bytes Corruption: not an sstable (bad magic number)

CentOS 7.2 64bit, multi core VM, GCC 4.4.7 LevelDB version is 1.18, now i upgrade it to 1.20 (not test yet)

how can i solve this problem? thanks a lot @cmumford

HI All, there are anyway to fix this issue? I am facing this on OpenWRT. I can re-open the database if I remove *.log file.

One of the LevelDB co-creators, ghemawat, appears to be aware of this problem. I don't know if he knows how many people are being affected by it. Many of us in the STORJ project seem to be affected by it. Recommend reaching out to him. (see issue below)

https://github.com/google/leveldb/issues/474

Does somebody have a stand-alone C++ test that demonstrates the problem?

On Tue, Aug 14, 2018 at 10:21 AM Hypocritus notifications@github.com wrote:

One of the LevelDB co-creators, ghemawat, is aware of this problem. I don't know if he knows how many people are being affected by it. Many of us in the STORJ project seem to be affected by it. Recommend reaching out to him. (see issue below)

474 https://github.com/google/leveldb/issues/474

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/google/leveldb/issues/408#issuecomment-412949812, or mute the thread https://github.com/notifications/unsubscribe-auth/AHuM0cvKshZEvPqKOIJ-L67U9oMplDWOks5uQwcjgaJpZM4J9KnP .

google / leveldb