liuis / leveldb

Automatically exported from code.google.com/p/leveldb
BSD 3-Clause "New" or "Revised" License

Confusing LevelDB corruption. #197

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
As with issue #196, we recently enabled paranoid mode (paranoid_checks) to see 
how well LevelDB was actually doing with respect to corruption and data integrity.

We found this wacky case of corruption and can't explain it. It appears as if 
two threads raced on adding a record to the log file: one with a short record 
and one with a long record. The short record was written, then the long record 
was written, then the short record's writer updated the file pointer, and 
finally the long record's writer updated it. The result looks as if some random 
bytes were inserted, yet the rest of the records line up on block boundaries 
perfectly. When loading, the reader sees the run of zeros (how coincidental) and 
jumps to the next block, discarding any intact records between the zeros and the 
end of the current block. Fortunately, the next block began with a LAST-type 
record (the tail of a fragmented record), so paranoid mode complained. The scary 
part is that if the record at the beginning of the next block had been a 
FULL-type record, the result would have been silent data loss, even under 
paranoid mode.
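
The reader behavior described above can be modeled in a few lines. This is a simplified Python sketch, not LevelDB's reader: it assumes LevelDB's 32 KiB log blocks and 7-byte record headers, does not verify checksums, and its error strings are illustrative. `read_records` and `rec` are hypothetical helpers, and a tiny block size keeps the example readable.

```python
import struct

BLOCK_SIZE = 32768  # LevelDB log block size (32 KiB)
HEADER_SIZE = 7     # checksum (4 bytes) + length (2 bytes) + type (1 byte)
ZERO, FULL, FIRST, MIDDLE, LAST = range(5)


def read_records(log: bytes, block_size: int = BLOCK_SIZE):
    """Simplified LevelDB-style log reader returning (records, errors).

    Assumptions: checksums are NOT verified and error strings are
    illustrative; this is a model for discussion, not LevelDB's reader.
    """
    records, errors = [], []
    scratch, in_fragment = b"", False
    for start in range(0, len(log), block_size):
        block = log[start:start + block_size]
        pos = 0
        # Fewer than HEADER_SIZE bytes left means a block trailer: skip it.
        while len(block) - pos >= HEADER_SIZE:
            _crc, length, rtype = struct.unpack_from("<IHB", block, pos)
            if rtype == ZERO and length == 0:
                # Zero-filled region: abandon the rest of this block,
                # silently dropping any intact records that follow.
                if in_fragment:
                    errors.append("error in middle of record")
                    scratch, in_fragment = b"", False
                break
            end = pos + HEADER_SIZE + length
            if end > len(block):
                errors.append("bad record length")
                break
            data = block[pos + HEADER_SIZE:end]
            pos = end
            if rtype == FULL:
                if in_fragment:
                    errors.append("partial record without end")
                records.append(data)
                scratch, in_fragment = b"", False
            elif rtype == FIRST:
                if in_fragment:
                    errors.append("partial record without end")
                scratch, in_fragment = data, True
            elif rtype in (MIDDLE, LAST):
                if not in_fragment:
                    # A continuation fragment with no FIRST before it.
                    errors.append("missing start of fragmented record")
                    continue
                scratch += data
                if rtype == LAST:
                    records.append(scratch)
                    scratch, in_fragment = b"", False
            else:
                errors.append(f"unknown record type {rtype}")
    return records, errors


def rec(rtype: int, data: bytes) -> bytes:
    # Checksum field left as 0 because this model never checks it.
    return struct.pack("<IHB", 0, len(data), rtype) + data


BLOCK = 32  # tiny block size so the example stays readable
block0 = rec(FULL, b"aaaa") + bytes(7) + rec(FULL, b"bbbb")
block0 += bytes(BLOCK - len(block0))  # pad to the block boundary

# Next block starts with a LAST fragment: the damage is reported.
print(read_records(block0 + rec(LAST, b"cc") + rec(FULL, b"dd"), BLOCK))
# -> ([b'aaaa', b'dd'], ['missing start of fragmented record'])

# Next block starts with a FULL record: b"bbbb" vanishes with no error.
print(read_records(block0 + rec(FULL, b"cc") + rec(FULL, b"dd"), BLOCK))
# -> ([b'aaaa', b'cc', b'dd'], [])
```

In both runs `b"bbbb"`, which sits after the zeros in the first block, is never returned; in this model only the type of the next block's first record decides whether anything gets reported.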

The hex dumps below are consecutive bytes from the file, split into record 
headers, record data, and the strange bytes in the middle of the log; each 
dump's offsets restart at 00000000.

    // record header. 0x3a bytes type 01
    00000000  a4 36 e6 8e 3a 00 01                              |.6..:..|

    // 0x003a bytes of data
    00000000  e6 48 00 00 00 00 00 00  02 00 00 00 01 14 0a 02  |.H..............|
    00000010  01 16 08 b9 51 38 ba 2c  50 51 d0 39 f5 34 61 6e  |....Q8.,PQ.9.4an|
    00000020  5c 43 00 01 12 0a 03 08  b9 51 38 ba 2c 50 51 d0  |\C.......Q8.,PQ.|
    00000030  39 f5 34 61 6e 5c 43 02  01 16                    |9.4an\C...|

    // tail of some record? those numbers look like a unix epoch time and there are
    // other records in the log with a similar format.
    00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
    00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
    00000020  00 00 00 31 33 38 34 31  38 36 37 30 37           |...1384186707|

    // record header. 0x35 bytes type 01
    00000000  d4 13 bd a4 35 00 01                              |....5..|

    // 0x35 bytes of data
    00000000  e9 48 00 00 00 00 00 00  01 00 00 00 01 13 00 02  |.H..............|
    00000010  00 42 e4 8a ea c1 53 f3  e1 e4 6e 74 a4 14 40 da  |.B....S...nt..@.|
    00000020  90 13 08 01 18 00 50 fa  aa c4 89 ef ae b8 02 58  |......P........X|
    00000030  d1 e8 36 60 01                                    |..6`.|

    // record header ...
    00000000  af 4d 0f 4b 20 00 01                              |.M.K ..|
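
For anyone re-checking the dumps: in LevelDB's log format each record header is 7 bytes, namely a 4-byte masked CRC32C (over the type byte and payload), a 2-byte little-endian length and a 1-byte record type. A small Python decoder (`parse_header` is a hypothetical helper written for this thread, not a LevelDB API) makes those fields explicit:

```python
import struct

# Record types from LevelDB's log format.
RECORD_TYPES = {0: "ZERO", 1: "FULL", 2: "FIRST", 3: "MIDDLE", 4: "LAST"}


def parse_header(hex_bytes: str):
    """Decode a 7-byte log record header written as hex, e.g. "a4 36 ..."."""
    raw = bytes.fromhex(hex_bytes)
    if len(raw) != 7:
        raise ValueError(f"expected 7 header bytes, got {len(raw)}")
    # Little-endian: uint32 masked CRC32C, uint16 payload length, uint8 type.
    checksum, length, rtype = struct.unpack("<IHB", raw)
    return checksum, length, RECORD_TYPES.get(rtype, f"unknown({rtype})")


dumped_headers = [
    "a4 36 e6 8e 3a 00 01",  # first record in the dump
    "d4 13 bd a4 35 00 01",  # record after the strange bytes
    "af 4d 0f 4b 20 00 01",  # next record header
    "00 00 00 00 00 00 00",  # first 7 bytes of the strange run
]
for header in dumped_headers:
    checksum, length, rtype = parse_header(header)
    print(f"crc=0x{checksum:08x} length={length} (0x{length:02x}) type={rtype}")
```

The last entry shows why the reader gives up on the block: seven zero bytes decode as a ZERO-type header with length 0.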

I have no idea how this happened or how to fix it.

Original issue reported on code.google.com by jtolds on 29 Jul 2013 at 3:48

GoogleCodeExporter commented 9 years ago
We're seeing very similar corruption reported, running Bitcoin on OSX: 
https://github.com/bitcoin/bitcoin/issues/2770

Original comment by gavinand...@gmail.com on 12 Aug 2013 at 6:16

GoogleCodeExporter commented 9 years ago
This issue is affecting a ton of the cryptocurrency clients (bitcoin, litecoin, 
novacoin, worldcoin, and on and on), all with LevelDB errors. For me it happens 
when I shut the client down and start it back up. It is so annoying because I 
have to delete the DB files and redownload the whole block chain. Also, it is 
not just OS X: I saw at least one person report it on Windows XP, and I am on 
Ubuntu 13.04 Desktop (with Windows too) and on Ubuntu Server...

Original comment by sudosurootdev on 12 Aug 2013 at 7:58

GoogleCodeExporter commented 9 years ago
I would recommend setting the priority of this defect to "Priority-High". 
People are ending up with unusable or destroyed LevelDB databases because of 
this issue.

Original comment by jonas.sc...@gmail.com on 13 Aug 2013 at 6:22

GoogleCodeExporter commented 9 years ago
The issue on OS X may be that fsync apparently doesn't tell the hard disk to 
flush its write cache to the platters. There's a separate magic incantation for 
that (fcntl with F_FULLFSYNC).

Original comment by mh.in.en...@gmail.com on 15 Aug 2013 at 8:06

GoogleCodeExporter commented 9 years ago
I think it would be very helpful for bug reports on corruption to include 
version specifics for both the OS and the filesystem. This issue is probably 
related to getting writes flushed to disk properly, and the steps necessary to 
do that can depend on both the OS and the filesystem. LevelDB is likely tuned 
very well for the Linux stack used at Google, but for other stacks we may need 
to tweak the use of fsync/fdatasync etc.; I think this is what 
port/port_posix.h is intended for.

On Mac OS X, for example, most filesystems will probably require 
fcntl(F_FULLFSYNC) instead of a simple fsync() to guarantee that writes reach 
non-volatile storage before returning. Other OS/filesystem pairs may require 
other tweaks. Unfortunately, that may significantly degrade performance, as 
F_FULLFSYNC will force all buffers to be written, including those unrelated to 
LevelDB (i.e., I don't believe it is file-specific). A patch from my local git 
repo is attached.
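
The F_FULLFSYNC idea can also be sketched outside LevelDB's C++ port layer. The Python function below is a hypothetical illustration, not the attached patch: on macOS it tries `fcntl(fd, F_FULLFSYNC)` and falls back to a plain `fsync()` if the constant is missing or the call fails (an assumption about how a fallback might be handled); on other platforms it just calls `fsync()`. As noted above, the full flush may be much slower.

```python
import os
import sys
import tempfile


def full_sync(fd: int) -> None:
    """Push fd's data as far toward stable storage as the platform allows.

    Hypothetical helper: on macOS, try fcntl(F_FULLFSYNC), which asks the
    drive to flush its write cache; otherwise (or if that fails) use fsync().
    """
    if sys.platform == "darwin":
        import fcntl  # POSIX-only module, so import it only where needed
        full_fsync = getattr(fcntl, "F_FULLFSYNC", None)
        if full_fsync is not None:
            try:
                fcntl.fcntl(fd, full_fsync)
                return
            except OSError:
                pass  # e.g. a filesystem that rejects F_FULLFSYNC
    os.fsync(fd)


# Usage: write, flush Python's buffer to the OS, then force it to storage.
with tempfile.TemporaryFile() as f:
    f.write(b"log record bytes")
    f.flush()
    full_sync(f.fileno())
```

Falling back instead of failing keeps writes working on filesystems that do not support the full flush, at the cost of the weaker fsync() guarantee there.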

Original comment by dana.pow...@gmail.com on 15 Aug 2013 at 5:45


GoogleCodeExporter commented 9 years ago
Come to think of it, the originally reported corruption in this ticket was on 
an OS X system as well.

Original comment by jtolds on 15 Aug 2013 at 6:16

GoogleCodeExporter commented 9 years ago
Oops, I was just informed that it was actually a Linux VM on top of OS X. I'd 
expect the VM stack to call the appropriate F_FULLFSYNC, but I don't know that 
for certain, and I don't know which VM was used at this point. :( Sorry, guys.

Original comment by jtolds on 15 Aug 2013 at 6:21