xiaoxichen / leveldb

Automatically exported from code.google.com/p/leveldb
BSD 3-Clause "New" or "Revised" License

DBTest.HiddenValuesAreRemoved fails intermittently #182

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

Run the tests for leveldb 1.11.0 repeatedly.
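A minimal sketch of a repeat-run harness for counting intermittent failures. It assumes the test binary is invoked as `./db_test` from the build directory and that a failing run exits with a non-zero status; `CountFailures` is a hypothetical helper, not part of leveldb.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: runs `command` through the shell `runs` times and
// returns how many runs exited with a non-zero status (i.e. failed).
int CountFailures(const std::string& command, int runs) {
  int failures = 0;
  for (int i = 0; i < runs; ++i) {
    // std::system returns 0 when the command exits successfully.
    if (std::system(command.c_str()) != 0) {
      ++failures;
    }
  }
  return failures;
}
```

For example, calling `CountFailures("./db_test", 10)` from a small `main` gives a rough failure rate over ten runs.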

What is the expected output? What do you see instead?

Some tests fail intermittently. We've seen:

==== Test CorruptionTest.CompactionInputErrorParanoid
./db/corruption_test.cc:325: failed: 1 == 5
FAIL: corruption_test

==== Test DBTest.HiddenValuesAreRemoved
./db/db_test.cc:1202: failed: [ tiny, <long string here> ] == [ tiny ]
FAIL: db_test

What version of the product are you using? On what operating system?

Fedora 16 x86_64
snappy-1.0.5-1.fc16.x86_64
gcc-4.6.3-2.fc16.x86_64

Please provide any additional information below.

I can also test on Fedora 17 if that would help.

Original issue reported on code.google.com by fullung@gmail.com on 27 Jun 2013 at 8:07

GoogleCodeExporter commented 9 years ago
Issue 87 also talks about the CorruptionTest failing in the same way, but only 
in a VM running CentOS, and possibly Ubuntu on ARM. Are you running Fedora in a 
VM or on metal? Do you have a rough estimate of how many runs are necessary 
before you see this issue?

I have not seen the issue reproduce on my Ubuntu Precise machine after ~800 
runs of corruption_test, though I may be able to try on an ARM CrOS machine later.

Testing on Fedora 17 could be helpful if it's not too much trouble. If it 
doesn't reproduce there we can begin to narrow down differences.

Could you follow up about the CompactionInputErrorParanoid bug in issue 87? We 
can use this (issue 182) for the HiddenValuesAreRemoved bug, which I have not 
seen a report for until now.

Original comment by dgrogan@chromium.org on 27 Jun 2013 at 5:45

GoogleCodeExporter commented 9 years ago
This is on metal, with two 6-core Xeons. We saw multiple failures in 10 runs.

I'll test on Fedora 17 tomorrow. That's on a Core i7 though.

Original comment by fullung@gmail.com on 27 Jun 2013 at 6:31

GoogleCodeExporter commented 9 years ago
I suspect that there's a race condition in the test harness. If level 0 is not 
compacted before the check, the test fails.

We fixed this for our environment by adding sleeps to the tests where 
appropriate. A better fix would be to add synchronization that waits for 
compaction to finish.
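The synchronization approach suggested above can be sketched as a condition-variable wait: instead of sleeping for a fixed interval and hoping compaction is done, the test blocks until a counter of pending background work drops to zero. `BackgroundCompactor`, `Schedule`, and `WaitForCompaction` are hypothetical names for illustration, not leveldb's API, and the 20 ms sleep only simulates compaction work.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a background compaction thread (not leveldb's API).
class BackgroundCompactor {
 public:
  // Joins all workers so no thread outlives the object it references.
  ~BackgroundCompactor() {
    for (std::thread& worker : workers_) {
      worker.join();
    }
  }

  // Starts one unit of simulated compaction work on a background thread.
  // Call from a single thread: workers_ itself is not guarded by mu_.
  void Schedule() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      ++pending_;
    }
    workers_.emplace_back([this] {
      // Simulated compaction; real duration varies with load (and with VMs).
      std::this_thread::sleep_for(std::chrono::milliseconds(20));
      std::lock_guard<std::mutex> lock(mu_);
      --pending_;
      ++completed_;
      cv_.notify_all();
    });
  }

  // Blocks until all scheduled work has finished, however long that takes.
  // The predicate is re-checked after each wakeup, so spurious wakeups are harmless.
  void WaitForCompaction() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return pending_ == 0; });
  }

  int completed() {
    std::lock_guard<std::mutex> lock(mu_);
    return completed_;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  int pending_ = 0;    // guarded by mu_
  int completed_ = 0;  // guarded by mu_
  std::vector<std::thread> workers_;
};
```

In a test, the assertion would go after `WaitForCompaction()` returns rather than after a fixed sleep. A sleep only makes the race less likely, since a loaded machine or a VM can still exceed it; the predicate wait returns exactly when the pending work reaches zero.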

Here's our code for reference (line 325/326):  
https://github.com/rescrv/HyperLevelDB/blob/master/db/corruption_test.cc

I didn't consider upstreaming these changes until now because they only 
manifested themselves after we separated the memtable compaction into another 
thread.

Original comment by res...@gmail.com on 27 Jun 2013 at 7:14

GoogleCodeExporter commented 9 years ago
Okay, it seems the CompactionInputErrorParanoid puzzle is solved, so I'll focus 
on gathering more information about HiddenValuesAreRemoved here.

Original comment by fullung@gmail.com on 28 Jun 2013 at 4:06

GoogleCodeExporter commented 9 years ago
Correction: turns out we also saw this test failure inside a VM.

My only guess is that the VM changes the timing of the tests.

Is there anything we can do to provide more information to debug this one?

Original comment by fullung@gmail.com on 28 Jun 2013 at 2:09

GoogleCodeExporter commented 9 years ago
I have been able to reproduce this problem and I am working on a fix. Thanks 
for the reports.

Original comment by san...@google.com on 1 Jul 2013 at 9:31