ilovesoup / hyracks

Automatically exported from code.google.com/p/hyracks
Apache License 2.0
0 stars 0 forks source link

LSM search cursors should not blindly discard anti-matter tuples when performing a merge #92

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

Assume that we have an LSM-tree with 3 disk components c1, c2, and c3; where c1 
is the most recent component. Now let's assume there is a matter tuple in c3, 
and anti-matter tuple in c2. When the merge policy choose c1 and c2 to be 
merged, it will only see the anti-matter tuple and it will discard it. This 
will cause the matter tuple in c3 to be a valid tuple, although it has been 
deleted by the user. Currently, we don't see this bug is hidden because our 
merging policy is naive (merge all the components). Whenever we have a more 
advanced merging policy (we already have the mechanisms but the policies are 
not implemented) that pick a subset of the components to be merged, this will 
indeed cause index corruption. 

What is the expected output? What do you see instead?

Currently all lsm indexes use the same search cursor of that index, which 
always discard anti-matter tuples. Cursors should be aware that merges do NOT 
want to discard anti-matter tuples in some scenarios as the one mentioned above.

Original issue reported on code.google.com by salsuba...@gmail.com on 22 Feb 2013 at 1:00

GoogleCodeExporter commented 9 years ago
Could we architect the search/merge process such that we split the process as 
two parts:

1. Open cursors on components as required and load values into pri-Q.
2. Implement a "fold" or "aggregator" process that is initialized once per 
unique key value and then fed tuples one at a time for that key value in 
increasing time order. Finally when it is clear that no more tuples of the same 
value will be available, call a produceResult method on the process that 
produces the result of folding all the values seen so far.

The idea of this split is that the aggregator implementation(s) can be tested 
in isolation without actually having to create a merge test case that shows an 
issue. The logic of the "aggregation" per key should be describable using a 
combine operator <+> such that internally the aggregator keeps the "merge" of 
its current state with the new tuple (in increasing time order) as its state 
for the next merge.

So now all that has to be defined is the <+> binary operator, which probably 
looks like this:

_ <+> M1 = M1
M1 <+> M2 = M2
M1 <+> AM2 = _
_ <+> AM1 = AM1 (this is the case mentioned in the bug)
AM1 <+> M2 = M2
AM1 <+> AM2 = AM2 (for completeness)

where _ indicates nothing, M* is a matter tuple, AM* is an anti-matter tuple.

There are other benefits to abstracting the "merge" in this manner -- It lets 
us implement LSM trees that model any monoid. 
(http://en.wikipedia.org/wiki/Monoid)

Thoughts?

Original comment by vinay...@gmail.com on 22 Feb 2013 at 2:08

GoogleCodeExporter commented 9 years ago
That's one possible approach.

We could have an ITupleAggregator interface and have different implementations 
depending on the purpose of the search. For instance, one implementation is 
MergeAggregator which is exactly what you described.
Another implementation is SearchAggregator which should be used for standard 
search queries. In this implementation, the anti-matter tuples should be always 
discarded. So the <+> binary operator in this implementation should behave 
something like this:

_ <+> M1 = M1
 M1 <+> M2 = M2 (should not occur anyway, since we enforce unique keys)
 M1 <+> AM2 = _ 
_ <+> AM1 = _
AM1 <+> M2 = M2 
AM1 <+> AM2 = _

Original comment by salsuba...@gmail.com on 22 Feb 2013 at 4:21

GoogleCodeExporter commented 9 years ago
M1 <+> M2 can happen when you have arbitrary merge policies.

If you stipulate that AM1 <+> M2 = M2 (which can happen if you are merging a 
subsequence of components that is not a suffix), then at a later time when you 
do look at an older component (that was not used to do the merge), you could 
see an M1 that is <+>'ed with an M2 that was a result of the previous merge.

Original comment by vinay...@gmail.com on 22 Feb 2013 at 4:31

GoogleCodeExporter commented 9 years ago
You are right. 
Hmmm, I wonder if this has any implication somewhere else. Previously, we 
always have a physical guarantee that there are no two matter tuples with the 
same key in the components without having an anti-matter tuple reside in a 
component between them. Now, this could occur physically, although we can still 
deal with it logically inside the SearchAggregator:
M1 <+> M2 = M2

Eventually when the two components are merged we will get rid of the duplicate 
key:
M1 <+> M2 = M2

Original comment by salsuba...@gmail.com on 22 Feb 2013 at 4:42

GoogleCodeExporter commented 9 years ago
Then, we might also want to handle the following case:

Suppose c1, c2, and c3 contain AM3, M2, M1, respectively. Merging c1 and c2
produces: AM3 <+> M2 = _.
Now, we will have two components: c12' and c3 with _ and M1, respectively.
Subsequent searches will return M1, which is incorrect.

One approach to handling this is not to discard anti-matter tuples in a
merge unless the merge involves the oldest component.

Z

Original comment by zheilb...@gmail.com on 22 Feb 2013 at 6:59

GoogleCodeExporter commented 9 years ago
Yep, but it seems fixable if we do this:
M1 <+> AM2 = AM2

Except for the oldest component:
M1 <+> AM2 = _

The thing that I don't like with the approach is that the merge process will 
have special treatment for the oldest component....

Original comment by salsuba...@gmail.com on 22 Feb 2013 at 7:13

GoogleCodeExporter commented 9 years ago
It's not too bad actually... It's easy to detect if the last component
is involved in the merge or not. Based on that, we can just provide
the appropriate "aggregator".

On Feb 21, 2013, at 11:13 PM, "hyracks@googlecode.com"
<hyracks@googlecode.com> wrote:

Original comment by zheilb...@gmail.com on 22 Feb 2013 at 7:19

GoogleCodeExporter commented 9 years ago

Original comment by salsuba...@gmail.com on 24 Sep 2013 at 6:19