lwhay / asterixdb

Automatically exported from code.google.com/p/asterixdb
0 stars 0 forks source link

Incorrect query results possible when accessing secondary indexes #723

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
The problem is a theoretical problem. When searching a secondary index, it's 
possible that we:
(1) Forward a particular record <SK, PK> from the 2ndary index search operator
(2) In the meantime, that record is deleted and reinserted as <SK', PK>. This 
is possible because we do not lock during secondary index searches.
(3) If the newly inserted record is inserted "ahead" of the search cursor, then 
the search cursor will return it twice. It can be inserted ahead for a number 
of reasons (e.g. SK' > SK in a B+-tree or if SK' matches an R-tree predicate 
and lands on a page that the cursor hasn't visited yet).

The primary index is probed for each record returned from the 2ndary index 
search. This means that both <SK, PK> and <SK', PK> are used as probes to the 
primary index (i.e. we retrieve the record with <PK> twice). As long as SK' 
matches the original predicate, then we will return the record twice. This is 
incorrect behavior!

Fortunately, the fix is simple: use a duplicate eliminating sort on PK between 
secondary and primary indexes. We already perform the sort, but it just needs 
to be made to eliminate duplicates.

Original issue reported on code.google.com by zheilb...@gmail.com on 11 Mar 2014 at 10:08