ilovesoup / hyracks

Automatically exported from code.google.com/p/hyracks
Apache License 2.0

Hyracks fixes for Asterix issue 113 in hyracks_fix_asterix_issue_113 #66

GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Purpose of code changes on this branch:
I've fixed Asterix issue 113.

The root cause of the issue was lazy index creation in Hyracks.
Consider the following two Asterix scripts (in pseudo-AQL), run one after the other:
Script 1:
- drop/create/use dataverse
- create dataset foo;
- create index bar on foo(x);
Script 2:
- drop/create/use dataverse
- create dataset foo;
- create index bar on foo(x);

The create index job in Script 2 would scan dataset foo and load a non-empty
index bar. The file backing dataset foo was still present from Script 1, so
bar was loaded with its stale contents, causing subsequent failures (such as
duplicate-key errors on insert).
Note that the drop operator currently cannot simply delete the file, due to
issue 117.
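
To make the failure mode concrete, here is a toy model of it. Everything in it is hypothetical: the class, the method name, and the in-memory map standing in for the disk are mine, not Hyracks code, and it assumes Script 1 left one record behind in foo's file.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only; none of these names exist in Hyracks.
public class LazyCreationSketch {

    // Stand-in for the disk: file name -> keys stored in that file.
    static final Map<String, List<Integer>> DISK = new HashMap<>();

    // Lazy creation: reuse the file if it exists, otherwise create it.
    static List<Integer> openLazily(String fileName) {
        return DISK.computeIfAbsent(fileName, name -> new ArrayList<>());
    }

    public static void main(String[] args) {
        // "Script 1": dataset foo ends up with one record (assumed).
        openLazily("foo").add(42);

        // "Script 2": the dataverse was dropped logically, but the file
        // was not deleted, so the lazy open silently reuses it.
        List<Integer> foo = openLazily("foo");
        System.out.println("records visible to Script 2: " + foo.size());
    }
}
```

The second open returns the same list as the first, which is the stale state that the create index job then loads into bar.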

The fix:
I've fixed the issue by adding a create index operator, which is issued for
every create dataset and create index AQL statement to ensure a clean state.
From now on, issuing such an index creation job is mandatory before performing
any operation on an index through Hyracks operators.
If a non-create operator finds that its backing file has not been created yet,
it throws an exception.
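
As a rough illustration of the contract described above (not the code on the branch), the sketch below models the two behaviors with plain files. The class and method names are hypothetical, and truncating the file is only one assumed way of reaching a clean state.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch; these names are not part of the Hyracks API.
public class IndexLifecycleSketch {

    // Models the explicit create job: it always leaves an empty backing
    // file, discarding any stale contents from an earlier script.
    static void create(Path backingFile) throws IOException {
        Files.write(backingFile, new byte[0],
                StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.WRITE);
    }

    // Models any non-create operator: it never creates the file lazily.
    static Path open(Path backingFile) {
        if (!Files.exists(backingFile)) {
            throw new IllegalStateException(
                    "Index file was never created: " + backingFile);
        }
        return backingFile;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("index-sketch");
        Path bar = dir.resolve("bar.idx");

        try {
            open(bar); // no create job has run yet
        } catch (IllegalStateException e) {
            System.out.println("rejected: open before create");
        }

        Files.write(bar, new byte[] {1, 2, 3}); // stale data from "Script 1"
        create(bar);                            // create job from "Script 2"
        System.out.println("size after create: " + Files.size(open(bar)));
    }
}
```

In this model the create job is idempotent with respect to leftovers: whether or not the file already exists, the index starts out empty.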

Please have a look at branch hyracks_fix_asterix_issue_113, revisions
r1457, r1461, r1462, and r1463.

When reviewing my code changes, please focus on:
Correctness of the index lifecycle (creation, dropping, etc.).
Keep in mind the following cases (and their combinations):
1. What if the backing file already exists?
2. What if the backing file does not yet exist?
3. How do I know that I am reading a valid index file (and not some garbage
that has gotten there by arbitrary means)?
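
One way to think about the three cases together is a check that classifies a backing file before it is used. The sketch below is only an assumption about how case 3 could be detected: it pretends every valid index file starts with a fixed marker, which this thread does not claim Hyracks does. The marker bytes and all names are made up for illustration.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical sketch; not the validation used on the branch.
public class IndexFileCheckSketch {

    enum State { MISSING, VALID, GARBAGE }

    // Assumed marker at the start of every valid index file.
    static final byte[] MAGIC = {'H', 'I', 'D', 'X'};

    static State classify(Path file) throws IOException {
        if (!Files.exists(file)) {
            return State.MISSING;                  // case 2
        }
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            byte[] head = new byte[MAGIC.length];
            in.readFully(head);                    // EOFException if too short
            return Arrays.equals(head, MAGIC)
                    ? State.VALID                  // case 1, usable file
                    : State.GARBAGE;               // case 3
        } catch (EOFException e) {
            return State.GARBAGE;                  // case 3, truncated file
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("index-check");
        Path file = dir.resolve("bar.idx");

        System.out.println(classify(file)); // MISSING
        Files.write(file, new byte[] {9, 9});
        System.out.println(classify(file)); // GARBAGE
        Files.write(file, MAGIC);
        System.out.println(classify(file)); // VALID
    }
}
```

A marker like this only catches files that are obviously foreign or truncated. It says nothing about whether a well-formed file is stale, which is the problem the explicit create job addresses.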

I hope you'll agree that, for now, a create index job is the safest way to
handle all of these cases.

After the review, I'll merge this branch into:
hyracks_asterix_stabilization

Original issue reported on code.google.com by alexande...@gmail.com on 10 May 2012 at 10:39

GoogleCodeExporter commented 9 years ago
Added my comments (also included r1464).

Let's have a quick back and forth on the comments/questions I left before 
merging back.

My main concern is that I don't see how/when index files are ever deleted.

Original comment by zheilb...@gmail.com on 11 May 2012 at 10:03

GoogleCodeExporter commented 9 years ago
Correct. Implementing a proper drop is blocked by issues 117 and 118. I'd 
suggest we consider them separate issues.

Original comment by alexande...@gmail.com on 12 May 2012 at 12:08

GoogleCodeExporter commented 9 years ago
Please have another look at r1467, and let me know if I'm good to merge!

Original comment by alexande...@gmail.com on 12 May 2012 at 1:31

GoogleCodeExporter commented 9 years ago

Original comment by alexande...@gmail.com on 15 May 2012 at 1:55