namhnguyen / asterixdb

Automatically exported from code.google.com/p/asterixdb

Multiple (specifically 3) duplicate key insertions hang #803

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Insert 3 records that share the same primary key.
The third insertion hangs. The attached jstack output shows the exact
location where the execution hangs.
The hang is caused by incorrect behavior of PrimaryIndexOperationTracker when
a duplicate primary key exception is thrown.

The detailed reason is as follows:
When the duplicate key exception is thrown and the in-memory component's memory
budget is exceeded, the component state becomes readable-unwritable. The budget
is checked in the following code snippet:

in AbstractMemoryLSMComponent.java
...
    public void threadExit(LSMOperationType opType, boolean failedOperation, boolean isMutableComponent)
            throws HyracksDataException {
        switch (opType) {
            case FORCE_MODIFICATION:
            case MODIFICATION:
                if (isMutableComponent) {
                    writerCount--;
                    if (state == ComponentState.READABLE_WRITABLE && isFull()) { //<---- isFull() checks the memory budget.
                        state = ComponentState.READABLE_UNWRITABLE;
                    }
                } else {
...

However, the corresponding flush operation (which is triggered in
PrimaryIndexOperationTracker.completeOperation()) is not triggered when the
exception is thrown, so the component stays unwritable and subsequent
insertions block.

One fix is to call completeOperation() on the failure path so that the flush
operation is triggered and the in-memory component can accept incoming update
requests again.
Another fix is to leave the component state unchanged in threadExit() when
failedOperation is true. Currently, the failedOperation parameter is not used
at all in that function.
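
The second fix can be illustrated with a minimal, standalone Java sketch. It is a toy model, not the AsterixDB class: MemoryComponentSketch, its `full` flag (a stand-in for isFull()), and the one-argument threadExit() are all assumptions made for illustration.

```java
// Toy model of an in-memory component; NOT AbstractMemoryLSMComponent.
class MemoryComponentSketch {
    enum ComponentState { READABLE_WRITABLE, READABLE_UNWRITABLE }

    private ComponentState state = ComponentState.READABLE_WRITABLE;
    private int writerCount = 0;
    private final boolean full; // hypothetical stand-in for isFull()

    MemoryComponentSketch(boolean full) {
        this.full = full;
    }

    void threadEnter() {
        writerCount++;
    }

    // A failed operation still leaves the component, but it never flips the
    // state, because no flush would be scheduled to make it writable again.
    void threadExit(boolean failedOperation) {
        writerCount--;
        if (!failedOperation && state == ComponentState.READABLE_WRITABLE && full) {
            state = ComponentState.READABLE_UNWRITABLE;
        }
    }

    ComponentState getState() {
        return state;
    }

    int getWriterCount() {
        return writerCount;
    }

    public static void main(String[] args) {
        MemoryComponentSketch component = new MemoryComponentSketch(true);
        component.threadEnter();
        component.threadExit(true); // simulated duplicate key failure
        System.out.println(component.getState());
    }
}
```

In this model a failed writer exits without making the component unwritable, while a successful writer on a full component still does. Whether skipping the state change is safe in the real code depends on how the flush is scheduled elsewhere, which this sketch does not cover.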

The following is the complete test case that reproduces the situation
described above:

/* run the following AQL statements in two steps */

/* step 1 */
drop dataverse STBench if exists;
create dataverse STBench;
use dataverse STBench;

create type SimpleGeoPlaceType as closed {
    coordinates: point, 
    id: int64,             
    name: string, 
    tags: string,
    categories: string,
    phone: string
}
create dataset SimpleGeoPlace (SimpleGeoPlaceType) primary key id;
create index btreeName on SimpleGeoPlace(name) type btree;

insert into dataset SimpleGeoPlace 
{ "coordinates": point("-2.423658,53.0842802"), "id": 5,
  "name": "20:20 Mobile", "tags": "mobile",
  "categories": "Professional Services Computer Services", "phone": "" }
;
insert into dataset SimpleGeoPlace 
{ "coordinates": point("-2.423658,53.0842802"), "id": 5,
  "name": "20:20 Mobile", "tags": "mobile",
  "categories": "Professional Services Computer Services", "phone": "" }
;

/* step 2: this will hang */
use dataverse STBench;
insert into dataset SimpleGeoPlace 
{ "coordinates": point("-2.423658,53.0842802"), "id": 5,
  "name": "20:20 Mobile", "tags": "mobile",
  "categories": "Professional Services Computer Services", "phone": "" }
;

What is the expected output? What do you see instead?
Step 2 should raise a duplicate key exception, but it hangs instead.

Original issue reported on code.google.com by kiss...@gmail.com on 29 Sep 2014 at 5:02

GoogleCodeExporter commented 9 years ago
While making changes to fix issue 803, I found another issue: the dataverse
cannot be dropped because of the following exception:

Caused by: edu.uci.ics.hyracks.api.exceptions.HyracksDataException: Cannot remove index while it is open.
    at edu.uci.ics.asterix.common.context.DatasetLifecycleManager.unregister(DatasetLifecycleManager.java:111)
    at edu.uci.ics.hyracks.storage.am.common.dataflow.IndexDataflowHelper.destroy(IndexDataflowHelper.java:123)
    at edu.uci.ics.hyracks.storage.am.common.dataflow.IndexDropOperatorNodePushable.initialize(IndexDropOperatorNodePushable.java:49)
    at edu.uci.ics.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable.initialize(SuperActivityOperatorNodePushable.java:81)
    at edu.uci.ics.hyracks.control.nc.Task.run(Task.java:239)

This issue is caused by incorrect management of the active operation count in
PrimaryIndexOperationTracker when duplicate key exceptions occur.
The count is incremented in beforeOperation() and decremented in
completeOperation() of PrimaryIndexOperationTracker. When an exception such as
a duplicate key exception is thrown, completeOperation() is not called, so the
count is never decremented.
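
The counting problem can be sketched in standalone Java as follows. This is a toy model, not PrimaryIndexOperationTracker: OperationTrackerSketch and runTracked() are hypothetical names, and only the beforeOperation()/completeOperation() pairing is taken from the description above.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of an active operation counter; NOT PrimaryIndexOperationTracker.
class OperationTrackerSketch {
    private final AtomicInteger activeOperations = new AtomicInteger();

    void beforeOperation() {
        activeOperations.incrementAndGet();
    }

    void completeOperation() {
        activeOperations.decrementAndGet();
    }

    int getActiveOperations() {
        return activeOperations.get();
    }

    // Hypothetical helper: the finally block keeps the count balanced even
    // when the operation throws (for example, on a duplicate key).
    void runTracked(Runnable operation) {
        beforeOperation();
        try {
            operation.run();
        } finally {
            completeOperation();
        }
    }

    public static void main(String[] args) {
        OperationTrackerSketch tracker = new OperationTrackerSketch();
        try {
            tracker.runTracked(() -> {
                throw new IllegalStateException("simulated duplicate key");
            });
        } catch (IllegalStateException e) {
            // The failure still propagates to the caller.
        }
        System.out.println(tracker.getActiveOperations());
    }
}
```

Without the finally block, the simulated failure would leave the count at one, which matches the "Cannot remove index while it is open" symptom.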

This issue seems to have symptoms similar to those of issue 606 reported by
Till, but I can't tell for sure because the CSV file link in that issue is not
accessible.

Original comment by kiss...@gmail.com on 29 Sep 2014 at 9:51

GoogleCodeExporter commented 9 years ago
The incorrect active operation count issue is also related to issue 665. 

Original comment by kiss...@gmail.com on 29 Sep 2014 at 9:55

GoogleCodeExporter commented 9 years ago
I strongly suspect that the active operation count bug is caused by the
Hyracks runtime and NOT by incorrect count management.

I've seen this one before and it was mentioned by other people as well in the 
weekly meeting. 

What I "think" is happening is that when an exception is thrown in an
operator's "nextFrame" function, a subsequent call to "close" is expected, but
in fact "close" is not being called. Note that all the operators I've seen are
implemented with this assumption in mind.

This theory can be checked easily.
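
One way to picture the contract in question is the standalone Java sketch below. It does not use the Hyracks API: FrameWriterSketch, RecordingWriter, and FrameDriverSketch are hypothetical names, and the driver only shows what "close is always called after nextFrame throws" would look like.

```java
// Hypothetical writer contract for illustration; NOT the Hyracks interface.
interface FrameWriterSketch {
    void nextFrame(String frame) throws Exception;

    void close();
}

// Records whether close() was reached, so the theory can be observed.
class RecordingWriter implements FrameWriterSketch {
    boolean closed = false;

    @Override
    public void nextFrame(String frame) throws Exception {
        if (frame.equals("duplicate")) {
            throw new Exception("simulated duplicate key");
        }
    }

    @Override
    public void close() {
        closed = true;
    }
}

class FrameDriverSketch {
    // Pushes frames and guarantees close() even if nextFrame() throws.
    static void pushAll(FrameWriterSketch writer, String... frames) throws Exception {
        try {
            for (String frame : frames) {
                writer.nextFrame(frame);
            }
        } finally {
            writer.close();
        }
    }

    public static void main(String[] args) {
        RecordingWriter writer = new RecordingWriter();
        try {
            pushAll(writer, "ok", "duplicate");
        } catch (Exception e) {
            // The exception still reaches the caller after close() has run.
        }
        System.out.println(writer.closed);
    }
}
```

If the runtime's driver lacks the equivalent of the finally block, an operator that releases its resources only in close() would be left open after a failure.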

Original comment by bamou...@gmail.com on 29 Sep 2014 at 10:24

GoogleCodeExporter commented 9 years ago
If you run the test case given in the original issue report, you can see that
the active operation count in PrimaryIndexOperationTracker doesn't return to
zero when the duplicate key exception is thrown.

Original comment by kiss...@gmail.com on 29 Sep 2014 at 11:23

GoogleCodeExporter commented 9 years ago
Which supports my theory :)

Original comment by bamou...@gmail.com on 29 Sep 2014 at 11:27

GoogleCodeExporter commented 9 years ago

Original comment by kiss...@gmail.com on 30 Sep 2014 at 11:20

GoogleCodeExporter commented 9 years ago
Could you add the revision that fixed this to the issue for future reference?

Original comment by westm...@gmail.com on 17 Oct 2014 at 9:00

GoogleCodeExporter commented 9 years ago
The revisions are shown below:

Asterix revision:
https://code.google.com/p/asterixdb/source/detail?r=1accbc0ee77989470c63307224691481b2d23faf
Hyracks revision:
https://code.google.com/p/hyracks/source/detail?r=b6e23520aa7c590f10110f448dfac9735786b9e2

Original comment by kiss...@gmail.com on 20 Oct 2014 at 3:41

GoogleCodeExporter commented 9 years ago
Thanks!

Original comment by westm...@gmail.com on 21 Oct 2014 at 5:01

GoogleCodeExporter commented 9 years ago
Issue 665 has been merged into this issue.

Original comment by ildar.absalyamov on 7 Nov 2014 at 7:43