lwhay / asterixdb

Automatically exported from code.google.com/p/asterixdb

Key Duplication Error in re-begin feed #371

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
Step 1.

create dataverse demo;
use dataverse demo;

create type TweetType as closed {
 id: string,
 username : string,
 location : string,
 text : string,
 timestamp : string
}       

create feed dataset BostonTweets(TweetType)
using pull_twitter (("query"="Boston"),("interval"="10"))
primary key id;

Step 2.
use dataverse demo;
begin feed BostonTweets;
(wait 1 minute for the feed to collect data)

Step 3.
use dataverse demo;
end feed BostonTweets;

Step 4.
use dataverse demo;
begin feed BostonTweets;

What is the expected output? What do you see instead?
From cc.log
------------
uting: edu.uci.ics.hyracks.control.cc.work.JobStartWork@3f965e77
Apr 20, 2013 1:31:03 AM edu.uci.ics.hyracks.control.cc.scheduler.ActivityClusterPlanner planActivityCluster
INFO: Plan for edu.uci.ics.hyracks.api.job.ActivityCluster@2a884ab9
Apr 20, 2013 1:31:03 AM edu.uci.ics.hyracks.control.cc.scheduler.ActivityClusterPlanner planActivityCluster
INFO: Built 1 Task Clusters
Apr 20, 2013 1:31:03 AM edu.uci.ics.hyracks.control.cc.scheduler.ActivityClusterPlanner planActivityCluster
INFO: Tasks: [TID:ANID:ODID:1:0:0, TID:ANID:ODID:2:0:0]
Apr 20, 2013 1:31:03 AM edu.uci.ics.hyracks.control.common.work.WorkQueue$WorkerThread run
INFO: Executing: edu.uci.ics.hyracks.control.cc.work.WaitForJobCompletionWork@5b1e31c0
Apr 20, 2013 1:31:03 AM edu.uci.ics.hyracks.control.common.work.WorkQueue$WorkerThread run
INFO: Executing: PartitionRequest@[JID:6:CDID:2:0:0:a1_node1:TAID:TID:ANID:ODID:1:0:0:0:STARTED]
Apr 20, 2013 1:31:03 AM edu.uci.ics.hyracks.control.common.work.WorkQueue$WorkerThread run
INFO: Executing: PartitionAvailable@[JID:6:CDID:2:0:0:a1_node1:TAID:TID:ANID:ODID:2:0:0:0non-reusable STARTED]
Apr 20, 2013 1:31:45 AM edu.uci.ics.hyracks.control.common.work.WorkQueue$WorkerThread run
INFO: Executing: TaskFailureEvent[JID:6:TAID:TID:ANID:ODID:1:0:0:0:Exception caught by thread: edu.uci.ics.hyracks.api.rewriter.runtime.SuperActivity:TAID:TID:ANID:ODID:1:0:0:0:0
edu.uci.ics.hyracks.api.exceptions.HyracksDataException: edu.uci.ics.hyracks.api.exceptions.HyracksDataException: edu.uci.ics.hyracks.storage.am.btree.exceptions.BTreeDuplicateKeyException: Failed to insert key since key already exists.
    at edu.uci.ics.hyracks.control.nc.Task.pushFrames(Task.java:335)
    at edu.uci.ics.hyracks.control.nc.Task.run(Task.java:268)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)
Caused by: edu.uci.ics.hyracks.api.exceptions.HyracksDataException: edu.uci.ics.hyracks.storage.am.btree.exceptions.BTreeDuplicateKeyException: Failed to insert key since key already exists.
    at edu.uci.ics.hyracks.storage.am.lsm.common.dataflow.LSMIndexInsertUpdateDeleteOperatorNodePushable.nextFrame(LSMIndexInsertUpdateDeleteOperatorNodePushable.java:96)
    at edu.uci.ics.hyracks.control.nc.Task.pushFrames(Task.java:319)
    ... 4 more
Caused by: edu.uci.ics.hyracks.storage.am.btree.exceptions.BTreeDuplicateKeyException: Failed to insert key since key already exists.
    at edu.uci.ics.hyracks.storage.am.lsm.btree.impls.LSMBTree.insert(LSMBTree.java:283)
    at edu.uci.ics.hyracks.storage.am.lsm.btree.impls.LSMBTree.modify(LSMBTree.java:260)
    at edu.uci.ics.hyracks.storage.am.lsm.common.impls.LSMHarness.modify(LSMHarness.java:128)
    at edu.uci.ics.hyracks.storage.am.lsm.common.impls.LSMHarness.modify(LSMHarness.java:119)
    at edu.uci.ics.hyracks.storage.am.lsm.common.impls.LSMTreeIndexAccessor.tryInsert(LSMTreeIndexAccessor.java:67)
    at edu.uci.ics.hyracks.storage.am.lsm.common.dataflow.LSMIndexInsertUpdateDeleteOperatorNodePushable.nextFrame(LSMIndexInsertUpdateDeleteOperatorNodePushable.java:57)
    ... 5 more

]
Apr 20, 2013 1:31:45 AM edu.uci.ics.hyracks.control.cc.partitions.PartitionMatchMaker removeUncommittedPartitions
INFO: Removing uncommitted partitions: []
Apr 20, 2013 1:31:45 AM edu.uci.ics.hyracks.control.cc.partitions.PartitionMatchMaker removePartitionRequests
INFO: Removing partition requests: []
Apr 20, 2013 1:31:45 AM edu.uci.ics.hyracks.control.common.work.WorkQueue$WorkerThread run
INFO: Executing: edu.uci.ics.hyracks.control.cc.work.JobCleanupWork@455d9b0e
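
The trace above is consistent with the restarted feed re-delivering tweets whose ids are already stored in BostonTweets, although the log does not show the offending key, so that is an assumption. A toy Python model of that scenario follows; `InsertOnlyIndex`, `DuplicateKeyError`, and `run_feed` are hypothetical stand-ins for illustration and are not AsterixDB or Hyracks APIs.

```python
class DuplicateKeyError(Exception):
    """Hypothetical stand-in for BTreeDuplicateKeyException."""


class InsertOnlyIndex:
    """Toy primary index that rejects any key it already holds."""

    def __init__(self):
        self._rows = {}

    def insert(self, key, record):
        if key in self._rows:
            raise DuplicateKeyError(
                f"Failed to insert key since key already exists: {key}"
            )
        self._rows[key] = record

    def __len__(self):
        return len(self._rows)


def run_feed(index, tweets):
    """Push every tweet into the index, keyed by its primary key `id`."""
    for tweet in tweets:
        index.insert(tweet["id"], tweet)


index = InsertOnlyIndex()

# First "begin feed": two tweets arrive and are stored.
first_run = [{"id": "1", "text": "a"}, {"id": "2", "text": "b"}]
run_feed(index, first_run)

# After "end feed" and a second "begin feed", the source is assumed to
# replay tweet "2" before delivering the new tweet "3".
second_run = [{"id": "2", "text": "b"}, {"id": "3", "text": "c"}]
try:
    run_feed(index, second_run)
    failed = False
except DuplicateKeyError:
    failed = True
```

In this model the second run aborts on the replayed key before the new tweet is stored, which mirrors the job failing as a whole in the log.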


Original issue reported on code.google.com by kiss...@gmail.com on 20 Apr 2013 at 8:37

GoogleCodeExporter commented 9 years ago
This is a post-beta defect.
Managing QoS for feeds is one of the major open issues that, as I see it, keeps them from being ready for prime time right now.
Feeds need to be able to fail externally and be resumed from where they left off, and they need to behave similarly when system management operations occur.
Raman (et al.) should form a list of desirable feed properties and their implications.
In a perfect world, a Twitter feed would capture each Tweet exactly once (no fewer, no more) regardless of failures anywhere between Twitter and an AsterixDB dataset.
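
One way to picture the "resume from where it left off" and "exactly once" properties is a persisted high-water mark combined with a duplicate check on the primary key. The Python sketch below is only an illustration under stated assumptions: tweet ids are numeric strings that increase over time, and `ingest_batch`, `dataset`, and `checkpoint` are hypothetical names, not part of AsterixDB.

```python
def ingest_batch(dataset, checkpoint, batch):
    """Store tweets not seen before and return the advanced checkpoint.

    dataset    -- dict mapping tweet id (str) to the tweet record
    checkpoint -- highest numeric id stored by earlier batches
    batch      -- iterable of tweet dicts, possibly replaying old tweets
    """
    highest = checkpoint
    for tweet in batch:
        tweet_id = int(tweet["id"])  # assumption: ids are numeric strings
        if tweet_id <= checkpoint or tweet["id"] in dataset:
            continue  # already captured, so skip instead of failing
        dataset[tweet["id"]] = tweet
        highest = max(highest, tweet_id)
    return highest


dataset = {}
checkpoint = 0

# First run of the feed.
checkpoint = ingest_batch(dataset, checkpoint, [{"id": "1"}, {"id": "2"}])

# Feed ended and begun again: the source replays "2" and delivers "3".
checkpoint = ingest_batch(dataset, checkpoint, [{"id": "2"}, {"id": "3"}])
```

The sketch leaves out the hard part: for a true exactly-once guarantee the checkpoint and the stored records would have to be made durable together, so that a crash between the two cannot lose or repeat a tweet.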

Original comment by dtab...@gmail.com on 20 Apr 2013 at 3:22