lwhay / asterixdb

Automatically exported from code.google.com/p/asterixdb

When loading a csv/adm file, there is no data integrity check for the PK constraint. #719

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I defined a type and a dataset as follows.

create type keywordsType as open {raw-keyword: string};

create dataset keywordSet(keywordsType)
primary key raw-keyword;

After that,
I loaded some duplicate keywords to see how they would be handled.
AsterixDB did not report any data integrity error.

I'll attach a simple CSV file and the AQL statements so this is easy to test.
The data consists of a single column, 'document-id + keyword'.

------------------------------------------------------------------
In addition,
when I run a query with an equality predicate ('='), only one record is returned
(although duplicate keywords exist).

for $d in dataset keywordSet
where $d.raw-keyword = "2013021401 AsterixDB"
return $d;
------------------------------------------------------------------
When I run a 'delete' statement with an equality predicate ('='), all of the
duplicate keywords are deleted.

delete $d from dataset keywordSet
where $d.raw-keyword = "2013021401 AsterixDB";
------------------------------------------------------------------
When I use a 'contains' predicate, all of the duplicate keywords are returned,
as expected.

for $d in dataset keywordSet
where contains($d.raw-keyword, "2013021401 AsterixDB")
return $d;
------------------------------------------------------------------

You can see all of this if you download the attached file.

Original issue reported on code.google.com by kiyoung....@gmail.com on 14 Feb 2014 at 10:14

Attachments:

GoogleCodeExporter commented 9 years ago
I just tried the example that you provided and I'm getting this exception 
(which is the correct and expected behavior):

Input stream given to BTree bulk load has duplicates. 
[TreeIndexDuplicateKeyException]

Which version of AsterixDB are you using?
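
For context, the kind of check that produces an error like the one above can be sketched as follows. This is only an illustration in Python, not AsterixDB's actual code: `bulk_load` and `DuplicateKeyError` are hypothetical names, and the sketch assumes the loader sorts its input by primary key before building the index, so that duplicate keys end up adjacent.

```python
class DuplicateKeyError(Exception):
    """Hypothetical stand-in for an error like TreeIndexDuplicateKeyException."""


def bulk_load(records, key_field="raw-keyword"):
    """Sort records by primary key and reject any repeated key.

    Returns the records in key order when every key is unique.
    """
    ordered = sorted(records, key=lambda r: r[key_field])
    # After sorting, equal keys are neighbours, so a single pass over
    # adjacent pairs is enough to detect a primary key violation.
    for prev, curr in zip(ordered, ordered[1:]):
        if prev[key_field] == curr[key_field]:
            raise DuplicateKeyError(
                f"bulk load input has duplicate key: {curr[key_field]!r}"
            )
    return ordered
```

With a check of this shape, loading the same "2013021401 AsterixDB" keyword twice fails the whole load instead of silently storing both records.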

Original comment by salsuba...@gmail.com on 15 Feb 2014 at 4:29

GoogleCodeExporter commented 9 years ago
The AsterixDB version is 0.8.0.
I just followed the 'Single-Machine installation
instruction' on the AsterixDB homepage.

Original comment by kiyoung....@gmail.com on 19 Feb 2014 at 5:25

GoogleCodeExporter commented 9 years ago
You are using the oldest version of AsterixDB, which was released almost 8 
months ago :-) There have been a ton of fixes since then.

Please use the latest version of AsterixDB, which has a fix for this issue:
http://asterix.ics.uci.edu/download.html

Original comment by salsuba...@gmail.com on 19 Feb 2014 at 5:36

GoogleCodeExporter commented 9 years ago
Sounds good.
I'll try again with the newest version.
Thank you so much :)

Original comment by kiyoung....@gmail.com on 19 Feb 2014 at 8:56

GoogleCodeExporter commented 9 years ago
I downloaded and installed the latest version.
Now I also see the error message 'Input stream given to BTree bulk load has 
duplicates. [TreeIndexDuplicateKeyException]'

Thank you. (^.^)

Original comment by kiyoung....@gmail.com on 20 Feb 2014 at 9:22