simlaudato / asterixdb

Automatically exported from code.google.com/p/asterixdb

Similarity query needs the total order of two involved sets. #906

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. This is a semantic error in the queries themselves.
2. We should fix the queries as well as the function definitions that depend 
on the two queries.

What is the expected output? What do you see instead?
The output is wrong because of semantic errors in the following queries:
asterix-app/src/test/resources/runtimets/queries/fuzzyjoin/dblp-csx-3_1/dblp-csx-3_1.3.query.aql
asterix-app/src/test/resources/runtimets/queries/fuzzyjoin/dblp-csx-3_5.1/dblp-csx-3_5.1.3.query.aql

What version of the product are you using? On what operating system?
asterix-0.8.7 on mac/linux

Please provide any additional information below.
Bug 1: Stage 1 computes two separate total orders (for example, in 
dblp-csx-3_5.4.3.query.aql, lines 19-34 and lines 45-60), but the similarity 
join needs a single, uniform total order. We should combine the two 
token-count-order operations into one sort operation on the union of the R and 
S datasets.
Bug 2: In stage 1, the two innermost subqueries should return the sorted token 
set, not the offset $i (in dblp-csx-3_5.4.3.query.aql, lines 34 and 60).
Bug 3: The "at" clause of AQL returns only a local order within each iodevice, 
but here we need a total order across all iodevices (in 
dblp-csx-3_5.4.3.query.aql, lines 34 and 60 should return the token sets, not 
the offsets).
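
To illustrate what Bug 1 and Bug 2 ask for, here is a minimal Python sketch (not AQL, and not the code in the test queries). It builds one token order from the union of both inputs and returns each record's sorted token set rather than an offset. The function names `global_token_order` and `sorted_token_set`, the sample data, and the tie-breaking rule (by token text) are assumptions made for illustration only.

```python
from collections import Counter


def global_token_order(r_records, s_records):
    """Rank tokens by their count in the union of R and S (rarest first)."""
    counts = Counter()
    for tokens in list(r_records) + list(s_records):
        counts.update(tokens)
    # Assumption: ties are broken by token text so the order is deterministic.
    ordered = sorted(counts, key=lambda t: (counts[t], t))
    return {token: rank for rank, token in enumerate(ordered)}


def sorted_token_set(tokens, order):
    """Return the record's distinct tokens sorted by the shared global order."""
    return sorted(set(tokens), key=order.__getitem__)


# Hypothetical sample data standing in for the two joined datasets.
r = [["data", "query", "join"], ["data", "index"]]
s = [["query", "join", "fuzzy"], ["data", "join"]]

order = global_token_order(r, s)
r_sorted = [sorted_token_set(rec, order) for rec in r]
s_sorted = [sorted_token_set(rec, order) for rec in s]
```

Because both sides are sorted with the same `order`, prefixes taken from `r_sorted` and `s_sorted` are comparable, which is not guaranteed when each side computes its own order.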

Original issue reported on code.google.com by lwhay...@gmail.com on 10 Jul 2015 at 5:11

Attachments:

GoogleCodeExporter commented 9 years ago
These findings are the results of a discussion Wenhai had with me.  Young-Seok 
provided input about how to interpret the meaning of "at".  He mentioned 
that different io devices can return the same "at" values, so we need to check 
the correctness of this test query.
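
The concern about "at" can be sketched in Python under a simplifying assumption: each io device numbers its own items starting from 1. This models the behavior described above; it is not a statement of how AsterixDB implements "at". The names `local_positions` and `global_positions` and the sample partitions are hypothetical.

```python
def local_positions(partitions):
    """Assumed model: the position counter restarts on every partition."""
    return [(tok, i) for part in partitions for i, tok in enumerate(part, start=1)]


def global_positions(partitions):
    """Assign positions only after merging and sorting all partitions."""
    merged = sorted(tok for part in partitions for tok in part)
    return {tok: i for i, tok in enumerate(merged, start=1)}


# Hypothetical tokens spread over two io devices.
partitions = [["fuzzy", "query"], ["index", "join"]]

local = local_positions(partitions)
glob = global_positions(partitions)
```

In this model `local` contains repeated positions, so a position alone cannot serve as a token rank, while `glob` gives every token a distinct position.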

A general comment: since Inci has been working on fuzzy queries for a while, we 
need to check if some of these issues have been found and fixed in her branches.

Original comment by che...@gmail.com on 10 Jul 2015 at 7:20