GoogleCodeExporter opened this issue (status: Open).
Original comment by vinay...@gmail.com
on 23 Oct 2013 at 7:28
Check whether the disjunction works and whether it uses the index.
Original comment by zheilb...@gmail.com
on 29 Oct 2013 at 11:48
Original comment by zheilb...@gmail.com
on 8 Nov 2013 at 6:26
Tested with a disjunction: the index is not used. The index is only picked up
when there is a single predicate.
For clarity, this query will not pick the inverted index:
use dataverse azure;
for $t in dataset Tweets
where contains($t.text, "#xboxkinect") or contains($t.text, "xbox")
  or contains($t.text, "#xbox") or contains($t.text, "xbox360")
  or contains($t.text, "#xbox360") or contains($t.text, "xboxone")
  or contains($t.text, "#xboxone")
group by $b := interval-bin($t.created_at, datetime("2013-01-06T00:00:00"),
day-time-duration("P7D")) with $t
return {"b": $b, "count": count($t)}
This query WILL pick the inverted index:
use dataverse azure;
for $t in dataset Tweets
where contains($t.text, "#xboxkinect")
group by $b := interval-bin($t.created_at, datetime("2013-01-06T00:00:00"),
day-time-duration("P7D")) with $t
return {"b": $b, "count": count($t)}
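For readers less familiar with AQL, the following Python sketch models what both queries compute; it says nothing about index selection. The sample tweets are made up, and `week_bin` is a hypothetical helper that keeps only the start of the 7-day bin that `interval-bin` returns, assuming bins are aligned to the anchor datetime.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical in-memory stand-in for the Tweets dataset.
TWEETS = [
    {"id": 1, "text": "loving my #xbox360", "created_at": datetime(2013, 1, 7, 10, 0)},
    {"id": 2, "text": "xboxone preorder", "created_at": datetime(2013, 1, 9, 8, 30)},
    {"id": 3, "text": "new phone", "created_at": datetime(2013, 1, 10, 12, 0)},
    {"id": 4, "text": "#xboxkinect party", "created_at": datetime(2013, 1, 15, 18, 0)},
]
KEYWORDS = ["#xboxkinect", "xbox", "#xbox", "xbox360", "#xbox360", "xboxone", "#xboxone"]
ANCHOR = datetime(2013, 1, 6)   # datetime("2013-01-06T00:00:00")
WEEK = timedelta(days=7)        # day-time-duration("P7D")

def week_bin(ts, anchor=ANCHOR, width=WEEK):
    # Start of the fixed-width bin containing ts, aligned to anchor.
    # timedelta // timedelta is floor division and yields an int.
    return anchor + ((ts - anchor) // width) * width

def weekly_counts(tweets, keywords):
    # where contains($t.text, k1) or contains($t.text, k2) or ...
    matching = (t for t in tweets if any(k in t["text"] for k in keywords))
    # group by $b := interval-bin(...) with $t; count($t)
    return Counter(week_bin(t["created_at"]) for t in matching)
```

With a single keyword such as `["#xboxkinect"]` the same function models the second query; the result differs only in which tweets survive the filter.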
Original comment by zheilb...@gmail.com
on 8 Nov 2013 at 6:36
Does this query pick up a regular B-tree index, if one is defined over the
$t.text field?
for $t in dataset Tweets
where contains($t.text, "#xboxkinect")
return $t
From what I remember, using contains(...) or any other function in the
WHERE clause did not lead to an index lookup. You may want to verify whether
that is still the case or whether it was fixed recently.
Original comment by khfaraaz82
on 8 Nov 2013 at 7:05
Original comment by zheilb...@gmail.com
on 15 Nov 2013 at 8:31
Original comment by vinay...@gmail.com
on 15 Nov 2013 at 8:33
We have more basic problems with in-lists and disjunctive queries as well.
Original comment by dtab...@gmail.com
on 18 Feb 2014 at 4:56
Just a note:
I'm working on a rewrite rule for eq-predicates that translates disjunctions
into joins.
So the equivalent query for this case would be:
for $t in dataset Tweets
for $w in ["#xboxkinect", "xbox", "#xbox", "xbox360", "#xbox360", "xboxone",
"#xboxone"]
where contains($t.text, $w)
group by $b := interval-bin($t.created_at, datetime("2013-01-06T00:00:00"),
day-time-duration("P7D")) with $t
return {"b": $b, "count": count($t)}
However,
a) this wouldn't work for this case, because it would introduce duplicates
(every tweet whose text contains "xbox360" also contains "xbox"), and
b) we need to be sure that the join would pick the index correctly (as it does
for eq-predicates).
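Point (a) can be seen with a small Python model (plain lists stand in for the dataset; the sample rows and function names are hypothetical): the disjunction emits each matching tweet once, while the join form emits it once per keyword it contains.

```python
# Hypothetical sample rows standing in for the Tweets dataset.
TWEETS = [
    {"id": 1, "text": "loving my #xbox360"},
    {"id": 2, "text": "xboxone preorder"},
    {"id": 3, "text": "new phone"},
]
KEYWORDS = ["#xboxkinect", "xbox", "#xbox", "xbox360", "#xbox360", "xboxone", "#xboxone"]

def disjunction(tweets, keywords):
    # where contains($t.text, k1) or contains($t.text, k2) or ...
    return [t["id"] for t in tweets if any(k in t["text"] for k in keywords)]

def join_rewrite(tweets, keywords):
    # for $t in dataset Tweets for $w in [...] where contains($t.text, $w)
    # One output row per (tweet, matching keyword) pair.
    return [t["id"] for t in tweets for w in keywords if w in t["text"]]
```

With these rows, tweet 1 matches four keywords ("xbox", "#xbox", "xbox360", "#xbox360"), so a `count($t)` over the join output would count it four times instead of once.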
Original comment by westm...@gmail.com
on 1 May 2014 at 6:26
Discussion results:
- existential quantification and disjunction can both be rewritten into joins
- for predicates that introduce duplicates like contains we need duplicate
elimination on the outer (probe) side of the join
- the rewriting should only be done if an index is available (for an
index nested-loop (INL) join) or if we have a bulk-join operation for contains
(like fuzzy joins)
- if an index is available, we also have a key that can be used for duplicate
elimination
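A rough sketch of the index nested-loop variant, assuming an n-gram inverted index over the text field (AsterixDB's ngram index type is the usual candidate for `contains()`; the gram length of 3 and every name below are illustrative assumptions, not AsterixDB internals). Each keyword is probed against the index, candidates are verified with a substring check because sharing all grams does not guarantee a match, and the primary keys returned by the index are used for duplicate elimination.

```python
from collections import defaultdict

GRAM = 3  # assumed gram length, in the spirit of an ngram(3) index

# Hypothetical sample rows; "id" plays the role of the primary key.
TWEETS = [
    {"id": 1, "text": "loving my #xbox360"},
    {"id": 2, "text": "xboxone preorder"},
    {"id": 3, "text": "new phone"},
    {"id": 4, "text": "xbo box"},  # has every 3-gram of "xbox" but not "xbox"
]
KEYWORDS = ["#xboxkinect", "xbox", "#xbox", "xbox360", "#xbox360", "xboxone", "#xboxone"]

def grams(s, n=GRAM):
    # All overlapping substrings of length n (empty if s is shorter than n).
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def build_index(tweets):
    # gram -> primary keys of the tweets whose text contains that gram
    index = defaultdict(set)
    for t in tweets:
        for g in grams(t["text"]):
            index[g].add(t["id"])
    return index

def probe(index, by_id, pattern):
    gs = grams(pattern)
    if gs:
        # A match must contain every gram of the pattern.
        candidates = set.intersection(*(index.get(g, set()) for g in gs))
    else:
        # Pattern shorter than GRAM: the index cannot narrow anything down.
        candidates = set(by_id)
    # Sharing all grams is necessary but not sufficient, so verify.
    return [k for k in candidates if pattern in by_id[k]["text"]]

def inl_join_distinct(tweets, keywords):
    index = build_index(tweets)
    by_id = {t["id"]: t for t in tweets}
    seen = set()
    for w in keywords:                    # outer side: the constant list
        for k in probe(index, by_id, w):  # inner side: index lookup
            seen.add(k)                   # duplicate elimination on the key
    return sorted(seen)
```

The `seen` set is the "distinct by key" step: without it, tweet 1 would be returned once for every keyword it contains. Tweet 4 shows why the verification in `probe` is needed.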
For a simple case, the rewriting for existential quantification would look
like this:
for $x in dataset Tweets
where some $t in ["...", "..."] satisfies contains($x.t, $t)
...
->
for $x in dataset Tweets
for $t in ["...", "..."]
where contains($x.t, $t)
distinct by $x.key
...
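The equivalence this rewrite relies on can be checked with a small Python model that mirrors the field names `t` and `key` from the pseudo-query above (the rows and constants are hypothetical): a row is kept by the `some ... satisfies` form exactly when at least one join pair survives the `where`, and `distinct by $x.key` collapses those pairs back to one row.

```python
# Hypothetical rows; "t" and "key" mirror $x.t and $x.key above.
ROWS = [
    {"key": 1, "t": "loving my #xbox360"},
    {"key": 2, "t": "xboxone preorder"},
    {"key": 3, "t": "new phone"},
]
CONSTS = ["xbox", "#xbox", "xbox360"]

def existential(rows, consts):
    # where some $t in consts satisfies contains($x.t, $t)
    return [x["key"] for x in rows if any(c in x["t"] for c in consts)]

def join_pairs(rows, consts):
    # for $x in ... for $t in consts where contains($x.t, $t)
    return [(x["key"], c) for x in rows for c in consts if c in x["t"]]

def join_distinct(rows, consts):
    # ... distinct by $x.key (keep the first pair per key)
    seen, out = set(), []
    for key, _ in join_pairs(rows, consts):
        if key not in seen:
            seen.add(key)
            out.append(key)
    return out
```

Here the join produces four pairs (three for row 1, one for row 2), and the distinct step reduces them to the same two rows that the quantified form returns.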
Original comment by westm...@gmail.com
on 16 May 2014 at 4:36
Original issue reported on code.google.com by
vinay...@gmail.com
on 23 Oct 2013 at 7:27