Closed GoogleCodeExporter closed 9 years ago
It seems to matter which function is used and with which argument.
This query uses the index:
for $t in dataset TweetMessages
let $ed := edit-distance($t.message_text, "Blah Blah")
where $ed <= 2
return {
"id" : $t.tweetid,
"message" : $t.message-text
}
But if you replace "Blah Blah" with "Blah", it no longer uses the index.
The same holds for the query that was pasted in the original bug report
(it uses the index with "blah blah" but not with "blah"):
//Index is used here
for $t in dataset TweetMessages
let $ed := edit-distance-check($t.message_text, "blah blah", 2)
where $ed[0]
return {
"id" : $t.tweetid,
"message" : $t.message_text
}
Original comment by pouria.p...@gmail.com
on 14 Mar 2014 at 10:45
Hey Pouria,
I have implemented a new function to solve your problem. Can you change your
query to the following:
for $t in dataset TweetMessages
let $ed := edit-distance-contains($t.message_text, "Blah Blah", 2)
return {
"id" : $t.tweetid,
"message" : $t.message-text
}
This query will give you the results that contain a substring similar to "Blah
Blah".
Original comment by icetin...@gmail.com
on 1 May 2014 at 11:13
Great !
This is awesome
Thanks Inci ...
Original comment by pouria.p...@gmail.com
on 1 May 2014 at 11:17
As you figured out, the parameters given to the edit-distance-related functions
affect index usage. Basically, we have a formula that decides how many n-grams
need to match between the query and the record. We use the index if that number
(T) is greater than 0; otherwise we don't use the index (this is called the
"panic case"). We compute T as follows:
T = Number_of_grams_in_query - gram_length * threshold
Now, if we use edit-distance() or edit-distance-check(), the number of grams in
the query (Q) is computed as follows:
Q = Length_of_query_string + gram_length - 1
If we use edit-distance-contains():
Q = Length_of_query_string - gram_length + 1
If this formula gives T > 0, the query is rewritten to use the inverted
index.
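To make the arithmetic concrete, here is a small Python sketch of the formulas above. It is only an illustration, not the optimizer's actual code: the function names are made up for this example, and the gram length of 3 is an assumption, since the thread does not say what gram length the index was built with. With that assumed value and a threshold of 2, the sketch gives T = 5 for "Blah Blah" and T = 0 for "Blah" under edit-distance(), which would be consistent with the behavior reported above.

```python
# Illustrative sketch of the T formula from this thread.
# Function names are hypothetical; GRAM_LENGTH = 3 is an assumed value.

GRAM_LENGTH = 3  # assumption: not stated anywhere in the thread


def grams_in_query(query, gram_length, contains=False):
    """Q: number of grams in the query string."""
    if contains:
        # edit-distance-contains()
        return len(query) - gram_length + 1
    # edit-distance() and edit-distance-check()
    return len(query) + gram_length - 1


def match_threshold(query, gram_length, threshold, contains=False):
    """T = Q - gram_length * threshold."""
    q = grams_in_query(query, gram_length, contains)
    return q - gram_length * threshold


def uses_index(query, gram_length, threshold, contains=False):
    """The index is used only when T > 0; otherwise it is the panic case."""
    return match_threshold(query, gram_length, threshold, contains) > 0


if __name__ == "__main__":
    for text in ("Blah Blah", "Blah"):
        for contains in (False, True):
            t = match_threshold(text, GRAM_LENGTH, 2, contains)
            kind = "contains" if contains else "plain"
            print(text, kind, "T =", t, "index" if t > 0 else "panic")
```

Note that for short query strings edit-distance-contains() produces fewer grams than the other two functions, so under these formulas a short string such as "Blah" still ends up in the panic case with a threshold of 2.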
Original comment by icetin...@gmail.com
on 1 May 2014 at 11:35
I am closing this issue; however, the existential query that Pouria came up
with as a workaround still doesn't work. We will decide what to do about
existential queries in issue 654.
Original comment by icetin...@gmail.com
on 1 May 2014 at 11:39
This issue was closed by revision be353dd4a54e.
Original comment by kiss...@gmail.com
on 23 May 2014 at 8:09
This issue was closed by revision be353dd4a54e.
Original comment by kiss...@gmail.com
on 9 Jun 2014 at 6:41
Original issue reported on code.google.com by
pouria.p...@gmail.com
on 14 Mar 2014 at 8:46