One thing I got stuck on while debugging the trigram-question-mark issue, which
may actually be a fundamental design limitation (or feature), is that moving
to prefix/suffix lists can shrink the list of trigrams considerably.
bash$ ./csearch -verbose 'foo_(bar)?zot' >/dev/null
2012/03/07 22:18:48 query: "foo" "oo_" "zot" ("_zo" "o_z")|("arz" "rzo")
2012/03/07 22:18:48 post query identified 0 possible files
bash$ ./csearch -verbose 'foo_(bar_)?zot' >/dev/null
2012/03/07 22:18:53 query: "foo" "oo_" "zot"
2012/03/07 22:18:53 post query identified 0 possible files
In the first case, "bar" is only three characters, so it stays an exact string
and is used to construct the "arz"/"rzo" entries. In the second case, adding
the underscore brings it to four characters and it becomes a prefix/suffix
list. At that point it no longer provides any trigram info: the optional group
also matches the empty string, and the empty string empties out the prefix and
suffix lists because every other entry is "redundant" with it. ("" is a
prefix of "ba").
I'm not sure if this is a bug or not. I.e., _should_ we be able to transform
prefix/suffix lists into AND/OR sets of trigrams in this case?
Original issue reported on code.google.com by dgryski on 7 Mar 2012 at 10:08