meriems / luke

Automatically exported from code.google.com/p/luke

ComplexPhraseQueryParser/Enumerating Wildcard Queries #9

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
We are troubleshooting performance issues with wildcard queries run
through the ComplexPhraseQueryParser (introduced in Lucene 3.0).
Some wildcard queries, such as "quality operations providing quality food
services job requirements: click here to apply for this job*", require a
maxClauseCount above 40K to match against the DBSight implementation that
we're using, and they also take longer than 2 seconds to run.

We'd like to understand what the 40K clauses are, as we don't think we have
nearly that many unique words with "job" as a prefix.

Can Luke support the ComplexPhraseQueryParser or otherwise show the fully 
enumerated query for wildcard searches? 

Original issue reported on code.google.com by auspici...@gmail.com on 1 Feb 2010 at 8:20

GoogleCodeExporter commented 9 years ago
There is no specific support for this. However, if you suspect that the last
term (the one with the wildcard) is the source of the problem, you could run
just that term as a simple wildcard query, click the "Update" button, and
look at the Rewritten list of clauses.

Original comment by sig...@gmail.com on 2 Feb 2010 at 11:05

GoogleCodeExporter commented 9 years ago
Thanks, but the simple wildcard query "job*" is rewritten as
ConstantScore(TEXT:job*), which doesn't reveal the list of clauses.

Original comment by auspici...@gmail.com on 2 Feb 2010 at 12:39
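
For reference, a constant-score rewrite hides the individual terms, so one way to check how many clauses "job*" could expand to is to walk the sorted term dictionary yourself. The following is only a minimal sketch of that idea in plain Java: the `PrefixExpansion` class and its `expand` method are hypothetical names, and a `TreeSet` stands in for the index's term dictionary. It is not Luke's or Lucene's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of Luke or Lucene.
class PrefixExpansion {

    // Returns every term in the sorted dictionary that starts with the prefix.
    // Each returned term corresponds to one clause in an enumerated expansion.
    static List<String> expand(NavigableSet<String> terms, String prefix) {
        List<String> matches = new ArrayList<>();
        // tailSet(prefix, true) begins at the first term >= prefix.
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // terms are sorted, so no later term can match
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        // Made-up sample terms standing in for the TEXT field's dictionary.
        NavigableSet<String> terms = new TreeSet<>(Arrays.asList(
                "apply", "job", "jobber", "jobs", "joy", "quality"));
        List<String> clauses = expand(terms, "job");
        System.out.println(clauses.size() + " clauses: " + clauses);
    }
}
```

Running the same count against the real terms of the field being searched would show whether the prefix alone can account for the clause count, or whether the rest of the phrase contributes to it.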

GoogleCodeExporter commented 9 years ago

Original comment by sig...@gmail.com on 31 Mar 2010 at 4:35