Closed meltuhamy closed 6 years ago
It seems that if the search includes a * at the beginning of the query, the search hangs.
idx.search('*Zlah*') // hangs
idx.search('Zlah*') // ok
On my machine that search eventually completes after 12 seconds, so something is clearly not right.
The test data looks fairly pathological. Out of interest, how did you discover it?
My guess is that the leading wildcard is causing many iterations through the graph that represents the token. I'll need to take a closer look with a debugger to see where it's getting stuck, though.
I'm using wildcards to allow substring searches. If there are a lot of repeated items in the result, it will take longer and longer to perform the search. For now, I'm using my own pipeline function to expand all tokens into their substrings so the user never needs to use a wildcard character (I'm using lunr in the context of a loose auto complete search scenario).
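For readers wondering what such a pipeline function might look like, here is a minimal sketch. It is not the code actually used in this thread: `expandToSubstrings` and `substringExpander` are hypothetical names, and the sketch assumes a lunr 2.x token that exposes `toString()` and `clone(fn)`, and a pipeline that accepts an array of tokens as a return value.

```javascript
// Hypothetical helper: return every distinct contiguous substring of `term`.
function expandToSubstrings(term) {
  const seen = new Set();
  for (let start = 0; start < term.length; start++) {
    for (let end = start + 1; end <= term.length; end++) {
      seen.add(term.slice(start, end));
    }
  }
  return Array.from(seen);
}

// Hypothetical pipeline-style function: takes one token and returns one
// token per substring. Assumes the token has toString() and clone(fn),
// as lunr 2.x tokens do.
function substringExpander(token) {
  return expandToSubstrings(token.toString()).map(function (sub) {
    // clone(fn) builds a new token whose string is whatever fn returns.
    return token.clone(function () { return sub; });
  });
}

console.log(expandToSubstrings('word').length); // 10
```

With lunr, a function like this would presumably be registered via `lunr.Pipeline.registerFunction` and added only to the indexing pipeline, so that query terms are matched as typed. The default stemmer may interfere with the expanded substrings, so treat the wiring as something to verify rather than a recipe.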
Searches with leading wildcards are considerably more expensive. That said, I wouldn't expect it to take this long. I'll have a look through the relevant code, but it's a bit involved and I wrote it over a year ago now.
Perhaps there is a better way of implementing what you are trying to achieve, though. Can you give an example of the kind of documents you are searching within, and the kind of query you are trying? Is the string in the document multiple tokens merged together into a single string? If I understood what you're trying to achieve better, I might be able to suggest a more performant way of implementing it.
I'm building a search autocomplete feature that finds all documents containing the search term as a substring, e.g.
the words
hello world
blah
word
nice
dice
and the search term d
should get the results
hello world
word
dice
because d
is a substring of each of those results.
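The expected behaviour in this example can be written down in plain JavaScript, independent of lunr. This is only a reference sketch of the desired result set, not an indexed search; `substringMatches` is a hypothetical name.

```javascript
const words = ['hello world', 'blah', 'word', 'nice', 'dice'];

// Keep every item that contains `term` anywhere inside it (case-insensitive).
function substringMatches(items, term) {
  const needle = term.toLowerCase();
  return items.filter(function (item) {
    return item.toLowerCase().includes(needle);
  });
}

console.log(substringMatches(words, 'd')); // [ 'hello world', 'word', 'dice' ]
```

A linear scan like this is fine for small lists; the point of the wildcard or substring-expansion approaches is to get the same results from an index.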
To begin with, I was able to achieve this using the wildcard method (i.e. searching for *d*), but I ran into the issue above and performance problems.
I'm open to ideas for a better way of doing it, but for now I have a working solution: a pipeline function that expands all terms into their substrings.
If you have a solution that is working then that's great. I'll still spend a bit of time trying to understand the particular case you originally posted; perhaps there is some optimisation to make, or a bug causing the long run time.
Yes, it seems that is quite a nasty bug, and it is preventing me from using wildcards. Thanks for your help!
@meltuhamy how exactly did you do the workaround with the pipeline function? I'm afraid that expanding every keyword to subsets would make my index burst.
I currently had to disable leading wildcards because of this issue (>20s with leading wildcard compared to <400ms without).
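To put a rough number on the index-size concern: a token of length n has n(n+1)/2 contiguous substrings (counting repeats), so expansion grows quadratically with token length. The sketch below estimates that upper bound and shows how a minimum substring length would reduce it. `countSubstrings`, `estimateIndexTerms`, and the `minLength` parameter are hypothetical and not part of lunr or of the workaround described above.

```javascript
// Upper bound on the substrings of a token of the given length that are at
// least `minLength` characters long (repeats are counted separately).
function countSubstrings(length, minLength = 1) {
  if (length < minLength) return 0;
  const k = length - minLength + 1; // number of allowed substring lengths
  return (k * (k + 1)) / 2;
}

// Sum the bound over a list of tokens to estimate how many terms would be indexed.
function estimateIndexTerms(tokens, minLength = 1) {
  return tokens.reduce(function (total, token) {
    return total + countSubstrings(token.length, minLength);
  }, 0);
}

console.log(countSubstrings(10));    // 55
console.log(countSubstrings(10, 3)); // 36
console.log(estimateIndexTerms(['hello', 'world'], 3)); // 12
```

Whether a minimum length is acceptable depends on the use case: it would stop one- or two-character queries from matching anything.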
I've pushed 2.3.3, which should resolve the long search times with some wildcard searches. Let me know if it solves the issue, thanks.
Confirmed this fixes it. Thanks @olivernn !
See the jsFiddle here: https://jsfiddle.net/bwe0f44g/