olivernn / lunr.js

A bit like Solr, but much smaller and not as bright
http://lunrjs.com
MIT License

Cannot find ROLLS-ROYCE by *rolls* #527

Open klebann opened 1 year ago

klebann commented 1 year ago

Search term: *roll*:

image

Search term: *rolls*:

image

Data (search by name):
{ "id": 3681, "name": "TROLLER", "url": "/tecdoc/engine/list/3681" }
{ "id": 705, "name": "ROLLS-ROYCE", "url": "/tecdoc/engine/list/705" }

Is this normal behaviour? Why can it find results for *roll* but not for *rolls*?

MeaningOfLights commented 9 months ago

I'm having a similar issue: I can't search for names that contain a space or a hyphen. Can you please allow "Rolls*Royce" to work? Ideally we should be able to search for "Rolls Royce" and get an exact phrase search when the term is enclosed in quotes. Being limited to single-word searches is a massive limitation.

michael-aka-mmh commented 8 months ago

In https://github.com/olivernn/lunr.js/blob/aa5a878f62a6bba1e8e5b95714899e17e8150b38/lib/tokenizer.js#L76 one can see that the tokenizer uses any whitespace or dash character to separate words. You could try to change the regular expression to whitespace characters only.

Exact phrase search is on my wish list, too.
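To illustrate the separator change suggested above, here is a minimal sketch in plain JavaScript that does not call lunr at all. It assumes the default separator is equivalent to `/[\s\-]+/` (whitespace or dash, as described in the comment); `splitTokens` is a hypothetical helper that only approximates tokenization by lowercasing and splitting.

```javascript
// Assumed default separator, per the description above: whitespace or dash.
const defaultSeparator = /[\s\-]+/;
// Alternative suggested above: split on whitespace only.
const whitespaceOnly = /\s+/;

// Hypothetical helper, a rough stand-in for a tokenizer:
// lowercase the text, split on the separator, drop empty pieces.
function splitTokens(text, separator) {
  return text
    .toLowerCase()
    .split(separator)
    .filter((token) => token.length > 0);
}

console.log(splitTokens("ROLLS-ROYCE", defaultSeparator)); // [ 'rolls', 'royce' ]
console.log(splitTokens("ROLLS-ROYCE", whitespaceOnly));   // [ 'rolls-royce' ]
```

With the whitespace-only separator the hyphenated name stays a single token, so a wildcard term such as *rolls* has one longer string to match against. In lunr itself the separator would presumably be overridden via `lunr.tokenizer.separator` before building the index; treat that as something to verify against the version in use.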

MeaningOfLights commented 1 week ago

@michael-aka-mmh thanks. I ended up doing a few things: stripping the HTML tags before indexing, checking for direct (not fuzzy) matches as part of a technique for highlighting words, and a few other tweaks. I had better results changing what I fed it than changing the separator regex.
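The preprocessing described above could look roughly like the following. This is only a sketch under assumptions, not the commenter's actual code: `stripTags` and `hasDirectMatch` are hypothetical names, and the regex-based tag stripping is a naive approach that will not handle every HTML input (comments, scripts, or `>` inside attribute values).

```javascript
// Hypothetical helper: remove anything that looks like an HTML tag,
// then collapse the leftover whitespace. Naive, for illustration only.
function stripTags(html) {
  return html
    .replace(/<[^>]*>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Hypothetical helper: case-insensitive check for a direct (non-fuzzy)
// occurrence of the search term in the cleaned text.
function hasDirectMatch(text, term) {
  return text.toLowerCase().includes(term.toLowerCase());
}

const cleaned = stripTags("<p>ROLLS-ROYCE <b>engines</b></p>");
console.log(cleaned);                          // ROLLS-ROYCE engines
console.log(hasDirectMatch(cleaned, "rolls")); // true
console.log(hasDirectMatch(cleaned, "rollz")); // false
```

The cleaned string is what would be fed to the indexer, and the direct-match check can decide which hits to highlight.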