olivernn / lunr.js

A bit like Solr, but much smaller and not as bright
http://lunrjs.com
MIT License

Query / Tokenizing help, exact match #338

Closed DominikTrenz closed 6 years ago

DominikTrenz commented 6 years ago

Hey,

I can't find exact matches or search across combined fields using the query API.

A fiddle: https://jsfiddle.net/wvpcdLaa/16/

Can you help me with that?

hoelzro commented 6 years ago

Hi @DominikTrenz!

From what I see, this isn't working because you're using the pipeline to index the documents, but not using the pipeline for the search. This means that you're looking for terms like M 225 in the inverted index, but only processed tokens like m and 225 exist in that index.
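To make the mismatch concrete, here is a minimal sketch of a toy inverted index. It is not lunr's implementation: `tokenize`, `buildIndex`, and `lookup` are hypothetical helpers, and the tokenizer only approximates an indexing pipeline by splitting on whitespace and lower-casing.

```javascript
// Toy model only: these helpers are hypothetical and are not part of lunr.
// Split on whitespace and lower-case, roughly what an indexing step might do.
function tokenize(text) {
  return text.toLowerCase().split(/\s+/).filter(Boolean);
}

// Build a map from token to the set of document ids containing it.
function buildIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    for (const token of tokenize(doc.room)) {
      if (!index.has(token)) index.set(token, new Set());
      index.get(token).add(doc.id);
    }
  }
  return index;
}

// Look up one term exactly as given, with no tokenisation of the term.
function lookup(index, term) {
  return [...(index.get(term) || [])];
}

const roomIndex = buildIndex([
  { id: 1, room: "M 225" },
  { id: 2, room: "M 226" },
]);

// The raw string is not a token in the index, so nothing matches.
const rawHits = lookup(roomIndex, "M 225");

// Processing the query the same way as the documents yields matching tokens.
const tokenHits = tokenize("M 225").map((t) => lookup(roomIndex, t));

console.log(rawHits, tokenHits);
```

In this toy model the raw lookup comes back empty, while the tokens m and 225 each find their documents, which is the situation described above.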

I don't know if lunr.js can handle exact searches across token boundaries - @olivernn do you have any thoughts on that?

DominikTrenz commented 6 years ago

But isn't q.term(searchTerm, { boost: 100 }) // exact match using the pipeline? How can I use the pipeline? Removing usePipeline: false on the other lines changes nothing. Searching across token boundaries would be pretty important.

hoelzro commented 6 years ago

Yes, that first query does use the pipeline - good point! The issue here is that you're treating things like M 225 as a single token, but lunr.js doesn't index it as a single token. Even if you disabled stemming, you would still have m and 225 as distinct tokens in your inverted index, so your query would never match.

Let's back up a bit and talk about the higher level: what are you hoping to use lunr.js to do? Are you currently using lunr for search and hope to expand its use, or are you trying it out for a new search feature in an application?

DominikTrenz commented 6 years ago

I've been using lunr for some time and I want to improve my search results. I have documents like the ones in that example. The documents represent rooms with several attributes, something like:

name: "WC", building: "Central Building", room: "M 226", tags: "Toilet, ..."

And people should just be able to find the rooms. Queries like "Toilet central building" should find it.

DominikTrenz commented 6 years ago

I made a custom tokenizer for the room field, so the exact match problem should be gone. But I have no idea how to search across multiple fields.
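As a rough illustration of the two ideas in this comment (a field-specific tokenizer and a search that spans fields), here is a toy sketch that does not use lunr. Everything in it is an assumption made for illustration: `splitWords`, `keepWhole`, `fieldTokenizers`, `indexRooms`, and `searchRooms` are hypothetical names, the sample rooms are made up from the example above, and the scoring simply counts matching terms.

```javascript
// Toy model only: hypothetical helpers, not lunr's API.
// Default tokenizer: lower-case and split on whitespace or commas.
const splitWords = (text) => text.toLowerCase().split(/[\s,]+/).filter(Boolean);

// Tokenizer for the room field: keep the whole value as one token.
const keepWhole = (text) => [text.trim().toLowerCase()];

const fieldTokenizers = {
  name: splitWords,
  building: splitWords,
  tags: splitWords,
  room: keepWhole,
};

// Build one shared map from token to document ids across all fields.
function indexRooms(rooms) {
  const postings = new Map();
  for (const room of rooms) {
    for (const [field, tokenizeField] of Object.entries(fieldTokenizers)) {
      for (const token of tokenizeField(room[field])) {
        if (!postings.has(token)) postings.set(token, new Set());
        postings.get(token).add(room.id);
      }
    }
  }
  return postings;
}

// Score each document by how many query terms it contains in any field.
function searchRooms(postings, terms) {
  const scores = new Map();
  for (const term of terms) {
    for (const id of postings.get(term) || []) {
      scores.set(id, (scores.get(id) || 0) + 1);
    }
  }
  return [...scores].sort((a, b) => b[1] - a[1]).map(([id]) => id);
}

// Made-up sample data modelled on the room example.
const postings = indexRooms([
  { id: "wc", name: "WC", building: "Central Building", room: "M 226", tags: "Toilet" },
  { id: "lab", name: "Lab", building: "North Wing", room: "M 225", tags: "Computers" },
]);

const byWords = searchRooms(postings, splitWords("Toilet central building"));
const byRoom = searchRooms(postings, keepWhole("M 225"));

console.log(byWords, byRoom); // [ 'wc' ] [ 'lab' ]
```

Because every field feeds the same token map, a query whose words come from different fields still lands on the same room, while the room number only matches as a whole.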

olivernn commented 6 years ago

I've updated the fiddle and now get results for all three queries that I think make sense.

The problem you were seeing is that lunr.Query#term performs no tokenisation of the term, i.e. it uses that exact string in the search query. That means in the second query it is searching for the token "M 225". Unless you change the default tokeniser to not split on whitespace, this is never going to match anything.

The lunr.Query interface could possibly make life a bit easier by accepting an array or a single object, and then calling #toString() on that object. That would allow you to do the following:

q.term(lunr.tokenizer("foo bar baz"))

To be fair, the current documentation could make this a bit clearer also.
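The idea of a term method that accepts either a single value or an array can be sketched with a toy query builder. This is not lunr's source: `ToyQuery` and `toyTokenize` are hypothetical stand-ins, and the sketch assumes that each array element can be turned into its text with String().

```javascript
// Toy stand-in for a query builder; hypothetical, not lunr's actual code.
class ToyQuery {
  constructor() {
    this.clauses = [];
  }

  // Accept a single term (anything that stringifies) or an array of terms.
  term(term, options = {}) {
    if (Array.isArray(term)) {
      // Add one clause per element, each with its own copy of the options.
      term.forEach((t) => this.term(t, { ...options }));
      return this;
    }
    this.clauses.push({ ...options, term: String(term) });
    return this;
  }
}

// Hypothetical tokenizer: lower-case and split on whitespace.
const toyTokenize = (text) => text.toLowerCase().split(/\s+/).filter(Boolean);

const toyQuery = new ToyQuery().term(toyTokenize("foo bar baz"), { boost: 10 });

console.log(toyQuery.clauses.map((clause) => clause.term)); // [ 'foo', 'bar', 'baz' ]
```

With this shape, the caller tokenises the search string once and the builder expands it into one clause per token, so no clause ever contains a space.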

@DominikTrenz Does that solve your issue?

olivernn commented 6 years ago

I've put together a change that should make this sort of thing a bit easier in the future.

olivernn commented 6 years ago

2.2.0 includes the above-mentioned change. I'm going to close this now as I think we've covered everything; feel free to re-open or comment if not.