olivernn / lunr.js

A bit like Solr, but much smaller and not as bright
http://lunrjs.com
MIT License

Wildcard doesn't match an empty string "" #370

Open Benjamin-Dobell opened 6 years ago

Benjamin-Dobell commented 6 years ago

Wildcards presently do not match an empty (zero-length) string, whereas it's reasonable to expect that they do.

e.g.

Searching for blue* will not match an indexed document with the text "The sky is blue". However, searching for blue will correctly return a match.
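To make the expected semantics concrete, here is a minimal sketch in plain JavaScript. It is not Lunr's implementation, and `matchesTrailingWildcard` is a hypothetical helper made up for illustration. It treats a trailing `*` as "zero or more characters", so `blue*` matches the token `blue` itself:

```javascript
// Hypothetical helper, not part of Lunr: treats a trailing "*" as
// "zero or more characters", which is the behaviour this issue expects.
function matchesTrailingWildcard(pattern, token) {
    if (!pattern.endsWith("*")) {
        // No wildcard: fall back to an exact comparison
        return pattern === token
    }
    // Drop the "*" and accept any token starting with the remaining prefix,
    // including a token that is exactly equal to the prefix
    const prefix = pattern.slice(0, -1)
    return token.startsWith(prefix)
}

// Tokens from the example text, lowercased by hand (no stemming assumed)
const tokens = ["the", "sky", "is", "blue"]
const hits = tokens.filter(t => matchesTrailingWildcard("blue*", t))
```

Under this reading, `hits` contains the token `blue`, since the wildcard is allowed to consume nothing.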

This is problematic as it typically necessitates two queries where one ought to suffice, e.g.

const idSet = new Set()

const words = searchText.trim().split(/ |\-/)
const components = words.filter(w => w.length > 1).map(w => w.toLowerCase())

// We need to do what is *essentially* the same query twice, as Lunr's wildcard support is a little unusual

textIndex.query(query => {
    components.forEach((component, index) => {
        // Compare against components (the filtered list being iterated), not words
        if (index === components.length - 1) {
            query.term(component, {wildcard: lunr.Query.wildcard.TRAILING, presence: lunr.Query.presence.REQUIRED})
        } else {
            query.term(component, {presence: lunr.Query.presence.REQUIRED})
        }
    })
}).forEach(match => idSet.add(Number(match.ref)))

textIndex.query(query => {
    components.forEach((component) => {
        query.term(component, {presence: lunr.Query.presence.REQUIRED})
    })
}).forEach(match => idSet.add(Number(match.ref)))

This also leads to the overhead of having a separate Set to remove duplicate match results.
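As a rough sketch of that de-duplication step, the two result lists could be merged by ref while keeping the better score for each document. `mergeByRef` is a hypothetical helper, and the sample arrays are made-up data that only assume results are objects with `ref` and `score` properties, as Lunr's are:

```javascript
// Hypothetical helper: merges result lists shaped like Lunr's
// ({ref, score}), keeping the highest-scoring entry for each ref.
function mergeByRef(...resultLists) {
    const best = new Map()
    for (const results of resultLists) {
        for (const match of results) {
            const seen = best.get(match.ref)
            if (seen === undefined || match.score > seen.score) {
                best.set(match.ref, match)
            }
        }
    }
    // Return the merged matches ordered by descending score
    return [...best.values()].sort((a, b) => b.score - a.score)
}

// Made-up sample data standing in for the wildcard and exact query results
const wildcardMatches = [{ref: "1", score: 0.4}, {ref: "2", score: 0.9}]
const exactMatches = [{ref: "1", score: 1.2}, {ref: "3", score: 0.1}]
const merged = mergeByRef(wildcardMatches, exactMatches)
```

This keeps score ordering, which the plain Set of ids above discards, but it is still extra work that a single query would avoid.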

The problem is somewhat exacerbated by the lack of sub-query support (mentioned in the last comment in https://github.com/olivernn/lunr.js/issues/264). If we had sub-queries (OR/AND) then the existing wildcard behaviour would be passable, as we could just do something like:

query.or(
    query.term(component, {presence: lunr.Query.presence.REQUIRED}),
    query.term(component, {wildcard: lunr.Query.wildcard.TRAILING, presence: lunr.Query.presence.REQUIRED})
)

As it stands, you instead need to perform an entirely separate query, with just the last search term altered, in order to correctly match user input as it's typed.

olivernn commented 6 years ago

Wildcard matching an empty string seems entirely reasonable. The current behaviour is more likely an omission than a deliberate choice.

I need to investigate another bug in the code that handles wildcards, so I'll take a look at fixing this as well.

olivernn commented 6 years ago

Hmm, this might not be a problem with lunr.TokenSet since there is a test specifically for matching zero or more characters.

Majestic7979 commented 1 year ago

Oh gosh, this has been an issue since 2019. Bitwarden uses this library, and it's not possible to search for empty usernames in the Bitwarden database. Is there no timeline for a fix? Is there no alternative way to search for an empty string in a field? Thanks.