lucaong / minisearch

Tiny and powerful JavaScript full-text search engine for browser and Node
https://lucaong.github.io/minisearch/
MIT License
4.64k stars, 133 forks

How to prevent treating terms separately? #244

Closed PuneetKohli closed 7 months ago

PuneetKohli commented 7 months ago

Hey, so if we give an input such as "Software In" and expect search to return "Software Intern" and "Software Infrastructure Engineer", how do we prevent MiniSearch from treating "In" as a separate term and returning results that merely contain a term starting with "In" but don't have the word "software" in them?

Is there a flag to always use all the terms?

lucaong commented 7 months ago

Hi @PuneetKohli, yes, you can achieve what you want with the search option combineWith: 'AND' (the default is 'OR'). This requires every query term to match in each result. Combined with prefix: true, it retrieves documents that contain both a term starting with "software" and a term starting with "in".

import MiniSearch from 'minisearch'

const documents = [
  { id: 1, text: "Software development" },
  { id: 2, text: "Software Infrastructure" },
  { id: 3, text: "Software Intern" },
  { id: 4, text: "Software developed by an Indian company" },
  { id: 5, text: "Hardware Engineering" }
]

const miniSearch = new MiniSearch({
  fields: ['text']
})

miniSearch.addAll(documents)

miniSearch.search("software in", { combineWith: "AND", prefix: true })
// => will return documents 2, 3 and 4

Note that you will also get results where the query terms are not contiguous, such as "Software developed by an Indian company". That's because MiniSearch does not take term positions into account. This is by design: storing positions would make the index much larger, which goes against MiniSearch's goal of fitting in browser memory, even on constrained devices.
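To make it concrete why document 4 matches, here is a rough toy model of what combineWith: "AND" together with prefix: true means. It is plain JavaScript and not MiniSearch's actual implementation; the helpers tokenize and matchesAllPrefixes are hypothetical names introduced only for this illustration, and the tokenizer is an assumption that merely approximates splitting on spaces and punctuation.

```javascript
// Toy model only: NOT how MiniSearch is implemented internally.

// Hypothetical helper: lowercase the text and split it on anything that is
// not a letter or a digit.
const tokenize = (text) =>
  text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean)

// A document matches when every query term is a prefix of at least one
// document term. The position of the terms is never looked at.
const matchesAllPrefixes = (query, text) => {
  const docTerms = tokenize(text)
  return tokenize(query).every((queryTerm) =>
    docTerms.some((docTerm) => docTerm.startsWith(queryTerm))
  )
}

const sampleDocuments = [
  { id: 1, text: "Software development" },
  { id: 2, text: "Software Infrastructure" },
  { id: 3, text: "Software Intern" },
  { id: 4, text: "Software developed by an Indian company" },
  { id: 5, text: "Hardware Engineering" }
]

const matchingIds = sampleDocuments
  .filter((doc) => matchesAllPrefixes("software in", doc.text))
  .map((doc) => doc.id)

console.log(matchingIds) // [ 2, 3, 4 ]
```

Document 4 is included because "indian" starts with "in", even though it is several words away from "software".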

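If you need to perform a phrase search (matching only contiguous terms), you can keep only the documents where the query terms appear contiguously by adding a filter on a stored field. The text field has to be listed in storeFields so that it is available on each result. Something like: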

const miniSearch = new MiniSearch({
  fields: ['text'],
  storeFields: ['text']
})

// Assume the same documents as the previous example
miniSearch.addAll(documents)

const query = "software in"

miniSearch.search(query, {
  combineWith: "AND",
  prefix: true,
  filter: (result) => result.text.toLowerCase().includes(query.toLowerCase())
})
// => will return documents 2 and 3
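The substring check above is sensitive to whitespace and punctuation (for example, two spaces between the words would break it). If that matters, a token-based filter is one possible alternative. The following is only a sketch under some assumptions: makePhraseFilter and tokenizeText are hypothetical helpers, not part of MiniSearch, the tokenizer only approximates how documents are split into terms, and the choice to match every query term exactly except the last one (which is treated as a prefix) is a design decision of this example.

```javascript
// Hypothetical helper: lowercase the text and split it on anything that is
// not a letter or a digit.
const tokenizeText = (text) =>
  text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean)

// Builds a function suitable for the `filter` search option. It keeps a
// result only if the query terms appear as consecutive terms in result.text.
// All query terms must match exactly, except the last, which is a prefix.
const makePhraseFilter = (query) => {
  const queryTerms = tokenizeText(query)
  const lastIndex = queryTerms.length - 1

  return (result) => {
    if (queryTerms.length === 0) return true
    const docTerms = tokenizeText(result.text)

    // Slide a window of queryTerms.length over the document terms.
    for (let start = 0; start + queryTerms.length <= docTerms.length; start++) {
      const windowMatches = queryTerms.every((queryTerm, i) =>
        i === lastIndex
          ? docTerms[start + i].startsWith(queryTerm)
          : docTerms[start + i] === queryTerm
      )
      if (windowMatches) return true
    }
    return false
  }
}

// Standalone check on plain objects shaped like search results
const phraseFilter = makePhraseFilter("Software  In")
console.log(phraseFilter({ text: "Software Intern" })) // true
console.log(phraseFilter({ text: "Software   Infrastructure" })) // true
console.log(phraseFilter({ text: "Software developed by an Indian company" })) // false
```

With MiniSearch, this would be passed as filter: makePhraseFilter(query) in place of the inline arrow function, still with storeFields: ['text'] so that result.text is present.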

I hope this helps :)

lucaong commented 7 months ago

@PuneetKohli I think your question was answered, so I am going to close the issue. If you need any more help though, feel free to comment further on this issue.