lucaong / minisearch

Tiny and powerful JavaScript full-text search engine for browser and Node
https://lucaong.github.io/minisearch/
MIT License
4.83k stars, 137 forks

Is there any way to combine processTerm and combineWith: 'AND'? #278

Closed: Fauntleroy closed this issue 2 months ago

Fauntleroy commented 2 months ago

I'm building a client side search that needs to handle measurements given in both metric and imperial. This means I'm using processTerm to normalize things like "1/8th oz" to ["eighth", "1/8th", "3.5g"]. Using combineWith: "OR", this approach works just fine.

However, searches will need to combine several terms, such as "cherry eighth". This means I'd want to use combineWith: "AND" to only include results that contain both tokens. Naturally, once the extra tokens are added for each normalized measurement, AND requires every one of them to match, which filters everything out.

Is there a way to use combineWith: "AND" conditionally, only for certain sets of tokens/terms?

Fauntleroy commented 2 months ago

I managed to get a working solution for my use case by doing the following in extractField:

extractField(document, fieldName){
  ...
  return document?.options.map(({ name }) => name).join(' ');
  ...
}

This joins an array of string values into a single space-separated string, which the tokenize function then splits into multiple tokens. I'm not sure if this is ideal, or intended, but it works in my case.

lucaong commented 2 months ago

Hi @Fauntleroy, if I understand your problem correctly, there is another solution that I consider simpler and more manageable: use a custom processTerm function to expand terms into their synonyms upon indexing, but do not perform any synonym expansion upon search.

const miniSearch = new MiniSearch({
  fields: [/* your fields here... */],
  // Use your custom processTerm upon indexing:
  processTerm: customProcessTerm,
  searchOptions: {
    // Use the default processTerm upon search:
    processTerm: MiniSearch.getDefault('processTerm')
  }
})

Say that you have two documents:

const documents = [
  {
    id: 1,
    text: "cherry 1/8th oz"
  },
  {
    id: 2,
    text: "apple 1/8th oz"
  }
]

Assume your custom processTerm expands the term "8th" into ["8th", "eighth"] and "1" into ["1", "one"]. Upon indexing, the text of the first document will be tokenized and expanded to the terms ["cherry", "1", "one", "8th", "eighth", "oz"], while the text of the second document will be tokenized and expanded to ["apple", "1", "one", "8th", "eighth", "oz"].
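The thread does not show the actual customProcessTerm, so here is a minimal sketch of what such a function might look like. The EXPANSIONS table and its entries are hypothetical and only mirror the example above; the sketch relies on processTerm being allowed to return an array of terms, which is the behavior this issue already depends on.

```javascript
// Hypothetical expansion table, for illustration only (not part of MiniSearch).
// A Map is used so that terms like "constructor" are not confused with
// properties inherited by plain objects.
const EXPANSIONS = new Map([
  ['1', ['one']],
  ['8th', ['eighth']]
])

// Index-time processTerm: lowercases the term, then returns it together with
// its synonyms if any are known. Terms with no known synonyms come back as a
// plain lowercased string.
function customProcessTerm (term) {
  const normalized = term.toLowerCase()
  const synonyms = EXPANSIONS.get(normalized)
  return synonyms ? [normalized, ...synonyms] : normalized
}
```

With this table, customProcessTerm('8th') returns ['8th', 'eighth'], while customProcessTerm('Cherry') returns 'cherry'.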

Upon searching for "cherry eighth", the query will be tokenized to the terms ["cherry", "eighth"] (with no synonym expansion), therefore searching with combineWith: "AND" will match the first document (and only that one). The same result is obtained when searching for "cherry 1/8th" (which is tokenized to ["cherry", "1", "8th"], again with no expansion).
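To make the reasoning above concrete, here is a small standalone model of it in plain JavaScript. It is not MiniSearch's implementation: tokenize is only a rough stand-in for the default tokenizer (splitting on whitespace and punctuation), and expansions, indexTerms, queryTerms, and searchAnd are hypothetical names introduced for this sketch.

```javascript
// Rough stand-in for the default tokenizer: split on whitespace and punctuation.
const tokenize = (text) => text.split(/[\s\p{P}]+/u).filter(Boolean)

// Hypothetical synonym table, applied at index time only.
const expansions = new Map([
  ['1', ['one']],
  ['8th', ['eighth']]
])

// Index-time processing: lowercase each token and add its synonyms.
const indexTerms = (text) => new Set(
  tokenize(text).flatMap((token) => {
    const term = token.toLowerCase()
    return [term, ...(expansions.get(term) ?? [])]
  })
)

// Search-time processing: lowercase only, with no synonym expansion.
const queryTerms = (query) => tokenize(query).map((token) => token.toLowerCase())

const documents = [
  { id: 1, text: 'cherry 1/8th oz' },
  { id: 2, text: 'apple 1/8th oz' }
]

// Toy index: the set of expanded terms for each document.
const index = documents.map((doc) => ({ id: doc.id, terms: indexTerms(doc.text) }))

// AND semantics: a document matches only if it contains every query term.
const searchAnd = (query) => {
  const terms = queryTerms(query)
  return index
    .filter((entry) => terms.every((term) => entry.terms.has(term)))
    .map((entry) => entry.id)
}
```

In this model, searchAnd('cherry eighth') and searchAnd('cherry 1/8th') both return [1], while searchAnd('eighth oz') returns [1, 2]. Because synonyms are added only on the index side, the query never gains extra terms that AND would then require.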

Note that, since extractField is called only upon indexing and not upon search, your solution manages to achieve the same end result, but in a way that I would consider a bit "hacky" and less clear in its intent.

Fauntleroy commented 2 months ago

@lucaong It looks like you understand what I was trying to do pretty well! In fact, this is exactly the solution I ended up going with.