Fauntleroy closed this 2 months ago
I managed to get a working solution for my use case by doing the following in extractField:
extractField(document, fieldName) {
  ...
  return document?.options.map(({ name }) => name).join(' ');
  ...
}
This returns an array of string values joined into a single space-separated string, which the tokenize function then splits into multiple tokens. I'm not sure if this is ideal or intended, but it works in my case.
Hi @Fauntleroy,
If I understand your problem correctly, there is another solution, which I consider simpler and more manageable: use a custom processTerm function to expand terms into their synonyms upon indexing, but do not perform any synonym expansion upon search.
const miniSearch = new MiniSearch({
  fields: [/* your fields here... */],
  // Use your custom processTerm upon indexing:
  processTerm: customProcessTerm,
  searchOptions: {
    // Use the default processTerm upon search:
    processTerm: MiniSearch.getDefault('processTerm')
  }
})
Say that you have two documents:
const documents = [
  {
    id: 1,
    text: "cherry 1/8th oz"
  },
  {
    id: 2,
    text: "apple 1/8th oz"
  }
]
Assume your custom processTerm expands the term "8th" into ["8th", "eighth"] and "1" into ["1", "one"]. Upon indexing, the text of the first document will be tokenized and expanded to the terms ["cherry", "1", "one", "8th", "eighth", "oz"], while the text of the second document will be tokenized and expanded to ["apple", "1", "one", "8th", "eighth", "oz"].
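As a rough sketch of the indexing side, a customProcessTerm along these lines would produce the expansions described above. The SYNONYMS table, the tokenize helper, and indexedTerms are assumptions made for illustration: tokenize only approximates MiniSearch's default tokenizer, and returning an array from processTerm assumes a MiniSearch version that supports expanding one term into several.

```javascript
// Hypothetical synonym table; the entries mirror the example in this thread.
const SYNONYMS = new Map([
  ["8th", ["eighth"]],
  ["1", ["one"]],
]);

// Expands a term into itself plus any synonyms. Terms without synonyms
// come back as a single-element array.
function customProcessTerm(term) {
  const normalized = term.toLowerCase();
  return [normalized, ...(SYNONYMS.get(normalized) ?? [])];
}

// Simplified stand-in for the default tokenizer (an assumption, not
// MiniSearch's actual implementation): split on any run of characters
// that are neither letters nor digits.
function tokenize(text) {
  return text.split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

// Terms that would end up in the index for a given field value.
function indexedTerms(text) {
  return tokenize(text).flatMap(customProcessTerm);
}

console.log(indexedTerms("cherry 1/8th oz"));
// [ 'cherry', '1', 'one', '8th', 'eighth', 'oz' ]
```

Passing this customProcessTerm as the top-level processTerm option, as in the configuration above, would apply the expansion at indexing time only.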
Upon searching for "cherry eighth", the query will be tokenized to the terms ["cherry", "eighth"] (with no synonym expansion), so searching with combineWith: "AND" will match the first document (and only that one). The same result is obtained when searching for "cherry 1/8th" (which is tokenized to ["cherry", "1", "8th"], again with no expansion).
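The AND behaviour described here can be illustrated without the library. The sketch below hard-codes the expanded index terms from the example and uses a hypothetical matchesAll helper; it models only the set logic of combineWith: "AND" with exact term matching, not MiniSearch's scoring, prefix search, or fuzzy matching.

```javascript
// Index terms per document id, after synonym expansion at indexing time
// (copied from the example above).
const indexed = new Map([
  [1, new Set(["cherry", "1", "one", "8th", "eighth", "oz"])],
  [2, new Set(["apple", "1", "one", "8th", "eighth", "oz"])],
]);

// Hypothetical helper: ids of the documents that contain every query term.
function matchesAll(queryTerms) {
  return [...indexed]
    .filter(([, terms]) => queryTerms.every((t) => terms.has(t)))
    .map(([id]) => id);
}

// Query terms are NOT expanded, so each one only has to hit the index once.
console.log(matchesAll(["cherry", "eighth"]));   // [ 1 ]
console.log(matchesAll(["cherry", "1", "8th"])); // [ 1 ]
console.log(matchesAll(["1", "8th"]));           // [ 1, 2 ]
```

Because the synonyms already live in the index, either spelling of the measurement finds the document, and the unexpanded query never demands a variant the document lacks.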
Note that, since extractField is called only upon indexing and not upon search, your solution manages to achieve the same end result, but in a way that I would consider a bit "hacky" and less clear in its intent.
@lucaong It looks like you understand what I was trying to do pretty well! This is exactly the same solution I ended up going with, in fact.
I'm building a client-side search that needs to handle measurements given in both metric and imperial. This means I'm using processTerm to normalize things like "1/8th oz" to ["eighth", "1/8th", "3.5g"]. Using combineWith: "OR", this approach works just fine.
However, searches will need to be combinatorial, and include things like "cherry eighth". This means I'd want to use combineWith: "AND" to only include results with both tokens. Naturally, when the extra tokens are included for each normalized measurement, this filters everything out.
Is there a way to use combineWith: "AND" conditionally, only for certain sets of tokens/terms?