Strat1987 closed 3 years ago
Is there a way to customize the lunr.stopWordFilter? https://github.com/olivernn/lunr.js/blob/aa5a878f62a6bba1e8e5b95714899e17e8150b38/lunr.js#L1194
The answer in https://github.com/olivernn/lunr.js/issues/408 addresses the problem I was having with this.
The stopWordFilter can be disabled by removing it from the pipeline:
const idx = lunr(function () { this.pipeline.remove(lunr.stopWordFilter) /* field and document setup not shown */ })
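On the original question of customizing rather than disabling the filter: lunr 2.x exposes `lunr.generateStopWordFilter`, which as far as I know takes an array of words and returns a pipeline function. The plain JavaScript sketch below does not call lunr. `makeToyStopWordFilter` is a hypothetical stand-in that only shows the general idea: drop a token when it appears in a custom list, and pass it through otherwise.

```javascript
// Hypothetical helper (not part of lunr): build a filter from a custom word list.
function makeToyStopWordFilter(stopWords) {
  const blocked = new Set(stopWords);
  // Returning undefined signals "drop this token"; otherwise pass it through.
  return (token) => (blocked.has(token) ? undefined : token);
}

// A custom list that, unlike the default in this thread, does not contain "a".
const toyFilter = makeToyStopWordFilter(["the", "of"]);

const kept = ["a", "name", "of", "the", "index"]
  .map(toyFilter)
  .filter((token) => token !== undefined);

console.log(kept);
```

With a list like this, "a" would survive filtering and could end up in the index.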
@olivernn Somehow the stemmer changes the "A NAME" value to "A NAM", whereas the following reference implementation leaves the value untouched: http://9ol.es/porter_js_demo.html
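One possible explanation for the difference, sketched in plain JavaScript: the last step of the Porter algorithm drops a trailing "e" depending on a vowel/consonant "measure" of the remaining stem. When the whole string "a name" is stemmed as a single token, the leading "a" and the space (which counts as a consonant here) raise that measure. The regular expressions below are patterned on the ones in lunr's stemmer as I understand them. `stripFinalE` is a hypothetical helper written for this illustration and covers only this one rule, so treat it as an approximation rather than lunr's actual code.

```javascript
// Character classes in the style of a Porter stemmer port (assumption: close
// enough to lunr's definitions for illustration purposes).
const c = "[^aeiou]";        // one consonant; note that a space also matches
const v = "[aeiouy]";        // one vowel
const C = c + "[^aeiouy]*";  // a consonant run
const V = v + "[aeiou]*";    // a vowel run

// Measure greater than 1: at least two vowel-run/consonant-run pairs.
const measureGt1 = new RegExp("^(" + C + ")?" + V + C + V + C);
// Measure exactly 1: a single vowel-run/consonant-run pair.
const measureEq1 = new RegExp("^(" + C + ")?" + V + C + "(" + V + ")?$");
// Stem is consonant run + vowel + consonant other than w, x, y.
const isCvc = new RegExp("^" + C + v + "[^aeiouwxy]$");

// Hypothetical helper: apply only the "drop the final e" rule.
function stripFinalE(word) {
  const match = /^(.+?)e$/.exec(word);
  if (!match) return word;
  const stem = match[1];
  if (measureGt1.test(stem) || (measureEq1.test(stem) && !isCvc.test(stem))) {
    return stem;
  }
  return word;
}

console.log(stripFinalE("a name")); // "a nam"
console.log(stripFinalE("b name")); // "b name"
console.log(stripFinalE("name"));   // "name"
```

If this reading is right, the demo page leaves "name" alone because it stems single words, while the index here stems the multi-word value as one token.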
Even after a full reset of the pipeline
const idx = lunr(function () { this.pipeline.reset() /* field and document setup not shown */ })
the serialized index still lists the stemmer under "pipeline":
index {"version":"2.3.9","fields":["REF","NAME"],"fieldVectors":[["REF/29",[0,0.693]],["NAME/29",[1,0.693]],["REF/31",[2,0.693]],["NAME/31",[3,0.693]]],"invertedIndex":[["a",{"_index":0,"REF":{"29":{}},"NAME":{}}],["a name",{"_index":1,"REF":{},"NAME":{"29":{}}}],["b",{"_index":2,"REF":{"31":{}},"NAME":{}}],["b name",{"_index":3,"REF":{},"NAME":{"31":{}}}]],"pipeline":["stemmer"]}
A search query for "a name" still returns no results:
console.log( 'search result for tokenized "a name"', idx.query(q => { q.term(lunr.tokenizer('a name'), {boost: 100, fields: ['NAME']}) }) )
search result for "a name" []
while a similar call for "b name" does return a result: search result for "b name" [ { ref: '31', score: 0.693, matchData: { metadata: [Object: null prototype] } } ]
Both "a name" and "b name" are now keys in the invertedIndex.
The final piece of the puzzle for me was to also reset the searchPipeline, which was still using the stemmer:
this.searchPipeline.reset()
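To illustrate why resetting only `this.pipeline` was not enough, here is a toy model in plain JavaScript. It does not use lunr: `toyStemmer`, `runPipeline`, `buildToyIndex` and `toySearch` are hypothetical stand-ins, and the "stemmer" only reproduces the "a name" to "a nam" rewrite seen in this thread. The point it tries to show is that terms are stored using the index-time pipeline but looked up using the search-time pipeline, so the two have to agree.

```javascript
// Hypothetical stand-in for a stemmer: only mimics the rewrite seen in this thread.
const toyStemmer = (term) => (term === "a name" ? "a nam" : term);

// Run a term through each function of a pipeline, in order.
const runPipeline = (pipeline, term) => pipeline.reduce((t, fn) => fn(t), term);

// Store each document's term after the index-time pipeline has processed it.
function buildToyIndex(docs, indexPipeline) {
  const inverted = new Map();
  for (const { ref, name } of docs) {
    const term = runPipeline(indexPipeline, name);
    if (!inverted.has(term)) inverted.set(term, []);
    inverted.get(term).push(ref);
  }
  return inverted;
}

// Look up a query term after the search-time pipeline has processed it.
function toySearch(inverted, searchPipeline, query) {
  return inverted.get(runPipeline(searchPipeline, query)) || [];
}

const docs = [
  { ref: "29", name: "a name" },
  { ref: "31", name: "b name" },
];

// Index pipeline reset (empty), so "a name" is stored unchanged.
const toyIndex = buildToyIndex(docs, []);

// Search pipeline still stemming: the query becomes "a nam" and misses.
const mismatched = toySearch(toyIndex, [toyStemmer], "a name");
// Search pipeline reset as well: the query stays "a name" and hits.
const matched = toySearch(toyIndex, [], "a name");

console.log(mismatched, matched);
```

In this model "b name" matches either way because the stand-in stemmer leaves it unchanged, which is consistent with the behaviour reported above.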
It looks like you managed to solve your own problem, so I'm closing this one now. Feel free to comment if anything still needs clearing up.
We're seeing an unexpected invertedIndex for one specific value:
This gives the following output:
search result "a" []
search result for "b" [ { ref: '31', score: 0.49200000000000005, matchData: { metadata: [Object: null prototype] } } ]
{"version":"2.3.9","fields":["REF","NAME"],"fieldVectors":[["REF/29",[]],["NAME/29",[0,0.693]],["REF/31",[1,0.492]],["NAME/31",[2,0.693]]],"invertedIndex":[["a nam",{"_index":0,"REF":{},"NAME":{"29":{}}}],["b",{"_index":1,"REF":{"31":{}},"NAME":{}}],["b name",{"_index":2,"REF":{},"NAME":{"31":{}}}]],"pipeline":["stemmer"]}
In particular, the "a nam" entry in the invertedIndex seems odd, as does the missing "a" key.