olivernn / lunr.js

A bit like Solr, but much smaller and not as bright
http://lunrjs.com
MIT License

Query term separator not in sync with tokenizer separator #493

Open CMSeb opened 3 years ago

CMSeb commented 3 years ago

As I understand it, two settings define how strings are split into terms: the tokenizer separator, which is used when indexing documents, and the term separator, which is used to split search terms. Since version 2.0.2 these two are supposed to be equal (https://github.com/olivernn/lunr.js/commit/356d25a98c5d67548c9c65799e151cc5b6e524d3).

However, when you change lunr.tokenizer.separator, the internally used lunr.QueryLexer.termSeparator does not seem to be updated along with it.

import lunr from "lunr";

const createSearchIndex = () => {
  return lunr(function() {
    this.tokenizer.separator = /[\s]+/; // only spaces (default is: spaces and hyphen /[\s\-]+/ )
    //...
  });
};

createSearchIndex();
// lunr.tokenizer.separator contains: /[\s]+/
console.log('lunr tokenizer separator', lunr.tokenizer.separator.toString()); 
// lunr.QueryLexer.termSeparator still contains: /[\s\-]+/
console.log('lunr query lexer term separator', lunr.QueryLexer.termSeparator.toString());
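
To illustrate why the mismatch matters, here is a minimal sketch that does not use lunr at all. It assumes that a plain `String.prototype.split` is a fair stand-in for both the tokenizer and the query lexer, which is a simplification. `splitTerms` and the sample strings are hypothetical and exist only for this illustration.

```javascript
// Simplified stand-ins, not lunr's actual code.
const tokenizerSeparator = /[\s]+/;   // custom value, as set on lunr.tokenizer.separator above
const termSeparator = /[\s\-]+/;      // default that lunr.QueryLexer.termSeparator still holds

// Hypothetical helper: split text into lowercase terms, dropping empty strings.
const splitTerms = (text, separator) =>
  text.toLowerCase().split(separator).filter((term) => term.length > 0);

// Indexing keeps the hyphenated word as a single term ...
const indexedTerms = splitTerms("well-known issue", tokenizerSeparator);
// ... while the query side still splits on the hyphen.
const queryTerms = splitTerms("well-known", termSeparator);

// None of the query terms appear among the indexed terms.
const matched = queryTerms.filter((term) => indexedTerms.includes(term));

console.log(indexedTerms); // [ 'well-known', 'issue' ]
console.log(queryTerms);   // [ 'well', 'known' ]
console.log(matched);      // []
```

Under these assumptions, a document indexed with the hyphenated term would not be found by a query for that same word.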

The current workaround is to explicitly set the internal attribute lunr.QueryLexer.termSeparator, which is far from ideal, don't you think?
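
For reference, the workaround could be sketched roughly like this. `setSeparator` is a hypothetical helper and not part of lunr; it only assigns the two properties named in this issue. `lunrStub` is a stand-in object so the snippet runs without lunr installed; in a real project you would pass the imported `lunr` object instead.

```javascript
// Hypothetical helper (not part of lunr): keep both separators in sync.
const setSeparator = (lunrLib, separator) => {
  lunrLib.tokenizer.separator = separator;        // used when indexing documents
  lunrLib.QueryLexer.termSeparator = separator;   // used when splitting search terms
};

// Stand-in carrying only the two properties discussed here, preset to the default.
const lunrStub = {
  tokenizer: { separator: /[\s\-]+/ },
  QueryLexer: { termSeparator: /[\s\-]+/ },
};

setSeparator(lunrStub, /[\s]+/);

console.log(lunrStub.tokenizer.separator.toString());      // /[\s]+/
console.log(lunrStub.QueryLexer.termSeparator.toString()); // /[\s]+/
```

With the real library, this would presumably be called as `setSeparator(lunr, /[\s]+/)` before the index is built, so that indexing and searching use the same separator.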