Closed. selfawaresoup closed this issue 9 years ago.
Agreed. I also want strings split on underscores, for example, and had to copy the original tokenize function to get that. Making the split expression (currently /(?:\s+|\-)/ by default) configurable might be an option.
In the latest version (0.6.0) I've added a property, lunr.tokenizer.seperator, that can be overridden to change the regex used to split a string into tokens.
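For reference, a minimal sketch of how that override might look, assuming lunr 0.6.x loaded via require and the property name exactly as given above:

```js
var lunr = require('lunr');

// Override the token-splitting regex before building the index, so strings
// are split on underscores and hyphens as well as whitespace.
lunr.tokenizer.seperator = /[\s\-_]+/;

var idx = lunr(function () {
  this.ref('id');
  this.field('title');
  this.field('body');
});

// With the override, "snake_case_title" yields the tokens "snake", "case", "title".
idx.add({ id: 1, title: 'snake_case_title', body: 'some multi-word-example text' });
```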
Currently, the default tokenizer only splits strings into tokens on whitespace, using /\s+/. To use other delimiting characters (e.g. "-") I currently have to set a completely new tokenizer function that is mostly a copy of the original one. There should be an option to set the delimiter characters, or maybe to pass in a callback that does the splitting.
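For illustration, a rough sketch of the workaround described above: replacing the whole tokenizer with a near-copy that splits on extra characters. The exact shape of the built-in tokenizer varies between versions, so treat this as an approximation rather than the library's actual code:

```js
var lunr = require('lunr');

// Near-copy of the default tokenizer, with the split regex widened to include
// hyphens and underscores in addition to whitespace.
lunr.tokenizer = function (obj) {
  if (!arguments.length || obj == null || obj == undefined) return [];
  if (Array.isArray(obj)) return obj.map(function (t) { return t.toLowerCase(); });

  return obj.toString().trim().toLowerCase().split(/[\s\-_]+/);
};
```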