olivernn / lunr.js

A bit like Solr, but much smaller and not as bright
http://lunrjs.com
MIT License

Feature Request: Use of Tokenizer without Indexing #333

Closed: kapilgupta77 closed this issue 6 years ago

kapilgupta77 commented 6 years ago

Hi,

Is it possible to use just the tokenizer to tokenize a string, without building an index? Currently, I have to create an index first and then call index.tokenSet.toArray() to get an array of the unique words.

Thanks, Kapil

olivernn commented 6 years ago

It is possible to use lunr.tokenizer without first creating an index:

var tokens = lunr.tokenizer("some string")
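As a rough illustration of what that call does, here is a dependency-free sketch that approximates the default lunr 2.x tokenizer: it lowercases the string and splits it on whitespace and hyphens, the same pattern as lunr.tokenizer.separator. The helper name approximateTokenize is hypothetical, and unlike lunr.tokenizer it returns plain strings rather than lunr.Token objects carrying position metadata.

```javascript
// Hypothetical helper: a rough, dependency-free approximation of the default
// lunr 2.x tokenizer. It lowercases the input and splits on runs of whitespace
// and hyphens (the pattern used by lunr.tokenizer.separator). Unlike the real
// tokenizer it returns plain strings, not lunr.Token objects with metadata.
function approximateTokenize(str) {
  if (str == null) return [] // null or undefined input yields no tokens
  return String(str)
    .toLowerCase()
    .split(/[\s\-]+/)
    .filter(function (t) { return t.length > 0 }) // drop empty edge pieces
}

// With lunr 2.x itself, the closest equivalent would be:
//   lunr.tokenizer("Some String").map(function (t) { return t.toString() })
console.log(approximateTokenize("Some String, some-thing"))
// → [ 'some', 'string,', 'some', 'thing' ]
```

The trailing comma in "string," is expected: the tokenizer only splits the text, and punctuation is normally stripped later by lunr.trimmer in the indexing pipeline.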

Regarding "then use index.tokenSet.toArray() to get a list of unique words in an array":

lunr.tokenizer does not return a unique list of words, though; it just converts a string into tokens, so there may be duplicates:

lunr.tokenizer("foo foo foo").length // 3

You could certainly use lunr.TokenSet to get a unique set of the tokens returned by lunr.tokenizer, but I'm sure you can think of a simpler way that doesn't involve Lunr.
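One simpler way, sketched here without Lunr, is to pass the tokens through a JavaScript Set. The helper uniqueTokens is hypothetical; it calls String() on each item, so (assuming lunr 2.x, where lunr.Token defines toString()) it should accept either plain strings or the tokens returned by lunr.tokenizer. The result is sorted, which is also the order lunr.TokenSet.fromArray expects if you do want a TokenSet.

```javascript
// Hypothetical helper: return each distinct token exactly once, sorted.
// String(t) converts plain strings as-is and calls toString() on objects,
// so token objects that define toString() are handled too.
function uniqueTokens(tokens) {
  var seen = new Set()
  tokens.forEach(function (t) { seen.add(String(t)) })
  return Array.from(seen).sort()
}

console.log(uniqueTokens(["foo", "foo", "foo", "bar"]))
// → [ 'bar', 'foo' ]
```

Bear in mind that this will not necessarily match index.tokenSet.toArray(). With the default builder pipeline, an index's token set is built from terms that have already been through lunr.trimmer, lunr.stopWordFilter and lunr.stemmer, so it holds stemmed terms and omits stop words.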

kapilgupta77 commented 6 years ago

Thank you, Oliver.