lucaong / minisearch

Tiny and powerful JavaScript full-text search engine for browser and Node
https://lucaong.github.io/minisearch/
MIT License

Include Unicode zero-width spaces in the tokenizer regexp #250

Closed: lucaong closed this 6 months ago

lucaong commented 6 months ago

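Even though Unicode lists zero-width spaces (such as U+200B ZERO WIDTH SPACE) in the "Other, Format" (Cf) category rather than in "Separator, Space" (Zs), it makes sense for the tokenizer to treat them as spaces, since they mark word boundaries without rendering a visible gap.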

See: https://github.com/lucaong/minisearch/issues/249
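A minimal TypeScript sketch of why this matters. It assumes a hypothetical `tokenize` function and two hypothetical separator regexps (`SPACE_ONLY`, `SPACE_OR_ZERO_WIDTH`). These are not MiniSearch's actual tokenizer regexp, and the sketch only adds U+200B; the exact set of zero-width characters the change covers is not stated here. Because U+200B is Cf and not Z, `\p{Z}` alone does not match it, so two words joined by a zero-width space stay fused into one token until the character is added to the class.

```typescript
// Hypothetical separators, for illustration only.
// \p{Z} matches Unicode separators (Zs, Zl, Zp) but NOT U+200B, which is Cf.
const SPACE_ONLY: RegExp = /[\p{Z}]+/u;
// Assumption: treat U+200B ZERO WIDTH SPACE as a separator too.
const SPACE_OR_ZERO_WIDTH: RegExp = /[\p{Z}\u200B]+/u;

// Split text on the given separator and drop empty strings that appear
// when the text starts or ends with a separator.
function tokenize(text: string, separator: RegExp = SPACE_OR_ZERO_WIDTH): string[] {
  return text.split(separator).filter((token) => token.length > 0);
}

// "full" and "text" are joined by an invisible zero-width space.
const input = "full\u200Btext search";

console.log(tokenize(input, SPACE_ONLY).length); // 2: "full\u200Btext" stays one token
console.log(tokenize(input).length); // 3: "full", "text", "search"
```

A function with this shape could be passed as MiniSearch's `tokenize` option when customizing tokenization. The regexp here is deliberately small, while a real tokenizer would also split on punctuation.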