lucaong / minisearch

Tiny and powerful JavaScript full-text search engine for browser and Node
https://lucaong.github.io/minisearch/
MIT License

Minisearch is not working for special characters #174

Closed diegogmez closed 2 years ago

diegogmez commented 2 years ago

Hello @lucaong,

I am implementing MiniSearch as a search engine, and I am running into problems with special characters. The JSON structure I am using is quite simple, {"name": "...", "creator": "...", "modifier": "..."}, and one of the fields I need to search is name.

In some cases the name is something like "#1" or "@with", and when I search using these characters the system does not return any results.

Is this a bug? Does the library currently not support such searches? Or is the problem some missing configuration?

lucaong commented 2 years ago

Hi @diegogmez, in principle there is no problem with special characters; MiniSearch should handle them just fine. That said, the default tokenizer splits on spaces and punctuation, so characters like @ and # are treated as separators and removed from the indexed terms. This is why you get no results, or surprising ones, for such queries.
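
To make the effect concrete, here is a minimal sketch of a tokenizer that splits on spaces and punctuation. The regular expression and the helper name splitOnSpaceOrPunctuation are assumptions for illustration only; this is not MiniSearch's actual internal pattern, just an approximation of the behavior described above.

```javascript
// ASSUMPTION: illustrative pattern only, not the library's real tokenizer regex.
// \p{P} matches any Unicode punctuation character, which includes '#' and '@'.
const SPACE_OR_PUNCTUATION = /[\s\p{P}]+/u

// Hypothetical helper: split the text, then drop the empty strings that a
// leading or trailing separator leaves behind.
const splitOnSpaceOrPunctuation = (text) =>
  text.split(SPACE_OR_PUNCTUATION).filter((term) => term.length > 0)

console.log(splitOnSpaceOrPunctuation('#1'))    // → ['1']
console.log(splitOnSpaceOrPunctuation('@with')) // → ['with']
```

Under this kind of splitting, "#1" is indexed as "1" and a query for "#" alone yields no terms at all, which would match the empty results reported in the question.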

Luckily, the solution is simple: you can provide a custom tokenizer that works for your case. If, for example, you want to split by space only, leaving punctuation characters intact, you could do this:

const miniSearch = new MiniSearch({
  fields: ['name', /* …other fields */],
  // Split on whitespace only, so punctuation such as # and @ stays in the terms
  tokenize: (text) => text.split(/\s+/)
})

I hope this helps.
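
As a possible refinement of the whitespace-only tokenizer above, the sketch below also guards against the empty strings that split(/\s+/) produces when the text starts or ends with whitespace. The helper name tokenizeOnWhitespace and the options variable are hypothetical; the field names are taken from the JSON structure in the question.

```javascript
// Hypothetical helper: split on whitespace only, so '#' and '@' stay in the terms.
// trim() avoids the empty strings that leading or trailing whitespace would produce.
const tokenizeOnWhitespace = (text) => {
  const trimmed = text.trim()
  return trimmed === '' ? [] : trimmed.split(/\s+/)
}

// Options in the shape passed to the MiniSearch constructor (assumed usage,
// the library itself is not imported in this sketch).
const options = {
  fields: ['name', 'creator', 'modifier'],
  tokenize: tokenizeOnWhitespace
}

console.log(options.tokenize('#1 report'))      // → ['#1', 'report']
console.log(options.tokenize('  @with  tags ')) // → ['@with', 'tags']
```

With the minisearch package installed, these options would be passed as new MiniSearch(options). As far as I know, the same tokenizer is also applied to search queries unless a different tokenize is given in the search options, so a query for "#1" should then produce the term "#1" as well.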

diegogmez commented 2 years ago

Hello @lucaong,

Yes! That helps a lot! So the default tokenization splits on all spaces and punctuation, including those characters. Then it makes total sense that no results were found.

Thanks for the fast answer!