It doesn't work for persian texts

nextapps-de / flexsearch

Next-Generation full text search library for Browser and Node.js

Apache License 2.0


It doesn't work for persian texts #129

Closed hsiami closed 3 years ago

hsiami commented 5 years ago

I used the options below for FlexSearch, but it doesn't work:

var index = FlexSearch.create({ encode: "icase", tokenize: "reverse", rtl: true });

index.add(1, "سلام دنیا");

index.length => 1

persisted value => [[{},{},{},{},{},{},{},{},{}],{},["@1"]]

index.search("سلام") => []

johnparn commented 4 years ago

Good to know. I was just looking into FlexSearch as a potential candidate for a search service for Persian, among other languages. @hsiami, did you ever find a solution to this?

Perhaps a custom tokenizer and a language-specific stemmer are needed? https://github.com/nextapps-de/flexsearch#add-language-specific-stemmer-andor-filter

ts-thomas commented 4 years ago

It would be nice if you could help me define language settings for Persian: https://github.com/nextapps-de/flexsearch/blob/0.7.0/doc/0.7.0.md

I just need some information about the language:

An example of a text with a lot of special characters that should not be indexed, along with the same text after those special characters have been removed (normalized charset)
The sign/character that separates words (e.g. a whitespace)
Is it written right-to-left?

With this information I can easily provide a definition for this language.

ts-thomas commented 3 years ago

Not using any of the Latin encoders will solve this.
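
For anyone who needs a replacement for the Latin encoders, here is a minimal sketch of a Unicode-aware normalizer and word splitter for Persian. It does not call FlexSearch at all: `normalizePersian` and `tokenizePersian` are hypothetical helper names, and the normalization rules (mapping Arabic yeh/kaf to their Persian forms, dropping diacritics and tatweel, removing the zero-width non-joiner) are assumptions about what a Persian search index usually wants, not settings taken from the library. The FlexSearch README describes passing a custom function as `encode`; whether that function should return a string or an array of words depends on the version, so check the documentation before wiring this in.

```javascript
// Hypothetical helpers, not part of FlexSearch.

// Normalize Persian text so that visually identical words compare equal.
function normalizePersian(text) {
  return text
    .normalize("NFC")
    // Arabic yeh (U+064A) and alef maksura (U+0649) -> Persian yeh (U+06CC)
    .replace(/[\u064A\u0649]/g, "\u06CC")
    // Arabic kaf (U+0643) -> Persian keheh (U+06A9)
    .replace(/\u0643/g, "\u06A9")
    // Drop Arabic diacritics (U+064B..U+065F, U+0670) and tatweel (U+0640)
    .replace(/[\u064B-\u065F\u0670\u0640]/g, "")
    // Assumption: remove the zero-width non-joiner so both spellings of a word match
    .replace(/\u200C/g, "");
}

// Split on anything that is not a Unicode letter or number.
function tokenizePersian(text) {
  return normalizePersian(text)
    .split(/[^\p{L}\p{N}]+/u)
    .filter(Boolean);
}

console.log(tokenizePersian("سلام دنیا")); // two tokens: "سلام" and "دنیا"
```

For comparison, an ASCII-only pattern such as `/\W+/` treats every Persian letter as a separator, so `"سلام دنیا".split(/\W+/).filter(Boolean)` yields an empty array. This is a property of JavaScript regular expressions; the thread does not confirm that it is the cause of the empty index shown above.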