kornelski opened 5 years ago
Please note that the library should be multilingual: for example, ، and ؛ are punctuation characters in Persian. So \p{P} is better suited for multilingual support. However, 's must still be ignored, as you mentioned.
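A minimal sketch of the approach suggested above: split on any Unicode punctuation (the characters \p{P} matches, i.e. general category P*), but keep an apostrophe that sits between two letters so "foo's" stays one word. It is written in Python, whose built-in re module does not support \p{P}, so it checks unicodedata.category directly. The names split_phrases, is_punctuation and APOSTROPHES are hypothetical and not part of the library being discussed. Treating the typographic apostrophe U+2019 like ASCII ' is also an assumption.

```python
import unicodedata

# Hypothetical: apostrophes kept inside a word when surrounded by letters.
# Assumption: ASCII ' and the typographic right single quote U+2019.
APOSTROPHES = {"'", "\u2019"}


def is_punctuation(ch: str) -> bool:
    """True for any Unicode punctuation (general category P*), like regex \\p{P}."""
    return unicodedata.category(ch).startswith("P")


def split_phrases(text: str) -> list[str]:
    """Split text into phrases at punctuation, keeping intra-word apostrophes."""
    phrases = []
    current = []
    for i, ch in enumerate(text):
        if is_punctuation(ch):
            prev_is_letter = i > 0 and text[i - 1].isalpha()
            next_is_letter = i + 1 < len(text) and text[i + 1].isalpha()
            if ch in APOSTROPHES and prev_is_letter and next_is_letter:
                current.append(ch)  # e.g. "foo's": the apostrophe stays in the word
                continue
            # Any other punctuation (including Persian ، and ؛) ends the phrase.
            phrase = "".join(current).strip()
            if phrase:
                phrases.append(phrase)
            current = []
        else:
            current.append(ch)
    # Flush whatever follows the last punctuation mark.
    phrase = "".join(current).strip()
    if phrase:
        phrases.append(phrase)
    return phrases


if __name__ == "__main__":
    print(split_phrases("foo's something, else"))  # ["foo's something", 'else']
    print(split_phrases("سلام، دنیا؛ خوب"))
```

Because Arabic-script letters count as letters for str.isalpha, the same rule works for Persian text. An apostrophe used as a quotation mark (next to a space or at a word edge) is still treated as a separator.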
The punctuation regex includes the apostrophe, so it splits "foo's" into two separate phrases. I'm seeing "s something" in the keywords.
I think it could be fixed by using less smart splitting: