Open aseifert opened 6 years ago
Hi Alex,
I only know English, so I couldn't make it work for other languages: I wouldn't be able to understand or test the behavior.
Would you consider internationalizing the word boundaries or is this restrictive behavior on purpose?
I would consider it, but I don't know how. You are free to make any changes that make sense to you.
Please send a pull request with test cases if possible. I would really appreciate that :)
Thanks, Vikash
On Mon, Mar 19, 2018 at 9:11 PM Alexander Seifert notifications@github.com wrote:
Hello there,
first of all: thanks for the amazing algorithm, it's really useful!
It turns out you use only a very restrictive set of characters as non_word_boundaries. For many languages this poses a problem. E.g. in German:
    from flashtext import KeywordProcessor
    kwp = KeywordProcessor()
    kwp.add_keyword("lt.")
    kwp.extract_keywords("Damit galt es als so gut wie fix, dass Vueling den Zuschlag erhält.")  # I would expect this to be empty

The problem can be fixed (for German) by adjusting the property non_word_boundaries:

    kwp.non_word_boundaries = kwp.non_word_boundaries.union(list("ÖÄÜöäüß"))
Would you consider internationalizing the word boundaries or is this restrictive behavior on purpose?
Thanks, Alex
View it on GitHub: https://github.com/vi3k6i5/flashtext/issues/48
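For anyone picking this up, here is a rough, standard-library-only sketch of why the match happens and what a more international word-character set could look like. It is not flashtext's implementation: `find_keyword` is a hypothetical stand-in that does a plain substring search plus a boundary check, and `ASCII_WORD_CHARS` assumes the default set is ASCII letters, digits and the underscore, which is what the behavior in this issue suggests. `unicode_word_chars` and its `max_codepoint` parameter are likewise made up for illustration.

```python
import string

# Assumption: mirrors the restrictive default described in this issue
# (ASCII letters, digits and underscore count as "inside a word").
ASCII_WORD_CHARS = set(string.ascii_letters + string.digits + "_")


def unicode_word_chars(max_codepoint=0x024F):
    """Hypothetical helper: every code point up to max_codepoint that Python
    classifies as alphanumeric, plus the underscore. The default upper bound
    (end of Latin Extended-B) is an arbitrary choice for this sketch."""
    chars = {chr(cp) for cp in range(max_codepoint + 1) if chr(cp).isalnum()}
    chars.add("_")
    return chars


def find_keyword(text, keyword, word_chars):
    """Simplified stand-in for keyword extraction (NOT flashtext's trie-based
    algorithm): return the start offsets where `keyword` occurs and is not
    directly preceded or followed by a character from `word_chars`."""
    hits = []
    start = text.find(keyword)
    while start != -1:
        end = start + len(keyword)
        before_ok = start == 0 or text[start - 1] not in word_chars
        after_ok = end == len(text) or text[end] not in word_chars
        if before_ok and after_ok:
            hits.append(start)
        start = text.find(keyword, start + 1)
    return hits


text = "Damit galt es als so gut wie fix, dass Vueling den Zuschlag erhält."

# With the ASCII-only set, "ä" is not a word character, so "lt." in
# "erhält." looks like a standalone token and is reported as a hit.
ascii_hits = find_keyword(text, "lt.", ASCII_WORD_CHARS)

# With the Unicode-aware set, "ä" counts as part of the word: no hits.
unicode_hits = find_keyword(text, "lt.", unicode_word_chars())
```

If this approach fits, the set returned by `unicode_word_chars()` could presumably be assigned to `kwp.non_word_boundaries` in the same way as the German-only fix above, though that has not been tested against flashtext here.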