mispy-archive / twitter_ebooks

Better twitterbots for all your friends~
MIT License

No unicode in keywords #77

Closed: ripperdoc closed this issue 8 years ago

ripperdoc commented 9 years ago

When I use Unicode input, every keyword gets cut off at its first non-ASCII character. I found the reason. In nlp.rb, line 102:

set :word_pattern, /(?<!@)(?<=\s)[\w']+/

should be

set :word_pattern, /(?<!@)(?<=\s)[\p{Word}']+/
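A minimal sketch of why the change helps, assuming the pattern is applied with `String#scan` to UTF-8 text (the constant names and the German sample sentence are hypothetical, made up for illustration). In Ruby, `\w` matches only `[a-zA-Z0-9_]`, so each keyword stops at the first accented letter, while `\p{Word}` is the Unicode word property and also accepts letters such as ä, ö, ü and ß.

```ruby
# Hypothetical names; only the two regexes are taken from the issue.

# Original pattern: Ruby's \w is ASCII-only, so matching stops at ü, ö, ß, etc.
ORIGINAL_WORD_PATTERN = /(?<!@)(?<=\s)[\w']+/

# Proposed pattern: \p{Word} covers Unicode letters, marks, digits and connectors.
PROPOSED_WORD_PATTERN = /(?<!@)(?<=\s)[\p{Word}']+/

# Leading space on purpose: the (?<=\s) lookbehind only matches words that follow whitespace.
SAMPLE = " Grüße aus München, schöne Straße"

# scan returns ["Gr", "aus", "M", "sch", "Stra"]: the truncated keywords from the report.
p SAMPLE.scan(ORIGINAL_WORD_PATTERN)

# scan returns ["Grüße", "aus", "München", "schöne", "Straße"].
p SAMPLE.scan(PROPOSED_WORD_PATTERN)
```

In this snippet the proposed pattern keeps umlauts and ß inside the keyword.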

Gotos commented 9 years ago

This still seems to be an issue. German umlauts (ä, ö, ü) and ß still cause the same problem.