yooper / php-text-analysis

PHP Text Analysis is a library for performing Information Retrieval (IR) and Natural Language Processing (NLP) tasks using the PHP language.
https://github.com/yooper/php-text-analysis/wiki
MIT License

CharFilter not working? #44

Closed: gvanto closed this issue 5 years ago

gvanto commented 6 years ago

My tokenDoc->toArray() gives the output below after applying CharFilter. I expected the single-character elements to be removed, but they are still there:

array(15) {
  [0] =>
  string(1) "i"
  [1] =>
  string(1) "a"
  [2] =>
  string(7) "plumber"
gvanto commented 6 years ago

I did a bit of testing, and this seems to work:

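class CharFilter implements ITokenTransformation
{
    public function transform($word)
    {
        // Pad the token with spaces so that a lone non-digit character
        // matches " \D " and is replaced with an empty string.
        return trim(preg_replace("/ \D /", "", " $word "));
    }
}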
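
To check the regex outside the library, here is a minimal standalone sketch. The function name dropSingleCharToken and the sample tokens are made up for illustration and are not part of php-text-analysis. It only reuses the preg_replace call from the class above. It assumes the caller drops the resulting empty strings. Whether the library does that on its own is not confirmed here.

```php
<?php
// Hypothetical helper mirroring the transform() body above;
// it is NOT part of php-text-analysis.
function dropSingleCharToken(string $word): string
{
    // Pad with spaces so a lone non-digit character matches " \D ".
    return trim(preg_replace("/ \D /", "", " $word "));
}

// Sample tokens chosen for illustration only.
$tokens = ['i', 'a', 'plumber', '7'];

$filtered = array_map('dropSingleCharToken', $tokens);

// Single-character tokens are now empty strings; drop them and reindex.
$kept = array_values(array_filter($filtered, function ($t) {
    return $t !== '';
}));

var_dump($kept); // $kept is ['plumber', '7']
```

Note that \D only matches non-digits, so a single-digit token such as "7" passes through unchanged. If single digits should be removed as well, the pattern would need to be adjusted.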
yooper commented 5 years ago

Thank you, I will investigate further this weekend.