yooper / php-text-analysis

PHP Text Analysis is a library for performing Information Retrieval (IR) and Natural Language Processing (NLP) tasks in PHP.
https://github.com/yooper/php-text-analysis/wiki
MIT License
527 stars, 87 forks

Is there a way to get the output in JSON format? #54

Closed: amanguptadev closed this issue 5 years ago

amanguptadev commented 5 years ago

All I'm getting back is an object with protected arrays.

ace411 commented 5 years ago

Can you provide more information? Perhaps a code snippet?

amanguptadev commented 5 years ago

Hey, thank you for the quick reply.

I'm applying various filters and then using FreqDist to get the word counts and frequencies. This is my code:

$text = "this is some random text";
$stopwords = array_map('trim', file(base_path('vendor/yooper/stop-words/data/stop-words_english_1_en.txt')));
$tokens = tokenize($text);
$tokenDoc = new TokensDocument($tokens);
$tokenDoc->applyTransformation(new LowerCaseFilter())
    ->applyTransformation(new TrimFilter())
    ->applyTransformation(new DomainFilter())
    ->applyTransformation(new UrlFilter())
    ->applyTransformation(new EmailFilter())
    ->applyTransformation(new PunctuationFilter())
    ->applyTransformation(new StopWordsFilter($stopwords))
    ->applyTransformation(new SpacePunctuationFilter())
    ->applyTransformation(new QuotesFilter())
    ->applyTransformation(new WhitespaceFilter())
    ->applyTransformation(new CharFilter())
    ->applyTransformation(new NumbersFilter());

$freqDist = new FreqDist($tokenDoc->toArray());
$freqDist->getKeyValuesByWeight();

print_r($freqDist);

The output I'm getting is in this form:

TextAnalysis\Analysis\FreqDist Object
(
    [keyValues:protected] => Array
        (
            [random] => 1
            [text] => 1
        )

    [totalTokens:protected] => 2
    [keysByWeight:protected] => Array
        (
            [random] => 0.5
            [text] => 0.5
        )

)

As you can see, I'm getting an object with protected arrays, and I can't access the array values by key.

ace411 commented 5 years ago

That's odd. I just ran a similar snippet and got a hashtable with frequency stats. The output is as follows:

array(4) {
  ["is"]=>
  float(0.25)
  ["some"]=>
  float(0.25)
  ["random"]=>
  float(0.25)
  ["text"]=>
  float(0.25)
}

A quick fix for your problem might involve using Reflection. Which version of the library are you using?
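For reference, the Reflection route mentioned above could look like the minimal sketch below. `ProtectedStats` and `readProtected()` are hypothetical names made up for illustration; the class only mimics the protected `keysByWeight` array from the print_r output and is not the library's `FreqDist`. Treat it as a last resort, since reading protected state bypasses the class's public API.

```php
<?php
// Hypothetical stand-in shaped like the print_r output above.
// It is NOT the library's FreqDist class.
class ProtectedStats
{
    protected $keysByWeight = ['random' => 0.5, 'text' => 0.5];
}

// Read a non-public property by name via Reflection (workaround only).
function readProtected(object $obj, string $property)
{
    $prop = new ReflectionProperty($obj, $property);
    if (PHP_VERSION_ID < 80100) {
        // Before PHP 8.1, Reflection cannot read non-public members without this.
        $prop->setAccessible(true);
    }
    return $prop->getValue($obj);
}

$weights = readProtected(new ProtectedStats(), 'keysByWeight');
echo json_encode($weights), PHP_EOL; // prints {"random":0.5,"text":0.5}
```

The `setAccessible(true)` call is skipped on PHP 8.1 and later, where Reflection can read protected and private properties directly.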

amanguptadev commented 5 years ago

I installed it with Composer two days ago.

amanguptadev commented 5 years ago

It should be 1.5 or greater.

amanguptadev commented 5 years ago

Man, thank you for your help. I just realized how stupid I am: I wasn't assigning the return value of $freqDist->getKeyValuesByWeight() to a variable, and I was trying to print the class object instead :p

ace411 commented 5 years ago

No, you're not stupid. This stuff happens all the time. Here is the snippet I ran:

$text = 'this is some random text';

$tokens = tokenize($text);
$doc = new TokensDocument($tokens);

$doc->applyTransformation(new PunctuationFilter())
    ->applyTransformation(new StopWordsFilter(['and', 'this']));

$freq = new TextAnalysis\Analysis\FreqDist($doc->toArray());
$res = $freq->getKeyValuesByWeight();

Hope it helps as a reference point.
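Since the original question was about JSON: once the return value of `getKeyValuesByWeight()` is assigned to a variable (as `$res` is above), it is a plain PHP array, so passing it to `json_encode()` should be enough. To keep the sketch runnable without the library installed, the hypothetical `keyValuesByWeight()` helper below reproduces the weighting visible in the outputs in this thread (each token count divided by the total token count, e.g. 1/2 = 0.5 and 1/4 = 0.25). It is an assumption-based stand-in, not the library's implementation, and it does not sort the result.

```php
<?php
// Hypothetical helper: count / total tokens, matching the weights shown
// in this thread. Not the library's FreqDist implementation.
function keyValuesByWeight(array $tokens): array
{
    $counts = array_count_values($tokens); // token => number of occurrences
    $total = count($tokens);
    $weights = [];
    foreach ($counts as $token => $count) {
        $weights[$token] = $count / $total;
    }
    return $weights;
}

$tokens = ['random', 'text'];            // tokens left after the filters in the question
$weights = keyValuesByWeight($tokens);   // assign the returned array...
echo json_encode($weights), PHP_EOL;     // ...then encode it: prints {"random":0.5,"text":0.5}
```

With the library installed, the equivalent final step would simply be `echo json_encode($res);` on the array from the snippet above.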

amanguptadev commented 5 years ago

Thank you, it works :)

yooper commented 5 years ago

@ace411, thank you for your help. I appreciate it.