yooper / php-text-analysis

PHP Text Analysis is a library for performing Information Retrieval (IR) and Natural Language Processing (NLP) tasks in PHP.
https://github.com/yooper/php-text-analysis/wiki
MIT License
527 stars, 87 forks

How can I use the TF-IDF? #60

Closed. nafre closed this issue 4 years ago.

nafre commented 4 years ago

Hi, I was experimenting and found that this library has a TF-IDF implementation. Can someone show me an example of how to get it working?

What should I pass for the DocumentAbstract $document and $token parameters? And how can I see the result?

Thanks.

yooper commented 4 years ago

@nafre Hopefully, this example will help.


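        use TextAnalysis\Collections\DocumentArrayCollection;
        use TextAnalysis\Documents\TokensDocument;
        use TextAnalysis\Indexes\TfIdf;

        // Composer autoloader (path assumes a standard Composer install)
        require 'vendor/autoload.php';

        // $text1, $text2 and $text3 are the plain-text strings you want to index
        $docs = [
            new TokensDocument(tokenize($text1)),
            new TokensDocument(tokenize($text2)),
            new TokensDocument(tokenize($text3))
        ];

        $docCollection = new DocumentArrayCollection($docs);

        // Builds the TF-IDF index over the whole collection
        $tfIdf = new TfIdf($docCollection);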
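For readers wondering what an index like this computes when it is constructed, here is a minimal from-scratch sketch in plain PHP of the inverse-document-frequency half of TF-IDF. It does not use the library. It assumes the textbook definition idf(t) = ln(N / df(t)), where N is the number of documents and df(t) is how many of them contain t. The helpers simpleTokens() and buildIdf() are hypothetical names for illustration, and the library's exact formula may differ, so check src/Indexes/TfIdf.php.

```php
<?php
// Sketch only: textbook idf(t) = ln(N / df(t)), not necessarily the library's formula.

// Hypothetical helper: lowercase, then split on anything that is not a letter or digit.
function simpleTokens(string $text): array
{
    return preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

// Hypothetical helper: returns [term => idf] for a list of token arrays (one per document).
function buildIdf(array $docs): array
{
    $df = [];
    foreach ($docs as $tokens) {
        // Count each term at most once per document (document frequency)
        foreach (array_unique($tokens) as $term) {
            $df[$term] = ($df[$term] ?? 0) + 1;
        }
    }
    $n = count($docs);
    $idf = [];
    foreach ($df as $term => $count) {
        $idf[$term] = log($n / $count);
    }
    return $idf;
}

$docs = [
    simpleTokens('The cat sat on the mat'),
    simpleTokens('The dog sat on the log'),
    simpleTokens('Cats and dogs'),
];

$idf = buildIdf($docs);
printf("idf(the) = %.4f\n", $idf['the']); // in 2 of 3 docs: ln(3/2), prints 0.4055
printf("idf(cat) = %.4f\n", $idf['cat']); // in 1 of 3 docs: ln(3), prints 1.0986
```

Under this definition, a term that occurs in every document gets an idf of ln(1) = 0, so very common words contribute nothing to the final weight.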
yooper commented 4 years ago

@nafre, let me know if you have any more questions. I am closing this issue.

nafre commented 4 years ago

Thanks. Yeah, just one more question.

TextAnalysis\Indexes\TfIdf: function getTfIdf(DocumentAbstract $document, $token, $mode = 1)

Would it make sense to call this function to get the TF-IDF values? If so, what should I pass for the parameters?

yooper commented 4 years ago

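The source is here: https://github.com/yooper/php-text-analysis/blob/master/src/Indexes/TfIdf.php

Yes, getTfIdf returns the TF-IDF weight of the given token in the given document. Pass a document (typically one of those you added to the collection) as $document and the term you are interested in as $token, for example $tfIdf->getTfIdf($docs[0], 'apple').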
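A getTfIdf-style weight is the frequency of the token in one document multiplied by the token's idf. The sketch below is plain PHP, not the library. The helpers idfTable() and tfIdfWeight() are hypothetical, tf is assumed to be the raw count (the library's $mode argument may select a different variant), and idf is assumed to be ln(N / df). It also shows one way to "see the result": score every distinct token of a document and sort by weight.

```php
<?php
// Sketch only: weight(token, doc) = tf(token, doc) * idf(token),
// with tf = raw count and idf = ln(N / df). The library may weight differently.

// Hypothetical helper: returns [term => idf] for a list of token arrays.
function idfTable(array $docs): array
{
    $df = [];
    foreach ($docs as $tokens) {
        foreach (array_unique($tokens) as $term) {
            $df[$term] = ($df[$term] ?? 0) + 1;
        }
    }
    $n = count($docs);
    // array_map keeps the term keys when given a single array
    return array_map(fn ($count) => log($n / $count), $df);
}

// Hypothetical helper: weight of $token in one document, 0.0 if the token is absent.
function tfIdfWeight(array $docTokens, string $token, array $idf): float
{
    $tf = count(array_keys($docTokens, $token, true)); // raw term frequency
    return $tf === 0 ? 0.0 : $tf * ($idf[$token] ?? 0.0);
}

$docs = [
    ['the', 'cat', 'sat', 'on', 'the', 'mat'],
    ['the', 'dog', 'sat', 'on', 'the', 'log'],
    ['cats', 'and', 'dogs'],
];
$idf = idfTable($docs);

// Score every distinct token of the first document, highest weight first
$weights = [];
foreach (array_unique($docs[0]) as $token) {
    $weights[$token] = tfIdfWeight($docs[0], $token, $idf);
}
arsort($weights);
print_r($weights);
```

With the library, the equivalent loop would call $tfIdf->getTfIdf($doc, $token) for each token of a document instead of tfIdfWeight().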

nafre commented 4 years ago

Alright, thanks.