Closed by simon21587 5 years ago
What you could do is override the predict() method of the TNTClassifier class and save each likelihood to an array. After that, you would write a softmax function that gives you the percentage for each category.
Instead of overriding the predict() function, I am adding the following multi_predict() function:
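    public function multi_predict($statement)
    {
        $words = $this->tokenizer->tokenize($statement);
        $types = [];
        $total_likelihood = 0;
        foreach ($this->types as $type) {
            $likelihood = log($this->pTotal($type)); // log prior: log P(Type)
            $p = 0;
            foreach ($words as $word) {
                $word = $this->stemmer->stem($word);
                $p += log($this->p($word, $type)); // accumulate the log-probability of each word for this type
            }
            $likelihood += $p; // log prior plus the summed word log-probabilities
            $types[$type] = $likelihood;
            $total_likelihood += $likelihood;
        }
        // Divide each log-likelihood by the sum of all log-likelihoods.
        // The results are ratios of logarithms, not probabilities.
        foreach ($types as &$type) {
            $type = $type / $total_likelihood;
        }
        unset($type); // break the reference left over from the loop above
        return $types;
    }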
Do you have any further suggestions?
Here is an example of a softmax function, in case you want to get probability distributions:
https://gist.github.com/raymondjplante/d826df05349c1d4350e0aa2d7ca01da4
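For readers who cannot open the gist, here is a minimal sketch of what such a softmax might look like in PHP. It is not the code from the gist and not part of TNTClassifier; the function name softmax() and the sample scores are assumptions made for illustration. It takes an array of label => log-likelihood pairs (for example the values collected in multi_predict() before any division) and subtracts the maximum before exponentiating, so that large negative log-likelihoods do not all underflow to zero.

```php
<?php
/**
 * Numerically stable softmax over per-class log-likelihoods.
 * Hypothetical helper, not part of the TNTClassifier API.
 *
 * @param array $logLikelihoods label => log-likelihood (usually negative)
 * @return array label => probability; the values sum to 1
 */
function softmax(array $logLikelihoods): array
{
    if ($logLikelihoods === []) {
        return [];
    }

    // Shift by the maximum so the largest exponent is exp(0) = 1.
    $max = max($logLikelihoods);

    $exps = [];
    foreach ($logLikelihoods as $label => $value) {
        $exps[$label] = exp($value - $max);
    }

    // Normalize so that the values form a probability distribution.
    $sum = array_sum($exps);
    $probabilities = [];
    foreach ($exps as $label => $e) {
        $probabilities[$label] = $e / $sum;
    }

    return $probabilities;
}

// Made-up log-likelihoods for three hypothetical categories.
$scores = ['sports' => -12.4, 'politics' => -13.1, 'tech' => -17.9];

$probabilities = softmax($scores);
arsort($probabilities); // highest probability first, keys preserved

foreach ($probabilities as $label => $p) {
    printf("%s: %.1f%%\n", $label, $p * 100);
}
```

Note that softmax output always sums to 100% across all categories, so it ranks categories against each other rather than scoring each one independently.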
Thank you!
The tutorial says "A state of the art algorithm for text classification is Multinomial Naive Bayes. This is a probabilistic learning method which calculates the probability of a document being in a category.", but the TNTClassifier class only returns 'label' and 'likelihood' (which is a negative number). So how do I get a list of labels and probabilities in percent for a document?
Instead of only two categories, I have, let's say, 10 categories, and a document may belong to several of them. So I'd like to get a list for each document saying:
Category 7: 95%
Category 3: 57%
Category 4: 35%
...
How do I achieve this with your classifier?