Closed luoq closed 8 years ago
The NBs are not consistent with LinearClassifierMixin IIRC. I don't think I understand your comment on what they should be instead.
The predict_proba method of MultinomialNB is the same as softmax(feature_log_prob_ * X + class_log_prior_).
For binary classification, LinearClassifierMixin assumes coef_ has shape (1, n_features) and intercept_ has length 1; it uses p1 = logistic(coef_ * X + intercept_) as the probability of class 1 (self.classes_[1]) and 1 - p1 for class 0 (self.classes_[0]).
Since exp(s1)/(exp(s0)+exp(s1)) = 1/(1+exp(-(s1-s0))) = logistic(s1-s0), we could use the difference between the second and first rows of feature_log_prob_ (elements of class_log_prior_) to get the probability, and then predict_proba is the same as for softmax logistic regression. We cannot write predict_proba with only the second row (element).
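A minimal NumPy sketch (not from the issue) of the identity above: for two classes, the softmax probability of class 1 equals the logistic function applied to the score difference s1 - s0.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax over per-class scores."""
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.RandomState(0)
scores = rng.randn(5, 2)                         # columns are s0, s1
p1_softmax = softmax(scores)[:, 1]               # exp(s1)/(exp(s0)+exp(s1))
p1_logistic = logistic(scores[:, 1] - scores[:, 0])  # logistic(s1 - s0)
assert np.allclose(p1_softmax, p1_logistic)
```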
Sorry I still don't know what you mean by "row". Row in what?
And I'm not sure what you are saying. Are you saying the results of MultinomialNB.predict_proba are wrong? If so, can you give a self-contained example of that?
I'm not sure how any of this relates to LinearClassifierMixin. It is not used anywhere in MultinomialNB.
self.feature_log_prob_ is a 2d array; "row" means a row of self.feature_log_prob_.
predict_proba is correct; it uses feature_log_prob_ and class_log_prior_ directly.
MultinomialNB is a linear classifier, though it has no direct relation to LinearClassifierMixin in code. In principle we could reuse the predict methods of softmax logistic regression if we set up coef_ and intercept_ as follows:
def _get_coef(self):
    return (self.feature_log_prob_[1:] - self.feature_log_prob_[:1]
            if len(self.classes_) == 2 else self.feature_log_prob_)

def _get_intercept(self):
    return (self.class_log_prior_[1:] - self.class_log_prior_[:1]
            if len(self.classes_) == 2 else self.class_log_prior_)

coef_ = property(_get_coef)
intercept_ = property(_get_intercept)
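To illustrate the proposal, here is a self-contained sketch (TinyNB is a hypothetical stand-in for MultinomialNB, not scikit-learn code) showing that with the difference-based coef_/intercept_ above, the LinearClassifierMixin-style probability logistic(X @ coef_.T + intercept_) matches softmax over the joint log-likelihood in the binary case.

```python
import numpy as np

class TinyNB:
    """Hypothetical stand-in carrying feature_log_prob_ and class_log_prior_,
    with coef_/intercept_ defined as the proposed row/element differences."""
    def __init__(self, feature_log_prob, class_log_prior):
        self.feature_log_prob_ = feature_log_prob
        self.class_log_prior_ = class_log_prior
        self.classes_ = np.arange(len(class_log_prior))

    @property
    def coef_(self):
        return (self.feature_log_prob_[1:] - self.feature_log_prob_[:1]
                if len(self.classes_) == 2 else self.feature_log_prob_)

    @property
    def intercept_(self):
        return (self.class_log_prior_[1:] - self.class_log_prior_[:1]
                if len(self.classes_) == 2 else self.class_log_prior_)

    def predict_proba(self, X):
        # softmax over the joint log-likelihood, as MultinomialNB does
        jll = X @ self.feature_log_prob_.T + self.class_log_prior_
        e = np.exp(jll - jll.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

rng = np.random.RandomState(0)
flp = np.log(rng.dirichlet(np.ones(4), size=2))  # 2 classes, 4 features
clp = np.log(np.array([0.3, 0.7]))
nb = TinyNB(flp, clp)

X = rng.randint(0, 5, size=(6, 4)).astype(float)  # count features
p1_nb = nb.predict_proba(X)[:, 1]
p1_lin = 1.0 / (1.0 + np.exp(-(X @ nb.coef_.T + nb.intercept_).ravel()))
assert np.allclose(p1_nb, p1_lin)  # consistent with LinearClassifierMixin
```

Note that coef_ keeps shape (1, n_features) in the binary case, which is exactly what LinearClassifierMixin expects.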
Otherwise, the reason for setting up coef_ and intercept_ in MultinomialNB is strange. They do not seem to be used anywhere.
Maybe we should remove this code from BaseDiscreteNB. It is quite pointless for BernoulliNB.
Though BernoulliNB has linear separation boundaries and could be adapted to be consistent with softmax logistic regression (reusing the predict methods), its setup of coef_ and intercept_ would be quite different.
This note in MultinomialNB may clarify my point about naive Bayes as a linear classifier:
Notes
-----
For the rationale behind the names `coef_` and `intercept_`, i.e.
naive Bayes as a linear classifier, see J. Rennie et al. (2003),
Tackling the poor assumptions of naive Bayes text classifiers, ICML.
self.coef_ is only used at line 507:
elif n_features != self.coef_.shape[1]:
    msg = "Number of features %d does not match previous data %d."
    raise ValueError(msg % (n_features, self.coef_.shape[-1]))
We can use feature_log_prob_ instead.
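A sketch of what that replacement could look like, using feature_log_prob_ rather than coef_ for the dimension check (the helper name and Dummy class are hypothetical, for illustration only):

```python
import numpy as np

def check_n_features(clf, X):
    """Hypothetical version of the partial_fit dimension check that reads
    feature_log_prob_ instead of coef_."""
    n_features = X.shape[1]
    if n_features != clf.feature_log_prob_.shape[1]:
        msg = "Number of features %d does not match previous data %d."
        raise ValueError(msg % (n_features, clf.feature_log_prob_.shape[1]))

class Dummy:
    feature_log_prob_ = np.zeros((2, 4))  # as if fitted on 4 features

check_n_features(Dummy(), np.zeros((3, 4)))      # matching width: no error
try:
    check_n_features(Dummy(), np.zeros((3, 5)))  # mismatch raises ValueError
except ValueError as e:
    print(e)
```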
These changes were introduced in de16cb49. Note the comment:
# XXX The following is a stopgap measure; we need to set the dimensions
# of class_log_prior_ and feature_log_prob_ correctly.
It seems they have been left there ever since.
The reason for changing the dimensions for binary classification is unclear. Maybe to mimic coef_ in LogisticRegression?
We can just remove coef_ and intercept_ from BaseDiscreteNB and give up treating "naive Bayes as a linear classifier".
Or we can fix coef_ and intercept_ for MultinomialNB and write correct coef_ and intercept_ for BernoulliNB. That would sometimes be helpful.
Ok, so you're talking about this: #2237
The problem with this is that changing behavior like this might break people's code...
Duplicate of #2237.
MultinomialNB inherits coef_ and intercept_ from BaseDiscreteNB, defined by
For binary classification, in order to be consistent with LinearClassifierMixin, coef_ (intercept_) should be the difference between the second and first rows of feature_log_prob_ (elements of class_log_prior_).