Naive Bayes (NB) and Support Vector Machine (SVM) models are often used as baselines for other methods in text categorization and sentiment analysis research. However, their performance varies significantly depending on which variant, features and datasets are used. We show that: (i) the inclusion of word bigram features gives consistent gains on sentiment analysis tasks; (ii) for short snippet sentiment tasks, NB actually does better than SVMs (while for longer documents the opposite result holds); (iii) a simple but novel SVM variant using NB log-count ratios as feature values consistently performs well across tasks and datasets.
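For anyone skimming, the NB log-count-ratio trick described in point (iii) can be sketched in a few lines. The snippet below is only an illustration under my own assumptions (a binarized document-term matrix `X`, binary labels `y`, Laplace smoothing `alpha`, and made-up toy data); it is not the authors' code.

```python
import numpy as np
from sklearn.svm import LinearSVC

def nb_log_count_ratio(X, y, alpha=1.0):
    # Smoothed feature counts per class, then the elementwise log ratio.
    p = alpha + X[y == 1].sum(axis=0)
    q = alpha + X[y == 0].sum(axis=0)
    return np.log((p / p.sum()) / (q / q.sum()))

# Toy data: 4 documents, 5 binary presence features (hypothetical).
X = np.array([[1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0],
              [0, 0, 1, 1, 0],
              [0, 0, 0, 1, 1]])
y = np.array([1, 1, 0, 0])

r = nb_log_count_ratio(X, y)
svm = LinearSVC(C=1.0).fit(X * r, y)  # linear SVM on NB-scaled features
print(svm.predict(X * r))
```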
Multinomial Naive Bayes
Let's assume that we want to infer a classification and, like all good scientists, we're Bayesian. Denoting the classes as $C_i$ and any relevant features $\vec{F}$, the probability of a given class is given by Bayes' theorem, $p(C_i \mid \vec{F}) = \frac{p(\vec{F} \mid C_i)\,p(C_i)}{\sum_j p(\vec{F} \mid C_j)\,p(C_j)}$. Now all we need to do is model the class likelihoods, $p(\vec{F} \mid C_i)$.
The Naive Bayes assumption is that the features are independent given the class, i.e. $p(\vec{F} \mid C_i) = \prod_j p(F_j \mid C_i)$. Popular choices for the $p(F_j \mid C_i)$ include the Bernoulli distribution (taking into account whether a binary feature occurs or not) and the Binomial distribution (taking into account not just the presence of a binary feature but also its multiplicity).
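To make the factorized posterior concrete, here is a tiny hand-rolled example with a Bernoulli likelihood; all the numbers and names are made up purely for illustration.

```python
import numpy as np

# Two classes, three binary features; rows are classes, columns are p(F_j = 1 | C_i).
likelihood = np.array([[0.8, 0.1, 0.5],   # class 0
                       [0.2, 0.7, 0.5]])  # class 1
prior = np.array([0.6, 0.4])              # p(C_i)

x = np.array([1, 0, 1])                   # observed feature vector

# Naive Bayes assumption: p(F | C_i) = prod_j p(F_j | C_i)
class_likelihood = np.prod(likelihood**x * (1 - likelihood)**(1 - x), axis=1)

# Bayes' theorem: the posterior is proportional to likelihood times prior.
posterior = class_likelihood * prior
posterior /= posterior.sum()
print(posterior)                          # p(C_i | F) for each class
```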
When a feature is discrete but not binary (or continuous but approximated by such discrete values), a common choice of likelihood is the multinomial distribution, hence Multinomial Naive Bayes. This is a direct generalization of the Binomial Naive Bayes model.
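As a rough sketch of that multinomial model (again with names and toy counts of my own choosing, not any particular library's API), fitting reduces to smoothed per-class word counts and prediction to a sum of log-probabilities:

```python
import numpy as np

def fit_multinomial_nb(X, y, alpha=1.0):
    # X: per-document word counts; y: class labels; alpha: Laplace smoothing.
    classes = np.unique(y)
    prior = np.array([(y == c).mean() for c in classes])
    counts = np.array([X[y == c].sum(axis=0) + alpha for c in classes])
    cond = counts / counts.sum(axis=1, keepdims=True)  # p(word | class)
    return classes, np.log(prior), np.log(cond)

def predict_multinomial_nb(X, classes, log_prior, log_cond):
    # argmax over classes of log p(C_i) + sum_j count_j * log p(word_j | C_i)
    return classes[np.argmax(X @ log_cond.T + log_prior, axis=1)]

# Toy word-count matrix: 4 documents, 5 vocabulary terms (hypothetical).
X = np.array([[2, 1, 0, 0, 0],
              [1, 0, 3, 0, 0],
              [0, 0, 1, 2, 0],
              [0, 0, 0, 1, 4]])
y = np.array([1, 1, 0, 0])

classes, log_prior, log_cond = fit_multinomial_nb(X, y)
print(predict_multinomial_nb(X, classes, log_prior, log_cond))
```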
Side Note: The description in the paper linked from Waleed Kadous' answer isn't consistent with the Naive Bayes assumptions discussed above. Their multinomial model does not assume independent features; instead, it models the interacting binary features with a multinomial distribution.
Not the right place!
https://nlp.stanford.edu/pubs/sidaw12_simple_sentiment.pdf