rust-ml / linfa

A Rust machine learning framework.
Apache License 2.0

Add Multinomial and Bernoulli Naive Bayes algorithms #183

Open sgrigory opened 2 years ago

sgrigory commented 2 years ago

Currently the linfa-bayes crate contains the Gaussian Naive Bayes algorithm. It should not be very difficult to add the other kinds of Naive Bayes present in sklearn:

For a new algorithm, one needs to reimplement the methods joint_log_likelihood and update_feature_log_prob, plus the hyperparameters; the rest of the code stays more or less the same.
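To illustrate how small the per-variant part is, here is a std-only sketch of those two methods for the multinomial case. It follows the formulas of sklearn's MultinomialNB, but the free functions and the Vec-of-Vec layout are assumptions for illustration, not linfa's actual internals (linfa works on ndarray types).

```rust
// Sketch only: dense Vec-based matrices, std only; not linfa's actual API.

/// Smoothed per-class feature log-probabilities, following sklearn's MultinomialNB:
/// log((count[c][j] + alpha) / sum_k (count[c][k] + alpha)).
pub fn update_feature_log_prob(feature_count: &[Vec<f64>], alpha: f64) -> Vec<Vec<f64>> {
    feature_count
        .iter()
        .map(|row| {
            // Denominator: total smoothed count for this class.
            let total: f64 = row.iter().map(|&c| c + alpha).sum();
            row.iter().map(|&c| ((c + alpha) / total).ln()).collect()
        })
        .collect()
}

/// Per-class joint log-likelihood: log P(c) + sum_j x_j * log P(j | c).
/// The multinomial coefficient is dropped because it is the same for every class.
pub fn joint_log_likelihood(
    x: &[f64],
    class_log_prior: &[f64],
    feature_log_prob: &[Vec<f64>],
) -> Vec<f64> {
    class_log_prior
        .iter()
        .zip(feature_log_prob)
        .map(|(&prior, log_p)| {
            prior + x.iter().zip(log_p).map(|(&xi, &lp)| xi * lp).sum::<f64>()
        })
        .collect()
}
```

Everything else (fitting class counts, argmax prediction, parameter checks) would be shared with the Gaussian variant.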

I have created a draft implementation of Multinomial Naive Bayes in this branch, based on the current code of Gaussian Naive Bayes and the sklearn implementation of MultinomialNB. At the moment, a large part of the code is copy-pasted from Gaussian Naive Bayes, but both could be refactored to deduplicate the shared code.

Would you consider this a useful feature to have? If yes, I can finalise the draft and open a PR.

@VasanthakumarV @bytesnake, tagging you since you authored and reviewed the original Gaussian Naive Bayes implementation in https://github.com/rust-ml/linfa/pull/51

bytesnake commented 2 years ago

Hi @sgrigory,

It should not be very difficult to add other kinds of Naive Bayes present in sklearn

No, not really; the question is rather how we want to add them. The type system should allow us to be generic over the distribution. There are some distribution libraries in Rust, but few support MAP estimation.
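For the Bernoulli variant, the MAP estimate in question is small enough to write by hand. A sketch, assuming a symmetric Beta(a, a) prior on each per-class feature probability; with a = alpha + 1 the MAP estimate reduces to the additive smoothing (k + alpha) / (n + 2 * alpha) that sklearn's BernoulliNB uses. The function names are hypothetical.

```rust
// Sketch only: MAP estimation for a Bernoulli feature, std only.

/// MAP estimate of a Bernoulli parameter with `successes` out of `trials`
/// under a symmetric Beta(a, a) prior: (k + a - 1) / (n + 2a - 2).
/// Choosing a = alpha + 1 gives (k + alpha) / (n + 2 * alpha).
/// Needs alpha > 0 when trials == 0, otherwise the result is 0/0.
pub fn bernoulli_map(successes: f64, trials: f64, alpha: f64) -> f64 {
    let a = alpha + 1.0; // Beta prior shape parameter
    (successes + a - 1.0) / (trials + 2.0 * a - 2.0)
}

/// Log-likelihood of binary features `x` given per-feature probabilities `p`:
/// sum_j [x_j ln p_j + (1 - x_j) ln(1 - p_j)].
pub fn bernoulli_log_likelihood(x: &[bool], p: &[f64]) -> f64 {
    x.iter()
        .zip(p)
        .map(|(&xi, &pj)| if xi { pj.ln() } else { (1.0 - pj).ln() })
        .sum::<f64>()
}
```

With alpha = 0 the prior is uniform and the MAP estimate coincides with the maximum-likelihood estimate k / n.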

For a new algorithm one needs to reimplement methods joint_log_likelihood and update_feature_log_prob and the hyperparameters - the rest of the code stays more or less the same.

Sounds like a good candidate for a trait.
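One possible shape for such a trait, as a hypothetical sketch: each variant supplies only its joint log-likelihood, and prediction is a shared default method. The trait, struct, and field names are illustrative, not linfa's API, and fitting is left out for brevity.

```rust
// Hypothetical trait; names and signatures are illustrative, not linfa's API.

/// What differs between Naive Bayes variants: the per-class joint log-likelihood.
pub trait NaiveBayesModel {
    /// log P(c) + log P(x | c) for every class c, for a single sample x.
    fn joint_log_likelihood(&self, x: &[f64]) -> Vec<f64>;

    /// Shared default: pick the class with the largest joint log-likelihood.
    fn predict(&self, x: &[f64]) -> usize {
        self.joint_log_likelihood(x)
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.total_cmp(b.1))
            .map(|(i, _)| i)
            .expect("model must have at least one class")
    }
}

pub struct GaussianNb {
    pub log_prior: Vec<f64>,
    pub mean: Vec<Vec<f64>>, // [class][feature]
    pub var: Vec<Vec<f64>>,  // [class][feature], assumed > 0
}

pub struct MultinomialNb {
    pub log_prior: Vec<f64>,
    pub feature_log_prob: Vec<Vec<f64>>, // [class][feature]
}

impl NaiveBayesModel for GaussianNb {
    fn joint_log_likelihood(&self, x: &[f64]) -> Vec<f64> {
        (0..self.log_prior.len())
            .map(|c| {
                // Sum of per-feature Gaussian log-densities for class c.
                let log_density: f64 = x
                    .iter()
                    .zip(&self.mean[c])
                    .zip(&self.var[c])
                    .map(|((&xi, &mu), &v)| {
                        -0.5 * ((2.0 * std::f64::consts::PI * v).ln() + (xi - mu).powi(2) / v)
                    })
                    .sum::<f64>();
                self.log_prior[c] + log_density
            })
            .collect()
    }
}

impl NaiveBayesModel for MultinomialNb {
    fn joint_log_likelihood(&self, x: &[f64]) -> Vec<f64> {
        self.log_prior
            .iter()
            .zip(&self.feature_log_prob)
            .map(|(&prior, lp)| prior + x.iter().zip(lp).map(|(&xi, &l)| xi * l).sum::<f64>())
            .collect()
    }
}
```

A fitting counterpart (something like update_feature_log_prob) could become a second required method on the same trait.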

I have created a draft implementation of Multinomial Naive Bayes in this branch, based on the current code of Gaussian Naive Bayes [..]

:+1:

Would you consider this a useful feature to have? If yes, I can finalise the draft and open a PR.

Yes, we would accept such a PR. To be really useful, we have to figure out how to:

  1. handle mixed-type datasets
  2. handle distributions with different parametrizations

It may therefore be refactored in the future, but we will nevertheless gladly accept such a PR :)
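On the first point, one option (a sketch, not an agreed design) relies on the naive independence assumption: log-likelihoods of disjoint feature groups add up, so a mixed dataset can score continuous columns with a Gaussian model and binary columns with a Bernoulli model, then add the class prior exactly once. All function names below are hypothetical.

```rust
// Hypothetical sketch, std only; not an agreed design for linfa.

/// Gaussian log-density per class for the continuous columns (no prior).
pub fn gaussian_log_lik(x: &[f64], mean: &[Vec<f64>], var: &[Vec<f64>]) -> Vec<f64> {
    mean.iter()
        .zip(var)
        .map(|(mu, v)| {
            x.iter()
                .zip(mu)
                .zip(v)
                .map(|((&xi, &m), &s2)| {
                    -0.5 * ((2.0 * std::f64::consts::PI * s2).ln() + (xi - m).powi(2) / s2)
                })
                .sum::<f64>()
        })
        .collect()
}

/// Bernoulli log-likelihood per class for the binary columns (no prior).
pub fn bernoulli_log_lik(x: &[bool], prob: &[Vec<f64>]) -> Vec<f64> {
    prob.iter()
        .map(|p| {
            x.iter()
                .zip(p)
                .map(|(&xi, &pj)| if xi { pj.ln() } else { (1.0 - pj).ln() })
                .sum::<f64>()
        })
        .collect()
}

/// Naive independence makes the groups' log-likelihoods additive, so the
/// class prior is added once on top of their per-class sum.
pub fn combine_mixed(class_log_prior: &[f64], groups: &[Vec<f64>]) -> Vec<f64> {
    class_log_prior
        .iter()
        .enumerate()
        .map(|(c, &prior)| prior + groups.iter().map(|g| g[c]).sum::<f64>())
        .collect()
}
```

The key design constraint is that each per-type model returns log P(x_g | c) without the prior; adding the prior inside every group would count it several times.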

yuancc06 commented 2 years ago

Hi. I have tried Multinomial Naive Bayes and it predicts the correct result very well. However, in some cases I need the joint likelihood for further calculations, and I cannot access it because the corresponding function is pub(crate). Do the developers plan to make the likelihood/probability function public? Thank you.
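Once the joint log-likelihood is exposed, class probabilities follow by normalising it with the log-sum-exp trick, which is also how sklearn's predict_proba works. A std-only sketch; the function names are illustrative, not part of linfa.

```rust
// Sketch only; function names are illustrative, not linfa's API.

/// Numerically stable log(sum_i exp(v_i)).
pub fn log_sum_exp(v: &[f64]) -> f64 {
    let max = v.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    if max.is_infinite() {
        // Empty input, all -inf, or an +inf entry: nothing to stabilise.
        return max;
    }
    max + v.iter().map(|&x| (x - max).exp()).sum::<f64>().ln()
}

/// Posterior P(c | x) from per-class joint log-likelihoods log P(c) + log P(x | c).
pub fn posterior_proba(jll: &[f64]) -> Vec<f64> {
    let log_evidence = log_sum_exp(jll); // log P(x)
    jll.iter().map(|&l| (l - log_evidence).exp()).collect()
}
```

Subtracting the maximum before exponentiating matters in practice: joint log-likelihoods of long documents are often in the hundreds or thousands below zero, where a naive exp underflows to 0 for every class.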

YuhanLiin commented 2 years ago

That would require making the NaiveBayes trait public. I'd accept a PR that does this, but with the trait's other method hidden from the docs, so that people don't rely on the trait for anything other than the joint likelihood.
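Rust trait items cannot have their own visibility, so every method of a public trait is public; #[doc(hidden)] is the usual way to keep one out of the generated docs. A hypothetical sketch: fit_sample stands in for "the other method" and, like the tiny model that exercises it, is not linfa's actual code.

```rust
// Hypothetical: trait and method names are illustrative, not linfa's actual code.

/// Public trait: joint_log_likelihood is documented, the fitting hook is hidden.
pub trait NaiveBayes {
    /// Per-class joint log-likelihood log P(c) + log P(x | c) for one sample.
    fn joint_log_likelihood(&self, x: &[f64]) -> Vec<f64>;

    /// Internal fitting hook; still public, but hidden from rustdoc.
    #[doc(hidden)]
    fn fit_sample(&mut self, x: &[f64], class: usize);
}

/// Tiny count-based multinomial model used to exercise the trait.
pub struct TinyMultinomial {
    pub alpha: f64,                   // additive smoothing
    pub class_count: Vec<f64>,        // samples seen per class
    pub feature_count: Vec<Vec<f64>>, // [class][feature] summed counts
}

impl NaiveBayes for TinyMultinomial {
    fn joint_log_likelihood(&self, x: &[f64]) -> Vec<f64> {
        let n: f64 = self.class_count.iter().sum();
        self.feature_count
            .iter()
            .zip(&self.class_count)
            .map(|(fc, &cc)| {
                // Smoothed log P(feature | class), weighted by the counts in x.
                let total: f64 = fc.iter().map(|&c| c + self.alpha).sum();
                let log_lik: f64 = x
                    .iter()
                    .zip(fc)
                    .map(|(&xi, &c)| xi * ((c + self.alpha) / total).ln())
                    .sum();
                (cc / n).ln() + log_lik // empirical class prior
            })
            .collect()
    }

    fn fit_sample(&mut self, x: &[f64], class: usize) {
        self.class_count[class] += 1.0;
        for (c, &xi) in self.feature_count[class].iter_mut().zip(x) {
            *c += xi;
        }
    }
}
```

Hidden items remain callable, so this only discourages reliance on them; it does not enforce anything.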