[FEA] Naive bayes classifier

JohnZed commented 4 years ago

It would be helpful to provide a Naive Bayes classifier for cuML.

Scikit-learn has an implementation, MultinomialNB: https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.MultinomialNB.html#sklearn.naive_bayes.MultinomialNB

Spark ML also supports Naive Bayes (Multinomial): https://spark.apache.org/docs/1.6.0/api/java/index.html?org/apache/spark/ml/classification/NaiveBayes.html
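For reference, here is what the multinomial variant computes, following the formulation in scikit-learn's MultinomialNB documentation (the notation is ours). Let alpha be the additive smoothing parameter, n the number of features, N_{yi} the total count of feature i over training samples of class y, and N_y the sum of N_{yi} over all features:

```latex
\hat{\theta}_{yi} = \frac{N_{yi} + \alpha}{N_y + \alpha n},
\qquad
\hat{y} = \arg\max_{y} \left[ \log P(y) + \sum_{i=1}^{n} x_i \log \hat{\theta}_{yi} \right]
```

Here P(y) is the empirical class frequency. Only the nonzero x_i contribute to the sum, so the prediction step naturally skips zero entries.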

JohnZed commented 4 years ago

Sparse input support would be critical for this, as most users of Naive Bayes have sparse inputs.
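To illustrate why sparse support fits Naive Bayes well, here is a minimal pure-Python sketch, not cuML's or scikit-learn's implementation, that fits a multinomial model directly from a CSR triple. In this triple, data holds the stored values, indices their column indices, and indptr the offset where each row starts. These are the same three arrays that scipy.sparse.csr_matrix exposes as .data, .indices and .indptr. The function names fit_multinomial_nb_csr and predict_multinomial_nb_csr are hypothetical. The counting loop touches only stored entries, so its cost scales with the number of nonzeros rather than n_samples * n_features.

```python
import math


def fit_multinomial_nb_csr(data, indices, indptr, y, n_features, alpha=1.0):
    """Fit multinomial Naive Bayes from CSR arrays (hypothetical helper).

    Returns (classes, log_prior, log_theta), where log_theta[k][j] is the
    smoothed log-probability of feature j under classes[k].
    """
    classes = sorted(set(y))
    class_index = {c: k for k, c in enumerate(classes)}
    n_samples = len(indptr) - 1
    class_count = [0] * len(classes)
    # feature_count[k][j] accumulates N_kj, the total count of feature j in class k
    feature_count = [[0.0] * n_features for _ in classes]
    for i in range(n_samples):
        k = class_index[y[i]]
        class_count[k] += 1
        # Visit only the stored (nonzero) entries of row i
        for p in range(indptr[i], indptr[i + 1]):
            feature_count[k][indices[p]] += data[p]
    # Empirical class priors
    log_prior = [math.log(c / n_samples) for c in class_count]
    # Additive smoothing: (N_kj + alpha) / (N_k + alpha * n_features)
    log_theta = []
    for counts in feature_count:
        denom = sum(counts) + alpha * n_features
        log_theta.append([math.log((c + alpha) / denom) for c in counts])
    return classes, log_prior, log_theta


def predict_multinomial_nb_csr(model, data, indices, indptr):
    """Predict the most likely class for each CSR row (hypothetical helper)."""
    classes, log_prior, log_theta = model
    predictions = []
    for i in range(len(indptr) - 1):
        # Joint log-likelihood: log P(y) + sum of x_j * log theta_yj over nonzero x_j
        scores = [
            log_prior[k]
            + sum(data[p] * log_theta[k][indices[p]] for p in range(indptr[i], indptr[i + 1]))
            for k in range(len(classes))
        ]
        best = max(range(len(classes)), key=lambda k: scores[k])
        predictions.append(classes[best])
    return predictions


# Toy corpus: 4 documents, 3 vocabulary terms; dense form is
# [[2, 0, 0], [1, 0, 1], [0, 3, 0], [0, 1, 2]]
model = fit_multinomial_nb_csr(
    data=[2, 1, 1, 3, 1, 2],
    indices=[0, 0, 2, 1, 1, 2],
    indptr=[0, 1, 3, 4, 6],
    y=["a", "a", "b", "b"],
    n_features=3,
)
# Test rows [1, 0, 0] and [0, 2, 0] in CSR form
print(predict_multinomial_nb_csr(model, data=[1, 2], indices=[0, 1], indptr=[0, 1, 2]))
# prints ['a', 'b']
```

A GPU implementation would presumably replace the per-row loops with parallel scatter-adds over the nonzeros, but the arithmetic is the same.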

VibhuJawa commented 4 years ago

Plus one on sparse input support.

FWIW, within the current scope, the HashingVectorizer and TF-IDF implementations should be able to output 3 arrays that follow the format of scipy.sparse.csr_matrix.

It would be amazing if we could accept input in the form below.

The 3 arrays are:

data
indices
indptr

These follow the standard CSR representation, where for row i:

Column indices are stored in indices[indptr[i]:indptr[i+1]]
Corresponding values are stored in data[indptr[i]:indptr[i+1]]

https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.html
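The indexing rule above can be checked with a small pure-Python sketch that expands a CSR triple back into dense rows. The function names csr_row_to_dense and csr_to_dense are hypothetical, chosen for illustration only.

```python
def csr_row_to_dense(data, indices, indptr, i, n_cols):
    """Expand row i of a CSR matrix into a dense Python list."""
    row = [0] * n_cols
    start, end = indptr[i], indptr[i + 1]
    # Column indices of row i live in indices[start:end],
    # and the matching values live in data[start:end]
    for col, val in zip(indices[start:end], data[start:end]):
        row[col] = val
    return row


def csr_to_dense(data, indices, indptr, n_cols):
    """Expand every row; the number of rows is len(indptr) - 1."""
    return [csr_row_to_dense(data, indices, indptr, i, n_cols) for i in range(len(indptr) - 1)]


# Dense matrix:
# [[0, 5, 0, 0],
#  [1, 0, 0, 2],
#  [0, 0, 0, 0]]   <- an empty row repeats the previous indptr value
data = [5, 1, 2]
indices = [1, 0, 3]
indptr = [0, 1, 3, 3]
print(csr_to_dense(data, indices, indptr, n_cols=4))
# prints [[0, 5, 0, 0], [1, 0, 0, 2], [0, 0, 0, 0]]
```

The same triple can be handed to scipy as scipy.sparse.csr_matrix((data, indices, indptr), shape=(3, 4)), whose .toarray() yields the same matrix.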

JohnZed commented 4 years ago

See PR #1375, which implements this.

rapidsai / cuml

[FEA] Naive bayes classifier #1265