rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0

[FEA] Naive bayes classifier #1265

Closed: JohnZed closed this issue 4 years ago.

JohnZed commented 4 years ago

It would be helpful to provide a Naive Bayes classifier for cuML.

scikit-learn has an implementation (MultinomialNB): https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.MultinomialNB.html#sklearn.naive_bayes.MultinomialNB

Spark ML also supports Naive Bayes (Multinomial): https://spark.apache.org/docs/1.6.0/api/java/index.html?org/apache/spark/ml/classification/NaiveBayes.html
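To make the request concrete, here is a minimal pure-Python sketch of the multinomial Naive Bayes model that both linked implementations describe: per-class log priors plus additively smoothed per-feature log probabilities. It is not cuML's code. The function names `fit_multinomial_nb` and `predict_multinomial_nb` are hypothetical, and the toy data is made up for illustration.

```python
import math
from collections import defaultdict


def fit_multinomial_nb(X, y, alpha=1.0):
    """Fit multinomial NB on dense rows of non-negative feature counts.

    alpha is additive (Laplace/Lidstone) smoothing.
    Returns (classes, log_priors, feature_log_probs).
    """
    n_features = len(X[0])
    classes = sorted(set(y))
    class_count = defaultdict(int)
    feature_count = {c: [0.0] * n_features for c in classes}
    for row, label in zip(X, y):
        class_count[label] += 1
        for j, v in enumerate(row):
            feature_count[label][j] += v
    n_samples = len(y)
    # Empirical class priors P(c).
    log_priors = {c: math.log(class_count[c] / n_samples) for c in classes}
    # Smoothed P(feature j | c) = (N_cj + alpha) / (N_c + alpha * n_features).
    feature_log_probs = {}
    for c in classes:
        total = sum(feature_count[c]) + alpha * n_features
        feature_log_probs[c] = [math.log((cnt + alpha) / total) for cnt in feature_count[c]]
    return classes, log_priors, feature_log_probs


def predict_multinomial_nb(model, row):
    """Return the class with the highest joint log-likelihood for one row."""
    classes, log_priors, feature_log_probs = model

    def score(c):
        # log P(c) + sum_j x_j * log P(feature j | c)
        return log_priors[c] + sum(v * lp for v, lp in zip(row, feature_log_probs[c]))

    return max(classes, key=score)


X = [[2, 1, 0], [3, 0, 0], [0, 1, 3], [0, 0, 4]]
y = ["a", "a", "b", "b"]
model = fit_multinomial_nb(X, y)
print(predict_multinomial_nb(model, [4, 0, 0]))  # a
```

Working in log space keeps the products of many small probabilities from underflowing.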

JohnZed commented 4 years ago

Sparse input support would be critical for this, as most users of Naive Bayes have sparse inputs.
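Sparse input fits this model naturally. A zero count contributes 0 * log p = 0 to a class score and nothing to the class feature totals, so fitting and prediction only need the stored nonzeros. The sketch below assumes the three-array CSR layout used by scipy.sparse.csr_matrix (`data`, `indices`, `indptr`, where row i's entries live at positions `indptr[i]` to `indptr[i+1]`). The function names are hypothetical and this is not the cuML implementation.

```python
import math


def fit_multinomial_nb_csr(data, indices, indptr, n_features, y, alpha=1.0):
    """Fit multinomial NB directly from CSR arrays without densifying."""
    classes = sorted(set(y))
    class_count = {c: 0 for c in classes}
    feature_count = {c: [0.0] * n_features for c in classes}
    for i, label in enumerate(y):
        class_count[label] += 1
        # Only the stored (nonzero) entries of row i are visited.
        for k in range(indptr[i], indptr[i + 1]):
            feature_count[label][indices[k]] += data[k]
    n_samples = len(y)
    log_priors = {c: math.log(class_count[c] / n_samples) for c in classes}
    feature_log_probs = {}
    for c in classes:
        total = sum(feature_count[c]) + alpha * n_features
        feature_log_probs[c] = [math.log((cnt + alpha) / total) for cnt in feature_count[c]]
    return classes, log_priors, feature_log_probs


def predict_row_csr(model, data, indices, indptr, i):
    """Predict the class of CSR row i, touching only its nonzero entries."""
    classes, log_priors, feature_log_probs = model
    start, end = indptr[i], indptr[i + 1]

    def score(c):
        return log_priors[c] + sum(
            data[k] * feature_log_probs[c][indices[k]] for k in range(start, end)
        )

    return max(classes, key=score)


# CSR form of the 4 x 3 count matrix [[2, 1, 0], [3, 0, 0], [0, 1, 3], [0, 0, 4]].
data = [2, 1, 3, 1, 3, 4]
indices = [0, 1, 0, 1, 2, 2]
indptr = [0, 2, 3, 5, 6]
y = ["a", "a", "b", "b"]
model = fit_multinomial_nb_csr(data, indices, indptr, 3, y)
print([predict_row_csr(model, data, indices, indptr, i) for i in range(4)])
# ['a', 'a', 'b', 'b']
```

The work per row is proportional to its number of nonzeros rather than to the vocabulary size, which is why sparse support matters for text features.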

VibhuJawa commented 4 years ago

+1 on sparse input support.

FWIW, in the current scope, the HashingVectorizer and TF-IDF implementations should be able to output 3 arrays that follow the format of scipy.sparse.csr_matrix.

It would be amazing if we could accept input in the form below.

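The 3 arrays are data, indices and indptr.

They follow the standard CSR representation where, for row i, the column indices are stored in indices[indptr[i]:indptr[i+1]] and the corresponding values are stored in data[indptr[i]:indptr[i+1]].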

https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.html
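For illustration, a small pure-Python sketch of how a dense matrix maps onto those three arrays and how row i is read back through `indptr`. The helper names `dense_to_csr` and `csr_row` are hypothetical. With SciPy installed, `scipy.sparse.csr_matrix((data, indices, indptr), shape=(4, 3))` should build the equivalent matrix from the same arrays.

```python
def dense_to_csr(rows):
    """Build (data, indices, indptr) from a dense list of lists,
    following the scipy.sparse.csr_matrix layout."""
    data, indices, indptr = [], [], [0]
    for row in rows:
        for j, v in enumerate(row):
            if v != 0:
                data.append(v)     # nonzero value
                indices.append(j)  # its column index
        # indptr[i + 1] marks where row i's entries end in data/indices.
        indptr.append(len(data))
    return data, indices, indptr


def csr_row(data, indices, indptr, i):
    """Return row i as a {column: value} dict of its nonzero entries."""
    start, end = indptr[i], indptr[i + 1]
    return dict(zip(indices[start:end], data[start:end]))


data, indices, indptr = dense_to_csr([[2, 1, 0], [3, 0, 0], [0, 1, 3], [0, 0, 4]])
print(data, indices, indptr)  # [2, 1, 3, 1, 3, 4] [0, 1, 0, 1, 2, 2] [0, 2, 3, 5, 6]
print(csr_row(data, indices, indptr, 2))  # {1: 1, 2: 3}
```

Note that `indptr` has one more entry than there are rows, so every row, including an empty one, is described by a valid `[start, end)` slice.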

JohnZed commented 4 years ago

See PR #1375, which implements this.