Closed JohnZed closed 4 years ago
Sparse input support would be critical for this, as most users of Naive Bayes have sparse inputs.
Plus one on sparse input support.
FWIW, the output of the current HashingVectorizer and TF-IDF implementations should be able to give us 3 arrays that follow the format of scipy.sparse.csr_matrix.
It would be amazing if we could accept input in that form.
The 3 arrays are data, indices, and indptr. They follow the standard CSR representation, where for row i the column indices are stored in indices[indptr[i]:indptr[i+1]] and the corresponding values in data[indptr[i]:indptr[i+1]]. See https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.html
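To make the CSR layout concrete, here is a minimal pure-Python sketch of how the three arrays encode a small matrix. The helper names csr_row and csr_to_dense are made up for illustration and are not part of scipy or cuML; a scipy.sparse.csr_matrix exposes arrays with the same meaning through its data, indices, and indptr attributes.

```python
# CSR encoding of this 3x4 matrix (row 1 is entirely zero):
# [[1, 0, 2, 0],
#  [0, 0, 0, 0],
#  [0, 3, 0, 4]]
data = [1, 2, 3, 4]       # non-zero values, stored row by row
indices = [0, 2, 1, 3]    # column index of each value in `data`
indptr = [0, 2, 2, 4]     # row i occupies the slice indptr[i]:indptr[i+1]


def csr_row(data, indices, indptr, i):
    """Return the (column, value) pairs stored for row i (hypothetical helper)."""
    start, end = indptr[i], indptr[i + 1]
    return list(zip(indices[start:end], data[start:end]))


def csr_to_dense(data, indices, indptr, n_cols):
    """Expand the three CSR arrays into a dense list of lists (hypothetical helper)."""
    n_rows = len(indptr) - 1
    dense = [[0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for col, val in csr_row(data, indices, indptr, i):
            dense[i][col] = val
    return dense


if __name__ == "__main__":
    for i in range(len(indptr) - 1):
        print(i, csr_row(data, indices, indptr, i))
    print(csr_to_dense(data, indices, indptr, n_cols=4))
```

Note that an empty row simply repeats the previous offset in indptr, so no values or indices are stored for it.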
See PR #1375, which implements this.
It would be helpful to provide a Naive Bayes classifier for cuML.
Sklearn has an implementation: https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.MultinomialNB.html#sklearn.naive_bayes.MultinomialNB
Spark ML also supports Naive Bayes (Multinomial): https://spark.apache.org/docs/1.6.0/api/java/index.html?org/apache/spark/ml/classification/NaiveBayes.html
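As a rough illustration of what a multinomial Naive Bayes that consumes the three CSR arrays directly could look like, here is a pure-Python sketch. It is not cuML's API and not the implementation in the PR mentioned above; the function names, the alpha smoothing parameter, and the toy data are all assumptions made for the example. It uses the usual Laplace-smoothed estimate, (count + alpha) / (class total + alpha * n_features), and only ever touches the non-zero entries.

```python
import math


def fit_multinomial_nb(data, indices, indptr, labels, n_features, alpha=1.0):
    """Fit a multinomial NB model from CSR arrays (hypothetical helper)."""
    classes = sorted(set(labels))
    n_rows = len(indptr) - 1
    class_count = {c: 0 for c in classes}
    feature_count = {c: [0.0] * n_features for c in classes}

    # Accumulate per-class feature counts, visiting only stored entries.
    for i in range(n_rows):
        c = labels[i]
        class_count[c] += 1
        for k in range(indptr[i], indptr[i + 1]):
            feature_count[c][indices[k]] += data[k]

    log_prior = {c: math.log(class_count[c] / n_rows) for c in classes}
    log_prob = {}
    for c in classes:
        total = sum(feature_count[c]) + alpha * n_features
        log_prob[c] = [math.log((fc + alpha) / total) for fc in feature_count[c]]
    return classes, log_prior, log_prob


def predict_multinomial_nb(model, data, indices, indptr):
    """Predict the most likely class for each CSR row (hypothetical helper)."""
    classes, log_prior, log_prob = model
    preds = []
    for i in range(len(indptr) - 1):
        scores = {}
        for c in classes:
            score = log_prior[c]
            for k in range(indptr[i], indptr[i + 1]):
                score += data[k] * log_prob[c][indices[k]]
            scores[c] = score
        preds.append(max(classes, key=lambda c: scores[c]))
    return preds


# Toy term-count matrix: 4 documents, 3 features, two classes.
train_data = [2, 1, 3, 1, 2, 3]
train_indices = [0, 1, 0, 1, 2, 2]
train_indptr = [0, 2, 3, 5, 6]
train_labels = [0, 0, 1, 1]

model = fit_multinomial_nb(
    train_data, train_indices, train_indptr, train_labels, n_features=3
)

if __name__ == "__main__":
    print(predict_multinomial_nb(model, train_data, train_indices, train_indptr))
```

Because both loops iterate over indptr slices, the work is proportional to the number of stored non-zeros rather than to rows times features, which is the main reason sparse input matters for text workloads.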