How I can explain CountVector + Naive Bayes My Text Dataset

oegedijk / explainerdashboard

Quickly build Explainable AI dashboards that show the inner workings of so-called "blackbox" machine learning models.

http://explainerdashboard.readthedocs.io

MIT License


How I can explain CountVector + Naive Bayes My Text Dataset #81

Closed mustfkeskin closed 3 years ago

mustfkeskin commented 3 years ago

Hello, I want to explain a multiclass text classification model. I use sklearn's CountVectorizer and MultinomialNB.

How can I do it?

cv = CountVectorizer(stop_words=stopwords)  # pass the stop-word list by keyword
nb = MultinomialNB(alpha=.01)

oegedijk commented 3 years ago

Hi @mustfkeskin, the explainer takes in a fitted model and data that is compatible with that fitted model.

So you would have to fit the CountVectorizer on the training set and use it to transform both the train and test sets. Then fit the MultinomialNB on the transformed training set, and feed it to the explainer along with the transformed test set. So something along the lines of:

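from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from explainerdashboard import ClassifierExplainer, ExplainerDashboard

# Fit the vectorizer on the training text only, then transform both splits
cv = CountVectorizer().fit(X_train)
X_train = cv.transform(X_train)
X_test = cv.transform(X_test)

# Fit the classifier on the vectorized training data and pass it to the explainer
nb = MultinomialNB().fit(X_train, y_train)
explainer = ClassifierExplainer(nb, X_test, y_test)
ExplainerDashboard(explainer).run()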

mustfkeskin commented 3 years ago

I got this error: AttributeError: columns not found. X_train contains a text array.

oegedijk commented 3 years ago

X_test should be a dataframe, but CountVectorizer probably pops out a numpy array, so then you should first wrap it in a dataframe:

explainer = ClassifierExplainer(nb, pd.DataFrame(X_test), y_test)

mustfkeskin commented 3 years ago

The CountVectorizer output is a sparse matrix, so I got this error: TypeError: unhashable type: 'csr_matrix'. Does ClassifierExplainer only explain data in pandas DataFrame format? Does this tool support explainability for image or NLP models?

oegedijk commented 3 years ago

Try:

explainer = ClassifierExplainer(nb, pd.DataFrame(X_test.toarray()), y_test)

In any case the tool is mostly meant for tabular data models...