marcotcr / lime

Lime: Explaining the predictions of any machine learning classifier
BSD 2-Clause "Simplified" License

How to use LimeTextExplainer on Tensorflow text classifier? #610

Closed: Neihtq closed this issue 3 years ago

Neihtq commented 3 years ago

Hi,

We have a classification model based on this. It takes a text input and outputs the probability that the content is true news. We currently use TensorFlow 2.4.1.

We prepare our data for inference and training like this:

from keras.preprocessing import text, sequence

max_features = 10000
maxlen = 300

# x_train and x_test are pandas Series with strings (articles) in each row
tokenizer = text.Tokenizer(num_words=max_features)
tokenizer.fit_on_texts(x_train)

tokenized_test = tokenizer.texts_to_sequences(x_test)
X_test = sequence.pad_sequences(tokenized_test, maxlen=maxlen)

And this is how we try to get an explanation from LIME for a single test sample:

from lime.lime_text import LimeTextExplainer

explainer = LimeTextExplainer(class_names=['true', 'false'])
exp = explainer.explain_instance(X_test[:1], model.predict, num_features=6)

However, this raises TypeError: cannot use a string pattern on a bytes-like object. This is an example output of model.predict(X_test[:1]):

array([[0.0290247]], dtype=float32)

I am not sure what to do at this point as the encoding of the text data is required by the model in order to make predictions.

marcotcr commented 3 years ago

See #200. Your predict_fn must take a list of strings as input and return a 2D array of prediction probabilities (one row per text, one column per class). To do that, wrap your encoding step and your prediction function in a single function that does both.
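A minimal sketch of such a wrapper follows. It is not taken from the thread. The names make_predict_fn, toy_encode and toy_predict are hypothetical, and the toy stand-ins exist only so the snippet runs without TensorFlow. It assumes, as in the question, that the model returns one probability of "true" per text with shape (n, 1), and expands that into two columns ordered to match class_names=['true', 'false'].

```python
import numpy as np


def make_predict_fn(encode, predict):
    """Build a function mapping a list of strings to an (n, 2) probability array.

    encode:  list of str -> model input (e.g. padded token sequences)
    predict: model input -> array of shape (n, 1) holding P(true) per text
    """
    def predict_proba(texts):
        encoded = encode(list(texts))
        # Flatten the (n, 1) model output into a 1D vector of P(true).
        p_true = np.asarray(predict(encoded), dtype=float).reshape(-1)
        # Column 0 = P(true), column 1 = P(false), matching ['true', 'false'].
        return np.column_stack([p_true, 1.0 - p_true])
    return predict_proba


# Hypothetical stand-ins so the sketch runs on its own. With the setup from
# the question they would presumably be replaced by something like:
#   encode  = lambda texts: sequence.pad_sequences(
#       tokenizer.texts_to_sequences(texts), maxlen=maxlen)
#   predict = model.predict
def toy_encode(texts):
    # One feature per text: its word count, shape (n, 1).
    return np.array([[len(t.split())] for t in texts])


def toy_predict(encoded):
    # Sigmoid over the word count, shape (n, 1), like a one-unit output layer.
    return 1.0 / (1.0 + np.exp(-(encoded - 3.0)))


predict_fn = make_predict_fn(toy_encode, toy_predict)
probs = predict_fn(["one two", "a b c d e"])

# With the real model, LIME would then be given the raw string rather than
# the padded sequence (x_test is the pandas Series of articles):
#   explainer = LimeTextExplainer(class_names=['true', 'false'])
#   exp = explainer.explain_instance(x_test.iloc[0], predict_fn, num_features=6)
```

The TypeError in the question is consistent with passing the already-encoded X_test[:1] as the text instance, since LIME tokenizes the input string itself before calling predict_fn on perturbed copies. Note also that explain_instance explains label 1 by default, which with this column order would be the 'false' class.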