marcotcr / lime

Lime: Explaining the predictions of any machine learning classifier
BSD 2-Clause "Simplified" License

Issues on LimeTextExplainer #582

Closed Raiseku closed 3 years ago

Raiseku commented 3 years ago

I'm trying to solve a text classification problem with Keras and an LSTM. My model is the following:

[Image: model architecture, not preserved]

I have 5 classes: 'sport', 'tech', 'business', 'politics', 'entertainment'. The problem is that when I pass STR to explain_instance(), it becomes a list of length 5000, so when I execute my code I get the following error:

ValueError: Found input variables with inconsistent numbers of samples: [5000, 1]

Can someone explain to me how to solve this problem? My code is the following:


from lime.lime_text import LimeTextExplainer
explainer = LimeTextExplainer(class_names = ['sport', 'tech', 'business', 'politics', 'entertainment'])

indice = 17
stringa = df['text'].iloc[indice]
def get_predict_proba_fn_of_class(stringa):
    def new_predict(stringa): 
      row_utente = {'testo' : [stringa[0]]}
      df_utente = pd.DataFrame(row_utente)
      df_utente['testo_token'] = tokenizer.texts_to_sequences(df_utente['testo'])
      df_utente['testo_token_padding'] = pad_sequences(df_utente['testo_token'], padding = "post", maxlen = max_len).tolist()
      valori_df_testo_utente = df_utente['testo_token_padding'].values 
      valori_testo_utente = np.array([item for item in valori_df_testo_utente])
      predizione_utente = model_final.predict(valori_testo_utente)
      return predizione_utente
    return new_predict

print("Chosen phrase: " + df['text'].iloc[indice])
print("Real Class: " + df['category'].iloc[indice])

STR = df['text'].iloc[indice]
wrapp = get_predict_proba_fn_of_class(STR)
exp = explainer.explain_instance(STR, wrapp, num_features=7, top_labels=1)
exp.show_in_notebook(text = True)

marcotcr commented 3 years ago

The prediction function used as an argument to explain_instance should take as input a list of strings and return a 2d array of prediction probabilities. It looks as if your function takes in a single string.
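
To make that concrete: explain_instance generates num_samples perturbed copies of the text (5000 by default) and passes them all to the prediction function in one call. In the code above, `stringa[0]` keeps only the first of those strings, so the function returns one row of probabilities for 5000 inputs, which would explain the [5000, 1] mismatch. Below is a minimal sketch of a batch-aware wrapper. The names `make_predict_proba`, `ToyTokenizer`, and `ToyModel` are hypothetical and exist only so the example runs without TensorFlow. The NumPy padding loop is a stand-in that is meant to mirror `pad_sequences(..., padding="post", maxlen=max_len)`.

```python
import numpy as np


def make_predict_proba(tokenizer, model, max_len):
    """Build a prediction function in the shape LIME expects.

    Assumes tokenizer.texts_to_sequences(list_of_str) returns a list of
    integer lists and model.predict(batch) returns an
    (n_samples, n_classes) array, as a Keras tokenizer and model would.
    """
    def predict_proba(texts):
        # LIME passes a list of strings; tokenize all of them, not just texts[0].
        sequences = tokenizer.texts_to_sequences(list(texts))
        # Stand-in for pad_sequences(padding="post", maxlen=max_len):
        # zero-pad on the right, keep the last max_len tokens if too long.
        batch = np.zeros((len(sequences), max_len), dtype="int32")
        for i, seq in enumerate(sequences):
            seq = seq[-max_len:]
            batch[i, :len(seq)] = seq
        # One row of class probabilities per input string.
        return np.asarray(model.predict(batch))
    return predict_proba


class ToyTokenizer:
    """Hypothetical stand-in for the fitted Keras tokenizer."""
    def __init__(self, words):
        self.index = {w: i + 1 for i, w in enumerate(words)}

    def texts_to_sequences(self, texts):
        return [[self.index[w] for w in t.lower().split() if w in self.index]
                for t in texts]


class ToyModel:
    """Hypothetical stand-in for model_final: softmax over token counts."""
    def __init__(self, n_classes):
        self.n_classes = n_classes

    def predict(self, batch):
        scores = np.zeros((batch.shape[0], self.n_classes))
        for c in range(self.n_classes):
            # Count non-padding tokens whose id falls in bucket c.
            scores[:, c] = ((batch > 0) & (batch % self.n_classes == c)).sum(axis=1)
        exp = np.exp(scores - scores.max(axis=1, keepdims=True))
        return exp / exp.sum(axis=1, keepdims=True)


predict_proba = make_predict_proba(
    ToyTokenizer(["goal", "match", "phone", "chip", "vote"]), ToyModel(5), max_len=4
)
probs = predict_proba(["goal match", "new phone chip", "vote"])
print(probs.shape)  # (3, 5): one row per input string, one column per class
```

With the real objects from the question, the wrapper would presumably be built as make_predict_proba(tokenizer, model_final, max_len) and passed to explainer.explain_instance(STR, ..., num_features=7, top_labels=1) in place of wrapp. The padding loop could then be swapped back for the original pad_sequences call.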