marcotcr / anchor

Code for "High-Precision Model-Agnostic Explanations" paper
BSD 2-Clause "Simplified" License

Error when using save_to_file() - IndexError: [E040] Attempt to access token at x, max length x. #75

Closed: Enantiodromis closed this issue 3 years ago

Enantiodromis commented 3 years ago

Hey!

I am encountering an error that I cannot seem to resolve. It does not occur every time I run an explanation, but more often than not, and it is raised when calling the save_to_file function.

CODE SNIPPET

```python
# Imports implied by the snippet (tensorflow.keras assumed; plain keras would work the same way)
import numpy as np
import spacy
from anchor.anchor_text import AnchorText
from tensorflow.keras.preprocessing.sequence import pad_sequences

####################
# ANCHOR EXPLAINER #
####################
def anchor_explainer(X_test_encoded, model, word_index, tokenizer):
    # Creating a reverse dictionary (index -> word)
    reverse_word_map = dict(map(reversed, word_index.items()))

    # Takes a tokenized sentence (list of indices) and returns the words
    def sequence_to_text(list_of_indices):
        # Looking up words in the reverse dictionary
        words = [reverse_word_map.get(letter) for letter in list_of_indices]
        return words

    my_texts = np.array(list(map(sequence_to_text, X_test_encoded)))

    def wrapped_predict(strings):
        cnn_rep = tokenizer.texts_to_sequences(strings)
        text_data = pad_sequences(cnn_rep, maxlen=30)
        prediction = model.predict(text_data)
        predicted_class = np.where(prediction > 0.5, 1, 0)[0]
        return predicted_class

    test_text = ' '.join(my_texts[6])
    nlp = spacy.load('en_core_web_sm')
    explainer = AnchorText(nlp, ['negative', 'positive'], use_unk_distribution=True)
    exp = explainer.explain_instance(test_text, wrapped_predict, threshold=0.95)
    exp.save_to_file("text_explanations/anchors_text_explanations/lime_test_data3.html")
```

ERROR MESSAGE

```
Traceback (most recent call last):
  File "c:/Users/.../Documents/Visual Studio Code Workspace/xai_classification_mixed_data/code/text_classification/anchor_text_explanation.py", line 66, in <module>
    anchor_explainer(X_test, model, word_index, tokenizer)
  File "c:/Users/.../Documents/Visual Studio Code Workspace/xai_classification_mixed_data/code/text_classification/anchor_text_explanation.py", line 42, in anchor_explainer
    exp.save_to_file("text_explanations/anchors_text_explanations/lime_test_data3.html", )
  File "C:\Users\...\Anaconda3\envs\shap_text\lib\site-packages\anchor\anchor_explanation.py", line 108, in save_to_file
    out = self.as_html(**kwargs)
  File "C:\Users\...\Anaconda3\envs\shap_text\lib\site-packages\anchor\anchor_explanation.py", line 100, in as_html
    return self.as_html_fn(self.exp_map, **kwargs)
  File "C:\Users\...\Anaconda3\envs\shap_text\lib\site-packages\anchor\anchor_text.py", line 219, in as_html
    example_obj.append(process_examples(examples, i))
  File "C:\Users\...\Anaconda3\envs\shap_text\lib\site-packages\anchor\anchor_text.py", line 212, in process_examples
    raw_indexes = [(processed[i].text, processed[i].idx, exp['prediction']) for i in idxs]
  File "C:\Users\...\Anaconda3\envs\shap_text\lib\site-packages\anchor\anchor_text.py", line 212, in <listcomp>
    raw_indexes = [(processed[i].text, processed[i].idx, exp['prediction']) for i in idxs]
  File "spacy\tokens\doc.pyx", line 463, in spacy.tokens.doc.Doc.__getitem__
  File "spacy\tokens\token.pxd", line 23, in spacy.tokens.token.Token.cinit
IndexError: [E040] Attempt to access token at 26, max length 26.
```

I am not sure whether the error lies with my implementation or not... perhaps it is a tokenizer issue?

Any insight as always would be greatly appreciated!

Thanks!

gkaramanolakis commented 3 years ago

I got the same error as @Enantiodromis and tried to fix it as shown in the above commit.

The error is raised in the following line:

```python
raw_indexes = [(processed[i].text, processed[i].idx, exp['prediction']) for i in idxs]
```

in cases where the input string is truncated (see the commit details), which produces strings with fewer tokens than the original input. When the anchor index points to the last token (or one of the last tokens) of the input, that token is lost after truncation, which raises the IndexError above ("Attempt to access token at 26, max length 26").
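
To make the failure mode concrete, here is a minimal sketch (independent of the anchor code, with made-up strings and indices) that reproduces the same E040 error: spaCy is asked for a token index that was valid for the original text but falls beyond the end of the truncated one. The guard at the end is purely illustrative and is not the actual fix from the commit.

```python
import spacy

nlp = spacy.load('en_core_web_sm')

truncated_text = "one two three four five"   # 5 tokens after truncation
processed = nlp(truncated_text)

# Anchor index computed on the original (longer) text, e.g. its 6th token.
idxs = [5]

try:
    # Same pattern as anchor_text.py line 212.
    raw_indexes = [(processed[i].text, processed[i].idx) for i in idxs]
except IndexError as e:
    # IndexError: [E040] Attempt to access token at 5, max length 5.
    print(e)

# Illustrative guard (not the library's actual fix): keep only indices
# that still exist in the truncated document.
safe_indexes = [(processed[i].text, processed[i].idx)
                for i in idxs if i < len(processed)]
```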