The second argument to `explain_instance` is a prediction function that takes a list of strings as input and outputs a list of prediction probabilities. Your `pipeline_d2v` returns a list of predictions.
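The function below is a minimal sketch of the contract LIME expects. It takes a list of raw strings and returns one row of class probabilities per string, as a NumPy array of shape (n_samples, n_classes). The keyword rule and the name `toy_classifier_fn` are hypothetical stand-ins for a real model.

```python
import numpy as np

def toy_classifier_fn(texts):
    """Hypothetical classifier_fn: list of str -> array of shape (len(texts), 2)."""
    probs = []
    for text in texts:
        # Toy rule standing in for a real model: each occurrence of "unstable"
        # raises the probability of class 1, capped at 1.0
        score = min(text.lower().split().count("unstable") / 3.0, 1.0)
        probs.append([1.0 - score, score])  # columns: ["stable", "unstable"]
    return np.array(probs)

print(toy_classifier_fn(["patient stable", "unstable unstable unstable"]))
```

Each row sums to 1. Returning a 2D array rather than a plain list of labels is what distinguishes a `predict_proba`-style function from a `predict`-style one.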
Hi Marco,
Thanks for your reply! Would it work if I turn `pipeline_d2v` into a class and create a `predict_proba` method? Or do you think there is a faster workaround?
Thanks again
You don't need a class; you can just create a function. See #172 and #200 for examples with other models.
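For example, a plain function is enough in place of a class with a `predict_proba` method. In this sketch a small factory captures the model objects in a closure. The names `make_classifier_fn`, `embed` and `classifier` are hypothetical, and `classifier` is assumed to expose a scikit-learn style `predict_proba`.

```python
import numpy as np

def make_classifier_fn(embed, classifier):
    """Build a one-argument classifier_fn for LIME.

    embed:      hypothetical callable, list of raw strings -> 2D feature array
    classifier: any fitted model with a scikit-learn style predict_proba
    """
    def classifier_fn(texts):
        # Raw strings -> feature vectors -> class probabilities
        features = embed(list(texts))
        # LIME indexes the result as a 2D array, so make sure it is one
        return np.asarray(classifier.predict_proba(features))
    return classifier_fn
```

Here `embed` would wrap the embedding step (e.g. Doc2Vec) and `classifier` would be the random forest. The returned `classifier_fn` is what gets passed to `explain_instance`.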
Hi Marco,
Thanks again for your reply and for the examples. I followed #172 and modified `pipeline_d2v` into:

```python
import pandas as pd
from gensim.models.doc2vec import TaggedDocument
from lime.lime_text import LimeTextExplainer

# vec_for_learning_no_labels is a helper defined elsewhere (not shown)

def pipeline_d2v(x_test_list_of_strings, y_test, model_d2v, random_forest):
    # Tokenize each raw string so Doc2Vec can embed it
    x_test = [text.split() for text in x_test_list_of_strings]
    test_data = pd.DataFrame({'reports': x_test, 'global_labels': y_test})
    test_tagged = test_data.apply(
        lambda r: TaggedDocument(words=r['reports'], tags=[r.global_labels]), axis=1)
    x_test_embedded = vec_for_learning_no_labels(model_d2v, test_tagged)
    return random_forest.predict_proba(x_test_embedded)

def extract_lime_explanation_d2v(idx_doc_to_investigate, vectorizer, random_forest, x_test, y_test,
                                 out_dir, cnt_document, prediction, embedding, save=True):
    class_names = ["stable", "unstable"]
    explainer = LimeTextExplainer(class_names=class_names)
    x_test_list_of_strings = [' '.join(x) for x in x_test]
    c = pipeline_d2v(x_test_list_of_strings, y_test, vectorizer, random_forest)
    exp = explainer.explain_instance(x_test[idx_doc_to_investigate], c, num_features=6)
```
Now `pipeline_d2v` takes `x_test_list_of_strings` as input and outputs prediction probabilities (variable `c`) like:
However, the line `exp = explainer.explain_instance(x_test[idx_doc_to_investigate], c, num_features=6)` still gives me the error:

```
  File "/home/newuser/PycharmProjects/Medical_Reports/utils.py", line 1237, in extract_lime_explanation_d2v
    exp = explainer.explain_instance(x_test[idx_doc_to_investigate], c, num_features=6)
  File "/home/newuser/PycharmProjects/Medical_Reports/venv3/lib/python3.6/site-packages/lime/lime_text.py", line 411, in explain_instance
    mask_string=self.mask_string))
  File "/home/newuser/PycharmProjects/Medical_Reports/venv3/lib/python3.6/site-packages/lime/lime_text.py", line 114, in __init__
    self.as_list = [s for s in splitter.split(self.raw) if s]
TypeError: expected string or bytes-like object
```
What could be the problem? Maybe the additional inputs to `pipeline_d2v`? But I need those to embed the documents.
Thank you very much again for your time!
Is `type(x_test[idx_doc_to_investigate])` actually `str`?
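The traceback fails inside a regular-expression split of the raw text instance, and a regex split only accepts a string. If `x_test` holds token lists, as `[' '.join(x) for x in x_test]` suggests, then `x_test[idx_doc_to_investigate]` is a `list`, not a `str`. The sketch below reproduces the same kind of `TypeError` with a simple word splitter. The pattern `\W+` is an assumed stand-in, not necessarily the exact one LIME compiles. The sketch then shows that joining the tokens first avoids the error.

```python
import re

# Assumed stand-in for LIME's word splitter: split on runs of non-word characters
splitter = re.compile(r"\W+")

tokens = ["patient", "is", "stable"]  # hypothetical tokenized document, like x_test[i]

try:
    splitter.split(tokens)  # a list instead of a str raises TypeError
except TypeError as err:
    print("TypeError:", err)

# Joining the tokens gives the same string as x_test_list_of_strings[i]
text_instance = " ".join(tokens)
pieces = [s for s in splitter.split(text_instance) if s]
print(pieces)
```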
The second argument to `explain_instance` is a prediction function that takes a list of strings as input and outputs a list of prediction probabilities. Your code still calls `pipeline_d2v` itself, stores the resulting prediction probabilities in variable `c`, and passes `c` as the argument. You have to call `explain_instance` with the function `pipeline_d2v` as the second argument. Check out `functools.partial`; it's an easy way to wrap the other arguments.
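Here is a minimal sketch of that fix. It assumes a gensim `Doc2Vec` model, whose `infer_vector` embeds an unseen token list, and a fitted scikit-learn classifier. Compared with the code above, it makes three changes:

- It passes the function itself to `explain_instance` instead of its output, with the extra arguments bound by `functools.partial`.
- It passes the text instance as the joined string.
- It drops `y_test` from `pipeline_d2v`. This is an assumption about the next likely error. LIME calls the function on its own perturbed samples (5000 by default), so a `y_test` of the original length would not line up with them. Inferring an embedding does not need labels anyway.

The simplified `extract_lime_explanation_d2v` signature is hypothetical, and `model_d2v` and `random_forest` are placeholders for your objects.

```python
from functools import partial

import numpy as np

def pipeline_d2v(texts, model_d2v, random_forest):
    """classifier_fn for LIME: list of raw strings -> (n_samples, n_classes) probabilities."""
    # Assumes the training documents were tokenized by whitespace
    tokens = [text.split() for text in texts]
    # Doc2Vec.infer_vector embeds a new document from its words alone (no tags/labels)
    embedded = np.array([model_d2v.infer_vector(words) for words in tokens])
    return random_forest.predict_proba(embedded)

def extract_lime_explanation_d2v(explainer, idx_doc_to_investigate, x_test, model_d2v, random_forest):
    # LIME needs the instance as a single string, not a token list
    x_test_list_of_strings = [" ".join(x) for x in x_test]
    # Bind the extra arguments; LIME will then call classifier_fn(list_of_strings)
    classifier_fn = partial(pipeline_d2v, model_d2v=model_d2v, random_forest=random_forest)
    return explainer.explain_instance(x_test_list_of_strings[idx_doc_to_investigate],
                                      classifier_fn, num_features=6)
```

Usage: build `explainer = LimeTextExplainer(class_names=["stable", "unstable"])` and call `exp = extract_lime_explanation_d2v(explainer, idx, x_test, model_d2v, random_forest)`.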
Hi All!
I am trying to apply `explainer.explain_instance` with the Doc2Vec embedding provided by gensim and a random forest classifier. I managed to reproduce this example with tf-idf, but I can't manage to create an sklearn pipeline with Doc2Vec (see function `extract_lime_explanation_d2v`). Any help would be appreciated :)
Here's the (pseudo) code I have so far:
where `x_test` is a list with the documents and `test_tagged` is a gensim `TaggedDocument`.
Thanks in advance!