Understanding the reasons behind predictions is quite important in assessing trust, which is fundamental if one plans to take action based on a prediction, or when choosing whether to deploy a new model.
Such understanding also provides insights into the model, which can be used to transform an untrustworthy model or prediction into a trustworthy one.
Our main contributions are summarized as follows.
LIME, an algorithm that can explain the predictions of any classifier or regressor in a faithful way, by approximating it locally with an interpretable model (a minimal sketch of this local-approximation idea follows this list of contributions).
SP-LIME, a method that selects a set of representative instances with explanations to address the “trusting the model” problem, via submodular optimization.
Comprehensive evaluation with simulated and human subjects, where we measure the impact of explanations on trust and associated tasks.
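As a rough, hypothetical sketch of the local-approximation idea behind LIME (not the paper's exact algorithm; the perturbation scheme, kernel, and parameters below are illustrative assumptions), one can perturb an instance, query the black box on the perturbed neighbourhood, weight samples by their proximity to the original instance, and fit a sparse linear model whose coefficients act as the explanation:

```python
import numpy as np
from sklearn.linear_model import Lasso

def explain_locally(black_box_predict, x, num_samples=5000, kernel_width=0.75, alpha=0.01):
    """Sketch of a LIME-style local explanation for a dense feature vector x.

    black_box_predict maps an (n_samples, n_features) array to class-1 probabilities.
    All names and defaults here are illustrative, not the paper's.
    """
    n_features = x.shape[0]
    # Perturb the instance by randomly switching features "off" (zeroed here; the paper
    # perturbs interpretable components such as words or superpixels instead).
    mask = np.random.binomial(1, 0.5, size=(num_samples, n_features))
    perturbed = mask * x
    # Query the black-box model on the perturbed neighbourhood.
    labels = black_box_predict(perturbed)
    # Weight samples by proximity to the original instance with an exponential kernel.
    distances = np.linalg.norm(1 - mask, axis=1) / np.sqrt(n_features)
    weights = np.exp(-(distances ** 2) / kernel_width ** 2)
    # Fit a weighted sparse linear model; its coefficients are the explanation.
    local_model = Lasso(alpha=alpha)
    local_model.fit(mask, labels, sample_weight=weights)
    return local_model.coef_  # importance of each interpretable component
```

The paper specifies the interpretable representation, the proximity kernel, and the sparse fitting step (K-LASSO) more carefully; this sketch only conveys the "perturb, weight, fit" structure.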
We demonstrate the flexibility of these methods by explaining different models for text (e.g. random forests) and image classification (e.g. neural networks).
We show the utility of explanations via novel experiments, both simulated and with human subjects, on various scenarios that require trust:
deciding if one should trust a prediction,
choosing between models,
improving an untrustworthy classifier, and
identifying why a classifier should not be trusted.
In our experiments, non-experts using LIME are able to pick which classifier from a pair generalizes better in the real world.
Further, they are able to greatly improve an untrustworthy classifier trained on 20 newsgroups, by doing feature engineering using LIME.
We also show how understanding the predictions of a neural network on images helps practitioners know when and why they should not trust a model.
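For concreteness, the text-classification setting described above can be reproduced with the open-source lime package; the snippet below is only an illustrative setup (the two newsgroup categories and all parameters are our own choices, not taken from the paper):

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from lime.lime_text import LimeTextExplainer

# Train a black-box text classifier (a random forest) on two newsgroups.
categories = ['alt.atheism', 'soc.religion.christian']
train = fetch_20newsgroups(subset='train', categories=categories)
pipeline = make_pipeline(TfidfVectorizer(), RandomForestClassifier(n_estimators=500))
pipeline.fit(train.data, train.target)

# Explain one prediction: which words pushed the classifier towards each class?
explainer = LimeTextExplainer(class_names=categories)
exp = explainer.explain_instance(train.data[0], pipeline.predict_proba, num_features=6)
print(exp.as_list())  # [(word, weight), ...] for the explained instance
```

Inspecting the (word, weight) pairs is what lets a user notice a classifier relying on artefacts rather than content, which is the kind of insight the feature-engineering experiment mentioned above depends on.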
Introduction
It is important to differentiate between two different (but related) definitions of trust:
trusting a prediction,
i.e. whether a user trusts an individual prediction sufficiently to take some action based on it, and
trusting a model,
i.e. whether the user trusts a model to behave in reasonable ways if deployed.
Both are directly impacted by how much the human understands a model’s behaviour, as opposed to seeing it as a black box.
Determining trust in individual predictions is an important problem when the model is used for decision making.
When using machine learning for medical diagnosis or terrorism detection, for example, predictions cannot be acted upon on blind faith, as the consequences may be catastrophic.
Apart from trusting individual predictions, there is also a need to evaluate the model as a whole before deploying it “in the wild”.
To make this decision, users need to be confident that the model will perform well on real-world data, according to the metrics of interest.
Currently, models are evaluated using accuracy metrics on an available validation dataset.
However, real-world data is often significantly different, and further, the evaluation metric may not be indicative of the product’s goal.
Inspecting individual predictions and their explanations is a worthwhile solution, in addition to such metrics.
In this case, it is important to aid users by suggesting which instances to inspect, especially for large datasets.
In this paper, we propose providing explanations for individual predictions as a solution to the “trusting a prediction” problem, and selecting multiple such predictions (and explanations) as a solution to the “trusting the model” problem.
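To illustrate what selecting multiple such predictions might look like, below is a simplified, hypothetical sketch of a submodular pick step: given a matrix of per-instance explanation weights, it greedily selects a budget of instances that together cover the most important features. The coverage and importance definitions here are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def submodular_pick(W, budget):
    """Greedily pick `budget` rows (instances) from an explanation matrix W,
    where W[i, j] is the weight of feature j in the explanation of instance i.
    A simplified sketch of the pick step, not the paper's exact procedure."""
    importance = np.sqrt(np.abs(W).sum(axis=0))   # global importance of each feature
    covered = np.zeros(W.shape[1], dtype=bool)    # features covered by picks so far
    picked = []
    for _ in range(budget):
        gains = []
        for i in range(W.shape[0]):
            if i in picked:
                gains.append(-1.0)
                continue
            newly_covered = (np.abs(W[i]) > 0) & ~covered
            gains.append(importance[newly_covered].sum())  # marginal coverage gain
        best = int(np.argmax(gains))
        picked.append(best)
        covered |= np.abs(W[best]) > 0
    return picked
```

Because this kind of coverage objective is monotone submodular, greedy selection comes with the standard (1 - 1/e) approximation guarantee, which is why the pick step can be framed as submodular optimization.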
Conclusion
In this paper, we argued that trust is crucial for effective human interaction with machine learning systems, and that explaining individual predictions is important in assessing trust.
We proposed LIME, a modular and extensible approach to faithfully explain the predictions of any model in an interpretable manner.
We also introduced SP-LIME, a method to select representative and non-redundant predictions, providing a global view of the model to users.
Our experiments demonstrated that explanations are useful for a variety of models in trust-related tasks in the text and image domains, with both expert and non-expert users:
deciding between models,
assessing trust,
improving untrustworthy models, and
getting insights into predictions.
There are a number of avenues of future work that we would like to explore.
Although we describe only sparse linear models as explanations, our framework supports the exploration of a variety of explanation families, such as decision trees; it would be interesting to see a comparative study on these with real users.
One issue that we do not address in this work is how to perform the pick step for images, and we would like to address this limitation in the future.
The domain and model agnosticism enables us to explore a variety of applications, and we would like to investigate potential uses in speech, video, and medical domains, as well as recommendation systems.
Finally, we would like to explore theoretical properties (such as the appropriate number of samples) and computational optimizations (such as using parallelization and GPU processing), in order to provide the accurate, real-time explanations that are critical for any human-in-the-loop machine learning system.
An inappropriately strong correlation between a feature and the target (a spurious correlation), and data shift between training and real-world data, are two reasons why a classifier should not be trusted.
Machine learning practitioners often have to select a model from a number of alternatives, requiring them to assess the relative trust between two or more models.