marcotcr / lime

Lime: Explaining the predictions of any machine learning classifier
BSD 2-Clause "Simplified" License

How to interpret LIME results? #113

Closed jorgecarleitao closed 7 years ago

jorgecarleitao commented 7 years ago

I am considering using LIME, and I am having some trouble understanding what exactly it outputs.

I posted a question on Stack Exchange with an MCVE, but maybe this is more suitable here.

Consider the following code, which fits a logistic regression to a logistic process and then uses LIME to explain a new example.

import numpy as np
import lime.lime_tabular
from sklearn.linear_model import LogisticRegression

# generate a logistic latent variable from `a` and `b` with coef. 1, 1
data = []
for t in range(100000):
    a = 1 - 2 * np.random.random()
    b = 1 - 2 * np.random.random()
    noise = np.random.logistic()
    c = int(a + b + noise > 0)  # to predict
    data.append([a, b, c])
data = np.array(data)

x = data[:, :-1]
y = data[:, -1]

# fit Logistic regression without regularization (C=inf)
classifier = LogisticRegression(C=1e10)
classifier.fit(x, y)

print(classifier.coef_)

# "explain" with LIME
explainer = lime.lime_tabular.LimeTabularExplainer(
                x, mode='classification',
                feature_names=['a', 'b'])

explanation = explainer.explain_instance(np.array([1, 1]), classifier.predict_proba, num_samples=100000)
print(explanation.as_list())

output:

[[ 0.9981159   0.99478328]]  # print(classifier.coef_)
[('a > 0.50', 0.219), ('b > 0.50', 0.219)] # print(explanation.as_list())

The coefficients are ~[[1, 1]] because we are fitting a logistic regression to a logistic process with coefficients (1, 1).

What do the values 0.219... mean? Can they be related to any quantity in this example?

marcotcr commented 7 years ago

I think this is a bit confusing because you're using numerical data, but the default parameters in TabularExplainer discretize the data into quartiles. It is harder to interpret explanations for numerical features for the following reasons:

  1. The values may be in different ranges. We can always standardize the data, but then the meaning of the coefficients changes
  2. It's hard to think about double negatives (i.e. negative weight for a negative feature = positive contribution)
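
To see concretely what those quartile buckets are for this data, here is a rough check with plain numpy (not LIME internals): with a and b drawn from U(-1, 1), the quartile edges come out near -0.5, 0 and 0.5, which is why the explanation reads 'a > 0.50'.

# empirical quartile edges of feature `a` in the training data
print(np.percentile(x[:, 0], [25, 50, 75]))  # roughly [-0.5, 0.0, 0.5]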

Anyway, let's consider the meaning of the explanations in the discretized version. What ('a > 0.50', 0.219) is saying is that on average (considering the training data distribution), having a in this bucket raises the prediction by 0.219. Consider the following:

import itertools
# average the prediction over other values of `a` (below the 0.5 quartile edge), keeping b = 1
other_values = np.arange(-1, .49, .01)
current_pred = classifier.predict_proba([[1, 1]])[0, 1]  # predict_proba expects a 2D array
current_pred - classifier.predict_proba(np.array(list(itertools.product(other_values, [1]))))[:, 1].mean()
# output: 0.21064350778528229

Roughly, what I'm doing above is integrating over other values of a while keeping b fixed. On average, if we do that, the output moves by 0.211. Think of doing that for both features, while weighting by locality - that is what the coefficients in the explanation are getting at.

You could set discretize_continuous=False in the LimeTabularExplainer constructor. This example would still be a tricky one, because there are many equivalent linear models that fit the data equally well with different intercepts, and LIME will pick an arbitrary one (so the weights are not necessarily going to be the same, even if the approximation is almost perfect).
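
For reference, that would be the same constructor call with discretization turned off:

explainer = lime.lime_tabular.LimeTabularExplainer(
                x, mode='classification',
                feature_names=['a', 'b'],
                discretize_continuous=False)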

jorgecarleitao commented 7 years ago

Thanks Marc for the input. It did help.

As a follow-up, here are the results for the relative error of LIME as the number of samples increases, where the relative error compares the explanation obtained with a given number of samples x against sampling from the opposite side, a < 0.5 (also x samples), like you did.

[figure: test.png, relative error vs. number of samples]

Would you expect the relative error to go to zero? If not, what variables would I need to increase for the error to go to zero? If none, what approximations explain the discrepancy of ~5%?

Figure generated with the code below:

import numpy as np
import lime.lime_tabular
from sklearn.linear_model import LogisticRegression

data = []
for t in range(1000000):
    a = 1 - 2 * np.random.random()
    b = 1 - 2 * np.random.random()
    noise = np.random.logistic()
    c = int(a + b + noise > 0)  # to predict
    data.append([a, b, c])
data = np.array(data)

x = data[:, :-1]
y = data[:, -1]

classifier = LogisticRegression(C=1e10)
classifier.fit(x, y)

print(classifier.coef_)

explainer = lime.lime_tabular.LimeTabularExplainer(x, mode='classification', feature_names=['a', 'b'])

event = np.array([1, 1])

current_pred = classifier.predict_proba([event])[0, 1]  # predict_proba expects a 2D array

result = []
for samples in [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]:
    samples = samples * 1000
    print(samples)
    # increase number of samples for the explanation
    explanation = explainer.explain_instance(event, classifier.predict_proba, num_samples=samples).as_list()

    # freeze b and sample `a` from the interval `-1 < a < 0.50`
    import itertools
    other_values = -1 + 1.5 * np.random.random(samples)  # a_i from U(-1,0.5)
    other_values = np.array(list(itertools.product(other_values, [event[1]])))  # as a matrix [[a_1, b], [a_2, b], ...]
    residuals = current_pred - classifier.predict_proba(other_values)[:, 1]

    relative_error = (explanation[0][1] - residuals.mean())/residuals.mean()

    result.append([samples, relative_error])
result = np.array(result)

import matplotlib.pyplot as plt
plt.figure()
plt.plot(result[:, 0], result[:, 1])
plt.ylabel('relative error')
plt.xlabel('samples')
plt.xscale('log')
plt.savefig('test.png')

Output to stdout:

[[ 0.99826413  1.00231008]]
1000
2000
4000
8000
...
marcotcr commented 7 years ago

I would not expect the error to go to zero, because the model is using continuous data while LIME is approximating it with the discretized version. Also, there is the locality weighting, i.e. samples near the point being explained are weighted more heavily than samples far away. The error would go to zero if the model was actually using discretized data and if you set the kernel width to infinity.
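
For reference, the locality weighting is controlled by the kernel_width argument of the constructor. A rough sketch of making the weighting effectively uniform (a very large width flattens the exponential kernel; the discretization error discussed above remains, though):

explainer = lime.lime_tabular.LimeTabularExplainer(
                x, mode='classification',
                feature_names=['a', 'b'],
                kernel_width=1e6)  # effectively no locality weighting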

jorgecarleitao commented 7 years ago

Marco, thank you for the explanation and for taking the time to read and comment on this; much appreciated!

To double-check that we understood everything so far, these are the hypotheses on the table:

  1. LIME is discretizing data
  2. there is a Kernel, so it is not a simple average over the other quartiles

Let's assume that "roughly" in H1 means within 10%, i.e. if LIME's systematic error is within 10% for points in the quartile, then H1 is not rejected.

To test H1, we can repeat the same experiment as above on different events. Under H1, the error should remain "roughly" small (within 10%).

Below I show the same errors as before for different events (in the legend, a single run per point):

[figure: test.png, relative error vs. samples for several events] (the code I used is at the end of this comment, in case someone wants to double-check)

We see that there are events with errors of 120%, way above the 10% threshold. Only the event (1, 1) is below 10% (reproducing my first comment). I conclude from this result that hypothesis H1 is false. In other words, regardless of H2, the hypothesis H1, that LIME's value of 0.219 in ('a > 0.50', 0.219) is how much the probability increases when a > 0.5, is not supported by the results in the figure above.

Maybe the interpretation is different? Or do you think that LIME is not applicable to this case? If it is not, why would you expect it to be applicable to continuous data in general? (Logistic regression is the simplest classification example I know of...)

Have you tested LIME on this type of example? I went through the tests folder and did not find a test on the actual values. I was also not able to find anything in the arXiv paper.

If you think that we should switch to the non-discretized version, please let me know; I would happily repeat this for the non-discretized case (with an equivalent test).


import numpy as np
import lime.lime_tabular
from sklearn.linear_model import LogisticRegression

data = []
for t in range(1000000):
    a = 1 - 2 * np.random.random()
    b = 1 - 2 * np.random.random()
    noise = np.random.logistic()
    c = int(a + b + noise > 0)  # to predict
    data.append([a, b, c])
data = np.array(data)

x = data[:, :-1]
y = data[:, -1]

classifier = LogisticRegression(C=1e10)
classifier.fit(x, y)

explainer = lime.lime_tabular.LimeTabularExplainer(x, mode='classification', feature_names=['a', 'b'])

print(classifier.coef_)

import matplotlib.pyplot as plt
plt.figure()
for i in range(1, 6):
    event = np.array([0.5 + 0.1*i, 0.5 + 0.1*i])

    current_pred = classifier.predict_proba([event])[0, 1]

    print(event, current_pred)

    result = []
    for samples in [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]:
        samples = samples * 1000
        print(samples)
        # increase number of samples for the explanation
        explanation = explainer.explain_instance(event, classifier.predict_proba, num_samples=samples).as_list()

        # freeze b and sample `a` from the interval `-1 < a < 0.50`
        import itertools
        other_values = -1 + 1.5 * np.random.random(samples)  # a_i from U(-1,0.5)
        other_values = np.array(list(itertools.product(other_values, [1])))  # as a matrix [[a_1, b], [a_2, b], ...]
        residuals = current_pred - classifier.predict_proba(other_values)[:, 1]

        print(explanation, residuals.mean())

        relative_error = (explanation[0][1] - residuals.mean())/residuals.mean()

        result.append([samples, relative_error])
    result = np.array(result)

    plt.plot(result[:, 0], result[:, 1], 'o-', label='(%.2f, %.2f)' % tuple(event))
plt.ylabel('relative error')
plt.xlabel('samples')
plt.xscale('log', basex=2)
plt.legend()
plt.savefig('test.png')
marcotcr commented 7 years ago

Discretization does make everything tricky. In your code, you're computing the residual with respect to the prediction of the event. However, LIME is taking the event to be a > 0.5 and b > 0.5, not two specific values. So, instead of:

current_pred = classifier.predict_proba(event)[0, 1]

We should have

lime_event = (.5 +  .5 * np.random.random(samples * 2)).reshape(-1, 2) # a > .5 and b > .5
current_pred = classifier.predict_proba(lime_event)[:,1].mean()

Also, the value of b for LIME is b > 0.5, so instead of:

other_values = -1 + 1.5 * np.random.random(samples)  # a_i from U(-1,0.5)
other_values = np.array(list(itertools.product(other_values, [1])))  # as a matrix [[a_1, b], [a_2, b], ...]

let's have:

other_values = -1 + 1.5 * np.random.random(samples)  # a_i from U(-1,0.5)
other_b = (.5 +  .5 * np.random.random(samples)) # b_i from U(.5, 1)
other_values = np.vstack((other_values, other_b)).T

These two would explain why your error goes up the further you are from a=1, I think. Also, explanation.as_list() returns the features in decreasing order of importance, so relative error should be:

relative_error = (dict(explanation)['a > 0.50'] - residuals.mean())/residuals.mean()

Making these changes results in a relative error that is ~constant with respect to the events (around 10%). Anyway, note that the explanation also has an intercept. What I meant by 'roughly' before is that the weight for 'a > 0.50' is going to be close to: explanation.intercept[1] + dict(explanation.as_list())['b > 0.50'] - classifier.predict_proba(other_values)[:, 1]

jorgecarleitao commented 7 years ago

Thank you @marcotcr for the explanation. That does indeed explain the error above:

[figure: test-338283262, relative error vs. samples after the corrections] (code below)

To summarize: the interpretation of

[('a > 0.50', 0.219), ('b > 0.50', 0.219)]

is

the probability of class 1 increases by 0.219 when a is in [0.5, 1], compared to a in [-1, 0.5], averaged over b in [0.5, 1].
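
A rough sanity check of that reading, reusing classifier from the scripts above (sampling the two buckets directly; per the plots above, LIME's 0.219 should land within roughly 10% of this difference):

n = 100000
upper = np.column_stack([np.random.uniform(0.5, 1.0, n),    # a in [0.5, 1]
                         np.random.uniform(0.5, 1.0, n)])   # b in [0.5, 1]
lower = np.column_stack([np.random.uniform(-1.0, 0.5, n),   # a in [-1, 0.5]
                         np.random.uniform(0.5, 1.0, n)])   # b in [0.5, 1]
print(classifier.predict_proba(upper)[:, 1].mean() - classifier.predict_proba(lower)[:, 1].mean())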

Doesn't this imply that LIME's result only depends on the quartile that the event belongs to? For example, isn't it possible for LIME to provide the same explanation for two events with opposite outcomes (e.g. when the model gives different predictions for different values within the same quartile)?

import numpy as np
import lime.lime_tabular
from sklearn.linear_model import LogisticRegression

data = []
for t in range(1000000):
    a = 1 - 2 * np.random.random()
    b = 1 - 2 * np.random.random()
    noise = np.random.logistic()
    c = int(a + b + noise > 0)  # to predict
    data.append([a, b, c])
data = np.array(data)

x = data[:, :-1]
y = data[:, -1]

classifier = LogisticRegression(C=1e10)
classifier.fit(x, y)

explainer = lime.lime_tabular.LimeTabularExplainer(x, mode='classification', feature_names=['a', 'b'])

import matplotlib.pyplot as plt
plt.figure()
for i in range(1, 6):
    event = np.array([0.5 + 0.1*i, 0.5 + 0.1*i])
    print(event)

    result = []
    for samples in [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]:
        samples = samples * 1000

        lime_events = (.5 + .5 * np.random.random(samples * 2)).reshape(-1, 2)  # a > .5 and b > .5
        current_pred = classifier.predict_proba(lime_events)[:, 1].mean()
        del lime_events

        # increase number of samples for the explanation
        explanation = explainer.explain_instance(event, classifier.predict_proba, num_samples=samples).as_list()

        # freeze b and sample `a` from the interval `-1 < a < 0.50`
        other_values = -1 + 1.5 * np.random.random(samples)  # a_i from U(-1,0.5)
        other_b = (.5 + .5 * np.random.random(samples))  # b_i from U(.5, 1)
        other_values = np.vstack((other_values, other_b)).T
        residuals = current_pred - classifier.predict_proba(other_values)[:, 1]

        relative_error = (dict(explanation)['a > 0.50'] - residuals.mean())/residuals.mean()

        print(samples, relative_error)
        result.append([samples, relative_error])
    result = np.array(result)

    plt.plot(result[:, 0], result[:, 1], 'o-', label='(%.2f, %.2f)' % tuple(event))
plt.ylabel('relative error')
plt.xlabel('samples')
plt.xscale('log', basex=2)
plt.legend()
plt.savefig('test.png')
marcotcr commented 7 years ago

Yes, that is possible. That is a problem with discretization: we lose the ability to differentiate things within the discretized bins. Obvious solutions to this involve using more bins (deciles, entropy-based discretization) or not discretizing at all. Not discretizing at all has its own drawbacks, which is why I decided to leave discretization in as the default.
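
If I remember the constructor arguments correctly, those options look roughly like this (the discretizer argument also accepts 'entropy', which additionally needs training_labels):

explainer_deciles = lime.lime_tabular.LimeTabularExplainer(
                        x, mode='classification',
                        feature_names=['a', 'b'],
                        discretizer='decile')
explainer_raw = lime.lime_tabular.LimeTabularExplainer(
                        x, mode='classification',
                        feature_names=['a', 'b'],
                        discretize_continuous=False)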

jorgecarleitao commented 7 years ago

@marcotcr, I understand. Regardless of this particular point, LIME remains a useful tool for interpretability. Thank you and the other authors for taking the time to develop and publish it and to provide source code to reproduce its results, and thank you especially for clarifying the points raised here. Definitely a great example of how science should be done!

I will close this as resolved.

marcotcr commented 7 years ago

Thanks for the thoughtful questions!

jorgecarleitao commented 7 years ago

OK, I returned to this, now trying to do the same for the non-discretized version of LIME. My expectation, based on the results from the discretized version, is that LIME approximates the partial derivative of the function with respect to each input. However, I may be mistaken, because I am getting a ~20% systematic error between LIME and the partial derivative.

[figure: test.png, relative error between the LIME weight and the partial derivative vs. number of samples]

import numpy as np
import lime.lime_tabular
from sklearn.linear_model import LogisticRegression

data = []
for t in range(1000000):
    a = 1 - 2 * np.random.random()
    b = 1 - 2 * np.random.random()
    noise = np.random.logistic()
    c = int(a + b + noise > 0)  # to predict
    data.append([a, b, c])
data = np.array(data)

x = data[:, :-1]
y = data[:, -1]

classifier = LogisticRegression(C=1e10)
classifier.fit(x, y)

print(classifier.coef_)

explainer = lime.lime_tabular.LimeTabularExplainer(x, mode='classification', feature_names=['a', 'b'], discretize_continuous=False)

event = np.array([0.7, 0.7])

current_pred = classifier.predict_proba([event])[0, 1]

result = []
for samples in [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]:
    samples = samples * 1000
    # increase number of samples for the explanation
    explanation = explainer.explain_instance(event, classifier.predict_proba, num_samples=samples).as_list()

    # freeze b and sample `a_i` (`a` + N(0, 0.001), `b`) to compute partial derivatives
    a_i = event[0] + np.random.normal(scale=0.001, size=samples)

    a_i = np.array([[a_value, event[1]] for a_value in a_i])  # as an array of events [[a_1, b], [a_2, b], ...]

    # partial derivatives df/da
    d_a = (current_pred - classifier.predict_proba(a_i)[:, 1])/(event[0] - a_i[:, 0])

    # confirmed that d_a is approximatelly d_a1 below, the analytical derivative of predict_proba
    # exp = np.exp(np.dot(event, np.array([1, 1])))
    # d_a1 = 1 * exp / (1 + exp)**2

    relative_error = (dict(explanation)['a'] - d_a.mean())/d_a.mean()

    result.append([samples, relative_error])
result = np.array(result)

import matplotlib.pyplot as plt
plt.figure()
plt.plot(result[:, 0], result[:, 1])
plt.ylabel('relative error')
plt.xlabel('samples')
plt.xscale('log')
plt.savefig('test.png')
marcotcr commented 6 years ago

Maybe we should move away from the partial derivative interpretation, and back to the original meaning of an explanation: a linear model that approximates the black box model locally. The additional complication is that we scale the data inside the explainer if the data is not discretized. Thus, for a set of points x, if you take:

scaled_x = (x - explainer.scaler.mean_) / explainer.scaler.scale_
fhat = exp.intercept[1] + dict(exp.as_list())['a'] * scaled_x[:, 0] + dict(exp.as_list())['b'] * scaled_x[:, 1]
f = classifier.predict_proba(x)[:, 1]

We should have that (f - fhat).mean() is small, in particular for x that are close to the original instance. Does this make sense?
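
A rough, runnable version of that check, reusing the non-discretized setup from your script above (the nearby points and the 0.1 noise scale are just illustrative choices):

exp = explainer.explain_instance(event, classifier.predict_proba, num_samples=5000)
near_x = event + np.random.normal(scale=0.1, size=(1000, 2))  # points close to the instance
scaled_x = (near_x - explainer.scaler.mean_) / explainer.scaler.scale_
w = dict(exp.as_list())
fhat = exp.intercept[1] + w['a'] * scaled_x[:, 0] + w['b'] * scaled_x[:, 1]
f = classifier.predict_proba(near_x)[:, 1]
print(np.abs(f - fhat).mean())  # should be small for points near the instance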

jorgecarleitao commented 6 years ago

It makes sense.

The reason I approached it from the partial derivatives is that, given a point x' = x + h, f(x') - f(x) = Df(x).dot(x' - x) + O(||h||^2) (multivariable Taylor series, first order around x). In this view, if the local regressor is a simple linear regression (without lasso), shouldn't the coefficients be equal to the partial derivatives of f?

marcotcr commented 6 years ago

Let's call f'(x') the gradient of f at x'. Let x be your 'event' above; f(x) is then current_pred. The Taylor expansion gives us the following linear approximation: f(x') = f(x) + f'(x').dot(x' - x)

LIME is trying to find w such that (ignoring the local weighting for now): f(x') = intercept + w.dot(x')

I don't see why w should be equal to f'(x') in this case. The Taylor expansion as an approximation requires us to compute f'(x') for every point we're predicting.
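
For reference, the objective from the paper (g ranges over linear models, pi_x is the locality kernel with width sigma, and Omega penalizes complexity); w is the solution of this weighted least-squares problem, not a gradient:

xi(x) = argmin_{g in G} L(f, g, pi_x) + Omega(g)
L(f, g, pi_x) = sum_z pi_x(z) * (f(z) - g(z'))^2,  with  pi_x(z) = exp(-D(x, z)^2 / sigma^2)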

Also, I think f'(x) should be x * exp(x * w) / (exp(x * w) + 1)^2; you had exp(x * w) / (exp(x * w) + 1)^2, if I understood the code correctly.

EoinKenny commented 6 years ago

Hi guys, I hope I'm not hijacking this thread, but it's kind of relevant to interpreting LIME.

If I want to print the coefficients that the local LIME model learned, is there a way to do that? Thanks in advance.

marcotcr commented 6 years ago

exp.as_list() or exp.as_map()
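
For example, with exp being the object returned by explain_instance (numbers below are from the first example in this thread; exact values will vary):

print(exp.as_list())   # [(feature description, weight), ...] sorted by |weight|, e.g. [('a > 0.50', 0.219), ('b > 0.50', 0.219)]
print(exp.as_map())    # {label: [(feature index, weight), ...]}, e.g. {1: [(0, 0.219), (1, 0.219)]}
print(exp.intercept)   # intercept of the local linear model, per label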

hanzigs commented 5 years ago

I am finding that the set of variables we get from the explainer keeps changing when we re-run rf_explainer.explain_instance on the same test data. Do we have to set a seed or something, or why does the variable importance change? Thanks

marcotcr commented 5 years ago

see #67, #119, #199.
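
A minimal sketch of fixing the seed, assuming a lime version whose constructor accepts random_state (training_data and feature_names below are placeholders for your own data); the sampling done by explain_instance should then be reproducible:

rf_explainer = lime.lime_tabular.LimeTabularExplainer(
                   training_data, mode='classification',
                   feature_names=feature_names,
                   random_state=42)  # pass an integer or a numpy RandomState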

hanzigs commented 5 years ago

Hi marcotcr, thanks for the reply. I am still getting different values:

#My explainer
model_explainer = lime.lime_tabular.LimeTabularExplainer(
    X_train.values[:, :], mode='classification', verbose=True,
    training_labels=data_norm['class'], feature_names=feature_names,
    random_state=np.random.seed(42))

#My Function
def explain(exp, instance, predict_fn):
  np.random.seed(42)
  exp_data = exp.explain_instance(instance, predict_fn)
  return exp_data.as_list()

#My Call
explain(model_explainer,X_test.values[1], model.predict_proba)

Still, I am getting different values:

In [146]: explain(model_explainer,X_test.values[1], model.predict_proba)
Intercept 0.9839002793712486
Prediction_local [0.98390028]
Right: 0.9936423124350471
Out[146]: 
[('Dec_Reason_CLA_URNED <= 0.00', 0.0),
 ('Property_Acceptable_Y <= 1.00', 0.0),
 ('Trading_State_NT <= 0.00', 0.0),
 ('Decision_Reason_REQUEST_F <= 0.00', 0.0),
 ('Product_Type_P <= 0.00', 0.0),
 ('Product_Type_I <= 0.00', 0.0),
 ('Fax_Number <= 0.00', 0.0),
 ('Product_Type_D <= 0.00', 0.0),
 ('Valuation_Acceptable_Y <= 1.00', 0.0),
 ('Product_Type_B <= 0.00', 0.0)]

In [147]: explain(model_explainer,X_test.values[1], model.predict_proba)
Intercept 0.9849758158162301
Prediction_local [0.98497582]
Right: 0.9936423124350471
Out[147]: 
[('Product_Type_B <= 0.00', 0.0),
 ('Product_Type_8 <= 0.00', 0.0),
 ('Valuation_Acceptable_Y <= 1.00', 0.0),
 ('Home_Phone <= 0.00', 0.0),
 ('Permanent_Resident_Y <= 1.00', 0.0),
 ('Product_Type_P <= 0.00', 0.0),
 ('Dec_Reason_PRE_CLAPOL <= 0.00', 0.0),
 ('Dec_Reason_FR_ERT <= 0.00', 0.0),
 ('Product_Type_C <= 0.00', 0.0),
 ('Product_Type_I <= 0.00', 0.0)]

Also, I am getting all values as zeros, which I don't understand. May I have some help please?

nooraliraeeji commented 4 years ago

Hello, I have an LSTM model (a classification model) and I want to use explain_instance. What is the predict function in explain_instance? Thanks