I generated a dataset from a simple linear model and used explain_instance() to check each variable's influence. Since the model is so simple, I expected the influence values to be consistent with the true parameters, but LIME rarely gives the expected results.
My code:
import scipy
import numpy as np
import sklearn.linear_model
import sklearn.model_selection
import lime
import lime.lime_tabular
# Generating data (X holds the features, y the targets)
# Fit the linear model on an 80/20 train/test split
model = sklearn.linear_model.LinearRegression()
train, test, labels_train, labels_test = sklearn.model_selection.train_test_split(X, y, train_size=0.80)
model.fit(train, labels_train)
# Explain one test instance with LIME in regression mode
explainer = lime.lime_tabular.LimeTabularExplainer(train, verbose=True, mode='regression')
i = 21
exp = explainer.explain_instance(test[i], model.predict, num_features=3)
The true parameters are [3, 1, 2.5] and the estimated parameters are [3.00008921 0.99996862 2.49974063]. So feature 0 should have the largest positive influence, and all features should have positive influence. But LIME gives the influences [-2.07, -1.33, 0.46], which bear almost no relation to the true parameters. Changing i doesn't make things better. Did I do something wrong?
I finally understood that the LIME values should be compared with theta*test[i], not with theta itself. With print(test[i]*theta), the result is comparable in order to [-2.07, -1.33, 0.46].
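To make that comparison concrete, here is a minimal sketch of the arithmetic only. It does not reproduce what LIME does internally. The instance x below is made up for illustration (the actual test[21] is not shown above); theta is the true parameter vector from the question.

```python
# Hypothetical values for illustration; the real test[i] is not shown in the post.
theta = [3.0, 1.0, 2.5]   # true parameters quoted in the question
x = [-0.7, -1.3, 0.2]     # assumed instance, NOT the author's test[21]

# For a linear model y = sum_j theta[j] * x[j], the contribution of
# feature j to this one prediction is theta[j] * x[j].
contributions = [t * v for t, v in zip(theta, x)]

# Rank features by the magnitude of their contribution to this prediction.
ranking = sorted(range(len(theta)), key=lambda j: abs(contributions[j]), reverse=True)

print([round(c, 2) for c in contributions])
print(ranking)
```

Even though every coefficient is positive, a contribution is negative whenever the corresponding feature value is negative. So a per-instance explanation can show negative influences for a model whose parameters are all positive.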