keras-team / keras

Problem using LIME with Keras #10123

Closed. EoinKenny closed this issue 6 years ago.

EoinKenny commented 6 years ago

I train a model like so:

model = Sequential()
model.add(Dense(200, input_dim=11, kernel_initializer='normal', activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(200, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(200, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(1, activation='relu'))
model.compile(loss='mean_squared_error', optimizer='adam')

# Fit the model
model.fit(X_train, y_train, epochs=200, batch_size=512)

Then I try to make a LIME prediction...

import lime
import lime.lime_tabular
import pandas as pd

explainer = lime.lime_tabular.LimeTabularExplainer(df.as_matrix(),
                                                   feature_names=df.columns,
                                                   class_names=['Price'],
                                                   verbose=True,
                                                   mode='regression')

exp = explainer.explain_instance(qc_reshape[0], model.predict, num_features=len(df.columns))

exp.show_in_notebook(show_table=True)

exp.as_list()

After that I get the error...

AssertionError                            Traceback (most recent call last)
/anaconda3/envs/LIME/lib/python3.6/site-packages/lime/lime_tabular.py in explain_instance(self, data_row, predict_fn, labels, top_labels, num_features, num_samples, distance_metric, model_regressor)
    300         try:
--> 301             assert isinstance(yss, np.ndarray) and len(yss.shape) == 1
    302         except AssertionError:

AssertionError:

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input> in <module>()
      8                                                    verbose=True, mode='regression')
      9
---> 10 exp = explainer.explain_instance(qc_reshape[0], model.predict, num_features=len(df.columns))
     11
     12 exp.show_in_notebook(show_table=True)

/anaconda3/envs/LIME/lib/python3.6/site-packages/lime/lime_tabular.py in explain_instance(self, data_row, predict_fn, labels, top_labels, num_features, num_samples, distance_metric, model_regressor)
    302         except AssertionError:
    303             raise ValueError("Your model needs to output single-dimensional \
--> 304                 numpyarrays, not arrays of {} dimensions".format(yss.shape))
    305
    306         predicted_value = yss[0]

ValueError: Your model needs to output single-dimensional numpyarrays, not arrays of (5000, 1) dimensions

**Is there any way to make Keras output single-dimensional arrays? I've looked around for a while and no luck :(**

Thanks in advance!
EoinKenny commented 6 years ago

Never mind, I fixed it.

The issue is that Keras returns a 2D array and LIME wants a 1D one. Simply write a helper function that flattens the output and pass it in place of model.predict in the LIME pipeline.

h2ku commented 6 years ago

@EoinKenny Could you share the code?

EoinKenny commented 6 years ago

Sure thing!

You might have to adjust one or two things, though; I haven't tested this pipeline specifically.

from keras.models import load_model

model = load_model('keras_model.h5')

# Previously loaded pandas df
qc = df.as_matrix()[1]
qc_reshape = qc.reshape(1, -1)

def predict(qc):
    global model
    qc = model.predict(qc)
    return qc.reshape(qc.shape[0])

import lime
import lime.lime_tabular
import pandas as pd

explainer = lime.lime_tabular.LimeTabularExplainer(df.as_matrix(),
                                                   feature_names=df.columns,
                                                   class_names=['Price'],
                                                   verbose=True,
                                                   mode='regression')

exp = explainer.explain_instance(qc, predict, num_features=len(df.columns))
h2ku commented 6 years ago

Thank you @EoinKenny

I used the following, with model already loaded:

def flatten_predict(input):
    return model.predict(input).flatten()
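
For what it's worth, .flatten(), .ravel(), and reshape(shape[0]) are interchangeable here; each turns the (n, 1) output into the (n,) shape LIME expects. A quick check (a sketch using dummy data in place of model output):

import numpy as np

preds = np.zeros((5000, 1))  # stand-in for model.predict output
assert preds.flatten().shape == (5000,)
assert preds.ravel().shape == (5000,)
assert preds.reshape(preds.shape[0]).shape == (5000,)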
EoinKenny commented 6 years ago

I just want to update this for classification problems: LIME requires a probability for both the positive and the negative class.

import numpy as np

def flatten_predict(i):
    global model
    predictions = model.predict_proba(i)
    x = np.zeros((predictions.shape[0], 1))      # column of zeros, shape (n, 1)
    probability = (x + 1) - predictions          # 1 - p, the other class's probability
    final = np.append(predictions, probability, axis=1)  # shape (n, 2)
    return final
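
One caveat worth noting: scikit-learn's predict_proba convention orders columns as [P(class 0), P(class 1)], while the function above appends them as [P(class 1), P(class 0)]. A sketch that keeps the conventional order (assuming the model outputs the probability of class 1):

import numpy as np

def predict_proba_2col(x):
    p1 = model.predict_proba(x)       # shape (n, 1): P(class 1)
    return np.hstack([1.0 - p1, p1])  # shape (n, 2): [P(class 0), P(class 1)]

hanzigs's comment further down uses this ordering.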

EoinKenny commented 6 years ago

No problem! I think your solution looks cleaner. I just wanted to update this for classification, since LIME expects probabilities in scikit-learn's predict_proba format.

:cupid:

def flatten_predict(i):
    global nn_bias_clf
    predictions = nn_bias_clf.predict_proba(i)
    x = np.zeros((predictions.shape[0], 1))
    probability = (x + 1) - predictions
    final = np.append(predictions, probability, axis=1)
    return final
blaklodge commented 5 years ago

Hello, I ran into the same issue Eoin ran into (using Keras regression, last layer Dense(1)) and tried using your flatten_predict() function, which solved the original error, but I now get this error:

ValueError: Found input variables with inconsistent numbers of samples: [5000, 180000]

It seems the explainer is truncating samples at 5000. Does anyone have a suggestion on how to solve this issue?

Thanks!
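
For context, the 5000 here is not truncation: LIME's explain_instance generates num_samples perturbed rows (5000 by default, as its signature in the traceback above shows) and calls predict_fn on all of them, so predict_fn must return exactly one value per perturbed row. 180000 / 5000 = 36, which suggests the model emits 36 values per sample. The sample count is also adjustable; a sketch (data_row stands in for the instance being explained):

exp = explainer.explain_instance(data_row, flatten_predict,
                                 num_features=10,
                                 num_samples=5000)  # LIME's default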

EoinKenny commented 5 years ago

I just checked my previous implementation for regression, and I simply used the function as:

def flatten_predict(qc):
    global model
    qc = model.predict(qc)
    return qc.reshape(qc.shape[0])

Can't think why it wouldn't work, sorry.

blaklodge commented 5 years ago

Thanks Eoin! That helped. I'm now trying to get LIME to work with my Keras RNN model (LSTM --> Dense(3)).

I'm using the RecurrentTabularExplainer. When I run the explainer I get this error:

---> 11     return qc.reshape(qc.shape[0])

ValueError: cannot reshape array of size 180000 into shape (5000,)

Here is the code I'm using:

explainer = lime.lime_tabular.RecurrentTabularExplainer(batched_train_x,
                                                        feature_names=x_features_list,
                                                        class_names=['vsd'],
                                                        categorical_features=None,
                                                        verbose=True,
                                                        mode="regression")

exp = explainer.explain_instance(batched_test_x[i:i+1, :, :], flatten_predict, num_features=len(x_features_list))

The data is in (batch, time steps, features) format:

batched_train_x.shape = (4, 12, 228)
batched_test_x.shape = (1, 12, 228)

Do you have any suggestions on how to get this issue solved? Thanks again for your help.
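
Since 180000 = 5000 × 12 × 3, the model appears to return one value per time step and output (shape (n, 12, 3)), while LIME's regression mode explains a single scalar per sample. A sketch of a wrapper that picks one value to explain (the choice of time step and output index here is illustrative, not prescribed):

def predict_scalar(x):
    preds = model.predict(x)   # assumed shape: (n_samples, 12, 3)
    return preds[:, -1, 0]     # e.g. first output at the last time step, shape (n_samples,)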

CoteDave commented 5 years ago

Same problem here. I'm trying to do multi-output regression (y1, y2, y3, y4 ... y30) with XGBRegressor, with 531 features, and to locally explain each y forecast with LIME. I tried:

explainer = lime.lime_tabular.LimeTabularExplainer(X_train,
                                                   training_labels=y_features_list,
                                                   feature_names=feature_list,
                                                   verbose=True,
                                                   mode='regression')

def predict(test):
    global multioutputregressor
    test = multioutputregressor.predict(test)
    return test.reshape(test.shape[0])

exp = explainer.explain_instance(data_row=X_test[0, :], predict_fn=predict, num_features=10)

But I get:

ValueError: cannot reshape array of size 150000 into shape (5000,)

Thanks!
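
Here 150000 = 5000 × 30, i.e. one value per target for each of LIME's 5000 perturbed samples. LIME explains one scalar target at a time, so a wrapper that selects a single output column is one way through; a sketch (target_idx is a hypothetical parameter choosing which of y1 ... y30 to explain):

def predict_one_target(test, target_idx=0):
    # multioutputregressor.predict returns shape (n, 30); keep one column
    return multioutputregressor.predict(test)[:, target_idx]

exp = explainer.explain_instance(data_row=X_test[0, :],
                                 predict_fn=lambda t: predict_one_target(t, target_idx=0),
                                 num_features=10)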

hanzigs commented 5 years ago

Hi @EoinKenny, I am linking this to LIME issue https://github.com/marcotcr/lime/issues/376. I tried your solution:

Keras_model.predict_proba(testData_for_model)
Out[73]: array([[0.6559619]], dtype=float32)

import numpy

def predictKeras(testData_for_model):
    prediction_Class_1 = Keras_model.predict_proba(testData_for_model)
    x = numpy.zeros((prediction_Class_1.shape[0], 1))
    probability = (x + 1) - prediction_Class_1
    final = numpy.append(probability, prediction_Class_1, axis=1)  # [P(class 0), P(class 1)]
    return final

The output of final is:

final
Out[71]: array([[0.3440381, 0.6559619]])

Then I call:

keras_explainer = lime.lime_tabular.LimeTabularExplainer(input_x, 
                                                 mode='classification',
                                                 feature_names=feature_names,
                                                 kernel_width=5,
                                                 random_state=42,
                                                 discretize_continuous=False) 
test_for_explainer = testData_for_model.reshape(testData_for_model.shape[1],)
exp = keras_explainer.explain_instance(test_for_explainer, predictKeras, num_features = 10)

It's working well.

One question: Keras returns the probability for class 1, so how should I specify the class names in the explainer? Currently exp.class_names returns:

exp.class_names
Out[85]: ['0']
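
For what it's worth, LimeTabularExplainer accepts a class_names argument, ordered to match the columns returned by the predict function; when it is omitted, LIME falls back to stringified indices ('0', '1', ...), which is why exp.class_names shows ['0']. A sketch (the labels 'No' and 'Yes' are placeholders; with predictKeras above, column 0 is 1 - P(class 1) and column 1 is P(class 1)):

keras_explainer = lime.lime_tabular.LimeTabularExplainer(input_x,
                                                         mode='classification',
                                                         class_names=['No', 'Yes'],  # placeholder labels
                                                         feature_names=feature_names,
                                                         kernel_width=5,
                                                         random_state=42,
                                                         discretize_continuous=False)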