lmassaron / deep_learning_for_tabular_data

A presentation of core concepts and a data generator that makes it easier to use tabular data with TensorFlow and Keras

Feature importance #1

Open gladomat opened 4 years ago

gladomat commented 4 years ago

How would you go about finding the feature importance for the DNN model?

lmassaron commented 4 years ago

Both LIME (https://github.com/marcotcr/lime) and SHAP (https://github.com/slundberg/shap) can provide you with feature importance for a DNN model. They require some extra work, though. I will write some articles on that in the near future.
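For reference, here is a minimal LIME sketch (not from this repository) showing the general idea; it assumes a fitted Keras model called model, the project's tabular transformer tb, and a pandas DataFrame X_train, and uses LIME's regression mode so the raw network output can be passed through directly:

import lime.lime_tabular
import pandas as pd

feature_names = X_train.columns.to_list()

def predict_fn(data_asarray):
    # LIME passes plain numpy arrays, so rebuild the DataFrame with the
    # original column names before applying the tabular transformer
    data_asframe = pd.DataFrame(data_asarray, columns=feature_names)
    # flatten because LIME's regression mode expects a 1-D prediction array
    return model.predict(tb.transform(data_asframe)).flatten()

lime_explainer = lime.lime_tabular.LimeTabularExplainer(
    X_train.values,
    feature_names=feature_names,
    mode='regression')  # for classifiers, use mode='classification' and return class probabilities

# explain a single row and list the most influential features
exp = lime_explainer.explain_instance(X_train.values[0], predict_fn, num_features=10)
print(exp.as_list())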

gladomat commented 4 years ago

Thanks for your answer. I tried SHAP but kept running into trouble with the inputs, and unfortunately I haven't been able to figure it out. I would really appreciate a little tutorial on it.

gladomat commented 4 years ago

I've found the following solution. SHAP loses the feature names and tb.transform needs them to differentiate categorical from numerical features:

import pandas as pd

feature_names = X_train.columns.to_list()

def model_predict(data_asarray):
    # SHAP passes plain numpy arrays, so rebuild the DataFrame with the
    # original column names before applying the tabular transformer
    data_asframe = pd.DataFrame(data_asarray, columns=feature_names)
    x = tb.transform(data_asframe)
    return model.predict(x)

Then you can use SHAP to get the values.

import shap

# use Kernel SHAP to explain a single prediction,
# with the first 50 training rows as the background dataset
explainer = shap.KernelExplainer(model_predict, X_train.iloc[:50, :])
shap_values = explainer.shap_values(X.iloc[299, :], nsamples=500)
# plot the explained row itself, not the background data
shap.decision_plot(explainer.expected_value, shap_values[0],
                   features=X.iloc[299, :], feature_names=feature_names)

I found the solution on Stack Exchange, but I don't remember the link, so that's the only credit I can give.
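If you want a global feature-importance ranking rather than a single-prediction explanation, one option is to compute SHAP values for a batch of rows and aggregate them with a summary plot. This is a sketch under the same assumptions as above; the sample size and nsamples are arbitrary choices, and KernelExplainer gets slow quickly as they grow:

import shap

# explain a small sample of rows (KernelExplainer is expensive)
sample = X_train.iloc[:100, :]
explainer = shap.KernelExplainer(model_predict, X_train.iloc[:50, :])
shap_values = explainer.shap_values(sample, nsamples=500)

# the mean absolute SHAP value per feature serves as a global importance score
shap.summary_plot(shap_values[0], sample,
                  feature_names=feature_names, plot_type='bar')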