marcotcr / lime

Lime: Explaining the predictions of any machine learning classifier
BSD 2-Clause "Simplified" License

How to explain prediction for a data with just a few features (from all features of training dataset)? #731

Open williamty opened 8 months ago

williamty commented 8 months ago

I have generated LightGBM models for prediction. I can explain the predictions with all features by filling the user input data with NAs. Is there any way to explain a prediction for the original user input data without filling it?

apoplexi24 commented 8 months ago

You can run the LIME explainer on a select few columns/features too. If your model is used for regression, change the explainer's `mode` from "classification" to "regression".

    from lime.lime_tabular import LimeTabularExplainer

    # Select the features you want to run the LIME explainer on, for example:
    selected_data = data[['col1', 'col2']]

    # Create a LimeTabularExplainer
    explainer = LimeTabularExplainer(selected_data.values, feature_names=selected_data.columns.values, mode="classification")

    # Select the instance to explain: the first row of the selected data, as a NumPy array
    data_row = selected_data.iloc[0].values

    # num_features caps how many features appear in the explanation.
    # In classification mode, LIME expects the prediction function to return
    # class probabilities with shape (n_samples, n_classes).
    explanation = explainer.explain_instance(data_row, lgbm_model.predict, num_features=len(selected_data.columns))

    # Display the explanation
    explanation.show_in_notebook()

The above method works for explaining predictions when you only want selected features. But when you generate predictions, your testing data (x_test) must have the same features as the training data (x_train) used to fit the model, or else it will throw an error about the features not matching.
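One way to reconcile the two points above is to pad the selected columns back to the full training width inside the prediction function, so that LIME only perturbs the chosen features while the model still receives every column. The following is a minimal sketch using NumPy only. `make_padded_predict_fn`, `full_predict`, `toy_full_predict`, and the column indices are hypothetical names for illustration, and the approach assumes the model tolerates NaN in the unselected columns (LightGBM treats NaN as missing by default).

```python
import numpy as np

def make_padded_predict_fn(full_predict, n_total_features, selected_idx):
    """Wrap a full-width predict function so it accepts only the selected columns.

    full_predict, n_total_features and selected_idx are hypothetical names;
    unselected columns are filled with NaN (an assumption about the model).
    """
    selected_idx = np.asarray(selected_idx)

    def predict_fn(x_subset):
        # Accept a single row or a batch of rows
        x_subset = np.atleast_2d(np.asarray(x_subset, dtype=float))
        # Start from an all-NaN matrix with the width the model was trained on
        x_full = np.full((x_subset.shape[0], n_total_features), np.nan)
        # Copy the selected features into their original column positions
        x_full[:, selected_idx] = x_subset
        return full_predict(x_full)

    return predict_fn


# Stand-in for a trained model: sums the non-missing features of each row
def toy_full_predict(x_full):
    return np.nansum(x_full, axis=1)


predict_two_cols = make_padded_predict_fn(
    toy_full_predict, n_total_features=5, selected_idx=[1, 3]
)
result = predict_two_cols(np.array([[1.0, 2.0], [3.0, 4.0]]))
single = predict_two_cols([1.0, 2.0])
```

With a real model, `full_predict` would be the model's own predict call, and the returned `predict_fn` would be the function handed to `explain_instance`.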

williamty commented 8 months ago

@apoplexi24 Thank you for your kind reply! It worked! By the way, I have also changed the predict function, setting the `predict_disable_shape_check` parameter to true:

    import numpy as np

    def predict_fn(x):
        x = np.asarray(x)
        if x.ndim == 1:
            # Reshape an individual data point to 2D
            return ldl.predict(x.reshape(1, -1), predict_disable_shape_check=True)
        else:
            # Predict for the entire dataset
            return ldl.predict(x, predict_disable_shape_check=True)

    def predict_fn_binary(x):
        # Stack P(class 0) and P(class 1) into the two-column output LIME expects
        p = predict_fn(x)
        return np.column_stack((1 - p, p))
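The reason for `predict_fn_binary` is that LIME's classification mode expects an array of shape (n_samples, n_classes), while a binary model's predict call may return only the positive-class probability per row. The sketch below checks that conversion in isolation. `fake_positive_proba` and `to_two_class_proba` are hypothetical stand-ins, not part of LIME or LightGBM.

```python
import numpy as np

def fake_positive_proba(x):
    # Hypothetical stand-in for a binary model's predict: sigmoid of the row sum
    x = np.atleast_2d(np.asarray(x, dtype=float))
    return 1.0 / (1.0 + np.exp(-x.sum(axis=1)))

def to_two_class_proba(p_positive):
    # Column 0 is P(class 0), column 1 is P(class 1)
    p_positive = np.asarray(p_positive, dtype=float)
    return np.column_stack((1.0 - p_positive, p_positive))

batch = np.array([[0.0, 0.0], [1.0, 2.0], [-3.0, 0.5]])
proba = to_two_class_proba(fake_positive_proba(batch))
```

Each row of `proba` sums to 1, and the number of columns matches the two classes, which is the layout a classification explainer reads.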