marcotcr / lime

Lime: Explaining the predictions of any machine learning classifier
BSD 2-Clause "Simplified" License

LimeTabularExplainer.explain_instance - model_regressor parameter #709

Open PaschalisLagias opened 1 year ago

PaschalisLagias commented 1 year ago

Using the LimeTabularExplainer class, I am trying to understand what local model is used to explain a single instance. According to the documentation, sklearn.linear_model.Ridge is the default model. If this is the case, how is the local prediction estimated, e.g. in a classification scenario?

For instance, suppose we have features A=2, B=3 and Y=1, where A and B are numerical and Y is binary {0, 1}, and LIME returns the following weights for this instance: WA = 0.3, WB = 0.6, Intercept = 0.4.

How can we verify explanation.local_pred against the returned LIME weights and intercept? What is the background calculation that produces this local prediction? Also, is it possible to pass an sklearn.linear_model.LogisticRegression as the model_regressor parameter?
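From reading lime_base.py, my understanding (which I'd like confirmed) is that local_pred is simply the surrogate's predict() on the instance row of the scaled neighbourhood data, i.e. intercept + Σ weight_i · scaled_feature_i. A minimal sketch with made-up neighbourhood data, probabilities and kernel weights (all the arrays below are stand-ins, not what LIME actually generates):

```python
import numpy as np
from sklearn.linear_model import Ridge

# Stand-in for LIME's perturbed neighbourhood (scaled feature space) and
# the classifier's probability for the explained class on each sample.
rng = np.random.default_rng(0)
X_pert = rng.normal(size=(1000, 2))                     # perturbed samples
y_prob = 1 / (1 + np.exp(-(0.3 * X_pert[:, 0] + 0.6 * X_pert[:, 1] + 0.4)))
kernel_w = np.exp(-np.sum(X_pert**2, axis=1))           # stand-in proximity weights

# LIME's default surrogate: a weighted Ridge regression on the probabilities.
surrogate = Ridge(alpha=1.0, fit_intercept=True)
surrogate.fit(X_pert, y_prob, sample_weight=kernel_w)

# local_pred would then just be the linear formula evaluated at the instance row:
x0 = X_pert[0]
manual = surrogate.intercept_ + surrogate.coef_ @ x0
assert np.isclose(manual, surrogate.predict(x0.reshape(1, -1))[0])
```

If that reading is right, verifying local_pred by hand requires the *scaled* feature values of the instance, not the raw A=2, B=3.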

I tried it and got an error:


```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [91], in <cell line: 6>()
      3 lgr = LogisticRegression()
      4 lr = LinearRegression()
----> 6 lime_exp = lime_explainer.explain_instance(
      7     data_row=X_transformed[j],
      8     predict_fn=svm_classifier.predict_proba,
      9     top_labels=1,
     10     num_features=x_train.shape[1],
     11     num_samples=1000,
     12     model_regressor=lgr
     13 )
     15 # Extract feature weights for a single sample
     16 print(lime_exp.as_map())

File c:\users\pasch\documents\zinia\venv\lib\site-packages\lime\lime_tabular.py:452, in LimeTabularExplainer.explain_instance(self, data_row, predict_fn, labels, top_labels, num_features, num_samples, distance_metric, model_regressor)
    448     labels = [0]
    449 for label in labels:
    450     (ret_exp.intercept[label],
    451      ret_exp.local_exp[label],
--> 452      ret_exp.score, ret_exp.local_pred) = self.base.explain_instance_with_data(
    453         scaled_data,
    454         yss,
    455         distances,
    456         label,
    457         num_features,
    458         model_regressor=model_regressor,
    459         feature_selection=self.feature_selection)
    461 if self.mode == "regression":
    462     ret_exp.intercept[1] = ret_exp.intercept[0]

File c:\users\pasch\documents\zinia\venv\lib\site-packages\lime\lime_base.py:192, in LimeBase.explain_instance_with_data(self, neighborhood_data, neighborhood_labels, distances, label, num_features, feature_selection, model_regressor)
    189     model_regressor = Ridge(alpha=1, fit_intercept=True,
    190                             random_state=self.random_state)
    191 easy_model = model_regressor
--> 192 easy_model.fit(neighborhood_data[:, used_features],
    193                labels_column, sample_weight=weights)
    194 prediction_score = easy_model.score(
    195     neighborhood_data[:, used_features],
    196     labels_column, sample_weight=weights)
    198 local_pred = easy_model.predict(neighborhood_data[0, used_features].reshape(1, -1))

File c:\users\pasch\documents\zinia\venv\lib\site-packages\sklearn\linear_model\_logistic.py:1347, in LogisticRegression.fit(self, X, y, sample_weight)
   1342 _dtype = [np.float64, np.float32]
   1344 X, y = self._validate_data(X, y, accept_sparse='csr', dtype=_dtype,
   1345                            order="C",
   1346                            accept_large_sparse=solver != 'liblinear')
-> 1347 check_classification_targets(y)
   1348 self.classes_ = np.unique(y)
   1350 multi_class = _check_multi_class(self.multi_class, solver,
   1351                                  len(self.classes_))

File c:\users\pasch\documents\zinia\venv\lib\site-packages\sklearn\utils\multiclass.py:183, in check_classification_targets(y)
    180 y_type = type_of_target(y)
    181 if y_type not in ['binary', 'multiclass', 'multiclass-multioutput',
    182                   'multilabel-indicator', 'multilabel-sequences']:
--> 183     raise ValueError("Unknown label type: %r" % y_type)

ValueError: Unknown label type: 'continuous'
```
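As far as I can tell, the failure is not specific to LIME: the surrogate's fit target here is the classifier's predict_proba output, which is continuous in [0, 1], and sklearn classifiers reject continuous targets. The same error reproduces in isolation (toy arrays below are made up):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# A continuous target, like one column of predict_proba, is not a valid
# classification target, so LogisticRegression.fit raises ValueError.
X = np.random.rand(20, 2)
y_prob = np.random.rand(20)

try:
    LogisticRegression().fit(X, y_prob)
    raised = False
except ValueError as err:
    raised = True
    msg = str(err)          # message mentions the 'continuous' label type

print(raised, msg)
```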

If you need any further information or detail on my question, I am happy to provide it.
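For what it's worth, it looks like any estimator matching Ridge's interface would be accepted: fit(X, y, sample_weight=...), predict(), score(), plus coef_ and intercept_ for the explanation weights. So a sparse regressor such as Lasso should work as model_regressor where LogisticRegression cannot; a sketch with stand-in neighbourhood data (shapes are made up):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Stand-in for LIME's perturbed neighbourhood, probability target and
# proximity weights; the point is only that Lasso accepts all three.
rng = np.random.default_rng(1)
X_pert = rng.normal(size=(500, 4))
y_prob = rng.uniform(size=500)            # continuous probabilities
kernel_w = rng.uniform(size=500)

lasso = Lasso(alpha=0.01)
lasso.fit(X_pert, y_prob, sample_weight=kernel_w)
print(lasso.coef_, lasso.intercept_)      # shaped like LIME's weights/intercept
```

Presumably one would then pass model_regressor=Lasso(alpha=0.01) to explain_instance in place of the LogisticRegression above, but I have not confirmed this is the intended usage.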