marcotcr / lime

Lime: Explaining the predictions of any machine learning classifier
BSD 2-Clause "Simplified" License

Problem using RecurrentTabularExplainer on a mixed dataset (numerical and categorical) #596

Open hkhanal opened 3 years ago

hkhanal commented 3 years ago

My input data has shape X_train.shape = (6697, 6, 23), i.e. (samples, timesteps, features), and the last seven columns are categorical variables.

import lime.lime_tabular

categorical_features = [16, 17, 18, 19, 20, 21, 22]
explainer = lime.lime_tabular.RecurrentTabularExplainer(
    X_train, training_labels=y_train.argmax(axis=1), feature_names=data_columns,
    discretize_continuous=True, categorical_features=categorical_features,
    class_names=['A', 'B', 'C'], categorical_names=categorical_names, discretizer='decile')

exp = explainer.explain_instance(X_test[1], model.predict, num_features=20, labels=(1,0,2))

I got the following error message:


IndexError                                Traceback (most recent call last)
<ipython-input-115-111b3ce428a0> in <module>
----> 1 exp = explainer.explain_instance(X_test[1], model.predict, num_features=20, labels=(1,0,2))
      2 exp.show_in_notebook()

/usr/local/anaconda/lib/python3.6/site-packages/lime/lime_tabular.py in explain_instance(self, data_row, classifier_fn, labels, top_labels, num_features, num_samples, distance_metric, model_regressor)
    700             num_samples=num_samples,
    701             distance_metric=distance_metric,
--> 702             model_regressor=model_regressor)

/usr/local/anaconda/lib/python3.6/site-packages/lime/lime_tabular.py in explain_instance(self, data_row, predict_fn, labels, top_labels, num_features, num_samples, distance_metric, model_regressor)
    413             name = int(data_row[i])
    414             if i in self.categorical_names:
--> 415                 name = self.categorical_names[i][name]
    416             feature_names[i] = '%s=%s' % (feature_names[i], name)
    417             values[i] = 'True'

IndexError: index 3 is out of bounds for axis 0 with size 3

Any help would be highly appreciated.

marcotcr commented 3 years ago

Can you print out what categorical_names is?
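
Beyond printing it, a quick consistency check can show whether any integer code in a categorical column falls outside the range of its name list, which is one way an IndexError like the one above can arise. The following is a minimal sketch, not part of lime: `check_categorical_names` is a hypothetical helper, and it assumes `categorical_names` is a dict mapping a feature index to a list of names, with the data stored as integer codes.

```python
import numpy as np

def check_categorical_names(X, categorical_features, categorical_names):
    """Return {feature_index: offending_codes} for codes that cannot index categorical_names.

    X is assumed to have the feature axis last, e.g. (samples, timesteps, features).
    """
    problems = {}
    for f in categorical_features:
        # All distinct integer codes that occur in this categorical column.
        codes = np.unique(X[..., f]).astype(int)
        n_names = len(categorical_names[f])
        bad = codes[(codes < 0) | (codes >= n_names)]
        if bad.size:
            problems[f] = bad.tolist()
    return problems

# Toy data (hypothetical): 5 samples, 6 timesteps, 4 features, feature 3 is categorical.
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 3, size=(5, 6, 4)).astype(float)
X_demo[0, 0, 3] = 3                      # a code with no matching name
names_demo = {3: ['a', 'b', 'c']}        # only codes 0, 1, 2 are valid

problems = check_categorical_names(X_demo, [3], names_demo)
print(problems)  # {3: [3]}
```

If this check comes back empty for the real data, the codes and names agree and the problem is more likely in how the explainer indexes the row.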

hxfdanger commented 1 year ago

@marcotcr In the explain_instance method for RecurrentTabularExplainer, consider this loop:

name = int(data_row[i])            
if i in self.categorical_names:
    name = self.categorical_names[i][name]

The variable name will be read from the wrong position because data_row is flattened to length n_timesteps * n_features, so index i no longer points at feature i. Therefore, in my opinion, it should be replaced with:

name = int(data_row[i * self.n_timesteps])        
if i in self.categorical_names:
    name = self.categorical_names[i][name]
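
To make the indexing argument concrete, here is a small NumPy sketch. It assumes a feature-major flattening (all timesteps of feature 0, then all timesteps of feature 1, and so on), which is the layout that the index i * self.n_timesteps implies; the helper `flat_index` is hypothetical and only used for illustration.

```python
import numpy as np

n_timesteps, n_features = 6, 23

# One instance of shape (n_timesteps, n_features).
# Each cell stores t * 100 + f so its origin is easy to read back.
row = np.array([[t * 100 + f for f in range(n_features)]
                for t in range(n_timesteps)])

# Assumed flattening: feature-major, i.e. transpose then reshape.
flat = row.T.reshape(n_timesteps * n_features)

def flat_index(feature, timestep, n_timesteps):
    """Position of (feature, timestep) in the assumed feature-major layout."""
    return feature * n_timesteps + timestep

i = 16                                   # first categorical feature in the issue
value_at_i = flat[i]                     # 402: timestep 4 of feature 2, not feature 16
value_fixed = flat[i * n_timesteps]      # 16: timestep 0 of feature 16

print(value_at_i, value_fixed)
```

Under this assumption, data_row[16] holds a value of numerical feature 2, so converting it to an int and using it to index a three-element name list can easily go out of bounds.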

Perhaps this class method needs to be redefined so that it is compatible with each tabular method, though I'm not sure how best to do that.
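
One possible direction, sketched here without any claim that it matches how lime should be patched, is to translate the per-feature categorical indices and names into the flattened index space before they are used. `expand_categorical` is a hypothetical helper, the feature-major layout is again an assumption, and whether the explainer accepts such expanded indices has not been verified here.

```python
def expand_categorical(categorical_features, categorical_names, n_timesteps):
    """Map per-feature categorical info to flattened (feature-major) column indices.

    Returns a list of flattened indices and a dict from flattened index to the
    name list of the original feature.
    """
    flat_features = []
    flat_names = {}
    for f in categorical_features:
        for t in range(n_timesteps):
            j = f * n_timesteps + t      # assumed layout: feature * n_timesteps + timestep
            flat_features.append(j)
            if f in categorical_names:
                flat_names[j] = categorical_names[f]
    return flat_features, flat_names

# Usage with the shapes from the issue; the name lists are placeholders.
categorical_features = [16, 17, 18, 19, 20, 21, 22]
categorical_names = {f: ['x', 'y', 'z'] for f in categorical_features}
flat_features, flat_names = expand_categorical(categorical_features, categorical_names, 6)

print(len(flat_features), flat_features[0], flat_features[-1])  # 42 96 137
```

Each of the seven categorical features expands to six flattened columns, one per timestep, and every one of those columns shares the name list of its original feature.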