marcotcr / lime

Lime: Explaining the predictions of any machine learning classifier
BSD 2-Clause "Simplified" License

"Domain error in arguments" when using LimeTabularExplainer #688

Closed: HZeng3 closed this issue 1 year ago

HZeng3 commented 1 year ago

Main problem: my model is an XGBoost model that takes a pd.DataFrame as X, while lime.explainer.explain_instance needs, as its second argument, a function that takes an np.array and returns the predicted Y values. I tried writing my own predict() function that converts X to a DataFrame, but it didn't work. Below are my code and the error. Any help would be appreciated!

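import lime.lime_tabular
import numpy as np
import pandas as pd

def predict(arr):  # m (the trained model) and feature_names are defined outside
    # LIME calls this with a 2-D batch of perturbed rows, shape (num_samples, n_features),
    # so build the DataFrame from arr directly instead of wrapping it in a list
    tmp = pd.DataFrame(arr, columns=feature_names)
    return m.predict(tmp)

explainer = lime.lime_tabular.LimeTabularExplainer(x_train,
            feature_names=np.array(feature_names),
            class_names=['beat'], verbose=True, mode='regression')
exp = explainer.explain_instance(df.values[0], predict, num_features=10)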
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_27912/2366211079.py in <module>
----> 4     exp = explainer.explain_instance(df.values[0], predict, num_features=10)

c:\users\23655\appdata\local\programs\python\python37\lib\site-packages\lime\lime_tabular.py in explain_instance(self, data_row, predict_fn, labels, top_labels, num_features, num_samples, distance_metric, model_regressor)
    338             # Preventative code: if sparse, convert to csr format if not in csr format already
    339             data_row = data_row.tocsr()
--> 340         data, inverse = self.__data_inverse(data_row, num_samples)
    341         if sp.sparse.issparse(data):
    342             # Note in sparse case we don't subtract mean since data would become dense

c:\users\23655\appdata\local\programs\python\python37\lib\site-packages\lime\lime_tabular.py in __data_inverse(self, data_row, num_samples)
    548             inverse[:, column] = inverse_column
    549         if self.discretizer is not None:
--> 550             inverse[1:] = self.discretizer.undiscretize(inverse[1:])
    551         inverse[0] = data_row
    552         return data, inverse

c:\users\23655\appdata\local\programs\python\python37\lib\site-packages\lime\discretize.py in undiscretize(self, data)
    143             else:
    144                 ret[:, feature] = self.get_undiscretize_values(
--> 145                     feature, ret[:, feature].astype(int)
    146                 )
    147         return ret

c:\users\23655\appdata\local\programs\python\python37\lib\site-packages\lime\discretize.py in get_undiscretize_values(self, feature, values)
    130             loc=means[min_max_unequal],
    131             scale=stds[min_max_unequal],
--> 132             random_state=self.random_state
    133         )
    134         return ret

c:\users\23655\appdata\local\programs\python\python37\lib\site-packages\scipy\stats\_distn_infrastructure.py in rvs(self, *args, **kwds)
   1065         cond = logical_and(self._argcheck(*args), (scale >= 0))
   1066         if not np.all(cond):
-> 1067             raise ValueError("Domain error in arguments.")
   1068 
   1069         if np.all(scale == 0):

ValueError: Domain error in arguments.
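The last frame shows the error is raised by SciPy's argument check inside `rvs`, called from LIME's discretizer with `loc=means[...]` and `scale=stds[...]`. A minimal sketch, assuming the distribution being sampled is `scipy.stats.truncnorm` (the traceback only shows a generic `rvs` frame) and that NaN per-bin statistics turn into NaN bounds, reproduces the same message. The bound values are hypothetical, and the exact wording after the first sentence may vary between SciPy versions.

```python
import numpy as np
from scipy import stats

# Hypothetical standardized bounds. NaN stands in for a bound computed
# from a NaN mean or standard deviation (e.g. data containing NaN values).
lower = np.array([np.nan])
upper = np.array([np.nan])

message = ""
try:
    # truncnorm requires lower < upper; NaN < NaN is False, so the check fails
    stats.truncnorm.rvs(lower, upper, loc=0.0, scale=1.0, random_state=0)
except ValueError as err:
    message = str(err)

print(message)  # starts with "Domain error in arguments."
```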
koschmid commented 1 year ago

Hey Zeng,

I am struggling with the same problem. Did you find the cause?

Best regards, Konstantin

HZeng3 commented 1 year ago

Hi Konstantin,

Most likely this is because there are NaN values in the data.

Best, Zeng
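One way to test this is to scan the training data for non-finite values before building the explainer. The sketch below assumes the training data is a pandas DataFrame; the helper `find_nonfinite_columns` and the toy `x_train` are hypothetical, made up for illustration. The row passed to `explain_instance` is worth checking the same way.

```python
import numpy as np
import pandas as pd

def find_nonfinite_columns(df: pd.DataFrame) -> list:
    """Return names of numeric columns containing NaN or +/-inf (hypothetical helper)."""
    numeric = df.select_dtypes(include=[np.number])
    # True wherever a value is NaN, inf or -inf
    bad = ~np.isfinite(numeric.to_numpy(dtype=float))
    return [col for col, has_bad in zip(numeric.columns, bad.any(axis=0)) if has_bad]

# Toy data: column "a" has a NaN, column "b" has an inf, column "c" is clean
x_train = pd.DataFrame({
    "a": [1.0, 2.0, np.nan],
    "b": [1.0, np.inf, 3.0],
    "c": [1.0, 2.0, 3.0],
})
print(find_nonfinite_columns(x_train))  # ['a', 'b']
```

Flagged columns can then be imputed (for example with `DataFrame.fillna`) or dropped before `x_train` is handed to `LimeTabularExplainer`.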

Yidi-Cao commented 1 year ago

Hi Zeng,

I have the same problem with LightGBM models. I asserted that there are no NaN or inf values in either the features or the target. Did you solve this?

Best, Edgar

Yidi-Cao commented 1 year ago

Solved: features with only one unique value are not allowed.
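A minimal sketch for finding such columns, assuming the training data is a pandas DataFrame (the helper `constant_columns` and the toy data are hypothetical). If the columns are dropped, the prediction function must still give the model the columns it was trained on, so either retrain without them or have the wrapper re-insert the constant values. Since the traceback shows the failure inside the discretizer branch (`if self.discretizer is not None`), constructing `LimeTabularExplainer` with `discretize_continuous=False` may also sidestep it, though explanations are then expressed without binned feature ranges.

```python
import pandas as pd

def constant_columns(df: pd.DataFrame) -> list:
    """Return names of columns with at most one distinct non-null value (hypothetical helper)."""
    return [col for col in df.columns if df[col].nunique(dropna=True) <= 1]

# Toy data: column "b" holds a single repeated value
x_train = pd.DataFrame({"a": [1, 2, 3], "b": [5, 5, 5], "c": [0.1, 0.2, 0.1]})

to_drop = constant_columns(x_train)
x_train_reduced = x_train.drop(columns=to_drop)
print(to_drop)                        # ['b']
print(list(x_train_reduced.columns))  # ['a', 'c']
```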