marcotcr / lime

Lime: Explaining the predictions of any machine learning classifier
BSD 2-Clause "Simplified" License
11.53k stars 1.8k forks

Is it possible to provide a workable example for explaining categorical and numeric features together for keras RNN LSTM with LIME? #308

Open nasheennur opened 5 years ago

nasheennur commented 5 years ago

I am currently working on a time series dataset to predict stock price, using a Keras RNN LSTM model. Since the LSTM input has to be formulated as a 3D NumPy array, the tabular explainer in LIME does not work for explaining categorical and numerical data together. I would like to use LIME to help explain the results via visualization.
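For reference, this is roughly how I window the tabular data into the 3D array the LSTM expects (a minimal sketch; the shapes and names here are made up for illustration):

```python
import numpy as np

def make_windows(table, n_timesteps):
    """Turn a tabular series of shape (n_rows, n_features) into
    overlapping windows of shape (n_samples, n_timesteps, n_features)."""
    n_rows, n_features = table.shape
    n_samples = n_rows - n_timesteps + 1
    return np.stack([table[i:i + n_timesteps] for i in range(n_samples)])

# 10 rows, 2 features (one of them could be an encoded categorical column)
table = np.arange(20, dtype=float).reshape(10, 2)
X = make_windows(table, n_timesteps=4)
print(X.shape)  # (7, 4, 2)
```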

I would really appreciate any response if someone already worked it out.

Thanks, Nasheen

marcotcr commented 5 years ago

See this and this.

nasheennur commented 5 years ago

Hi @marcotcr , I apologize; I may not have been clear about the issue. Does LIME handle explaining categorical features for an RNN? I don't think so: it fails to index the flattened `data_row` for categorical features. I just added a random column (`test_cata`) of categorical features to your CO2 example. The encodings are 0, 1, 2, 3 for 't', 'tv', 'v', 'vt', and the timestep is 12. I am getting the error shown in this screenshot: [Screen Shot 2019-04-08 at 3.26.19 PM]

This problem happens when I pass `model.predict` to `explain_instance`. But when I pass `predict_fn = lambda x: model.predict_proba(encoder1.fit_transform(x)).astype(float)`, the error is different: it occurs at the line `yss = predict_fn(inverse)`. The shape of `inverse` is 2D, whereas the `_make_predict_proba` function returns a 3D array.

```
in explain_instance(self, data_row, predict_fn, labels, top_labels, num_features, num_samples, distance_metric, model_regressor)
    322             ).ravel()
    323         print(inverse.shape)
--> 324         yss = predict_fn(inverse)
    325
    326         # for classification, the model needs to provide a list of tuples - classes

in predict_proba(X)
    595         X = np.transpose(X.reshape(new_shape), axes=(0, 2, 1))
    596         print(X)
--> 597         return func(X)
    598
    599     return predict_proba

in (x)
----> 1 predict_fn = lambda x: model.predict_proba(encoder1.fit_transform(x)).astype(float)

~/anaconda3/lib/python3.7/site-packages/sklearn/preprocessing/_encoders.py in fit_transform(self, X, y)
    512             return _transform_selected(
    513                 X, self._legacy_fit_transform, self.dtype,
--> 514                 self._categorical_features, copy=True)
    515         else:
    516             return self.fit(X).transform(X)

~/anaconda3/lib/python3.7/site-packages/sklearn/preprocessing/base.py in _transform_selected(X, transform, dtype, selected, copy, retain_order)
     43     Xt : array or sparse matrix, shape=(n_samples, n_features_new)
     44     """
---> 45     X = check_array(X, accept_sparse='csc', copy=copy, dtype=FLOAT_DTYPES)
     46
     47     if sparse.issparse(X) and retain_order:

~/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    568     if not allow_nd and array.ndim >= 3:
    569         raise ValueError("Found array with dim %d. %s expected <= 2."
--> 570                          % (array.ndim, estimator_name))
    571     if force_all_finite:
    572         _assert_all_finite(array,

ValueError: Found array with dim 3. Estimator expected <= 2.
```

**Can you suggest any quick fix?**
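For reference, a minimal reproduction of the 3D-array error, independent of my model: sklearn's input validation (which the encoder calls internally) rejects anything above 2D by default, and the recurrent wrapper reshapes LIME's flattened rows back to 3D before calling the supplied function.

```python
import numpy as np
from sklearn.utils.validation import check_array

# (n_samples, n_timesteps, n_features) - the shape the wrapper passes along
X3d = np.zeros((10, 12, 4))

try:
    # check_array defaults to allow_nd=False, so ndim >= 3 raises
    check_array(X3d)
except ValueError as e:
    print(e)  # "Found array with dim 3. ..."
```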
marcotcr commented 5 years ago

Ah, sorry, I had misunderstood. You are right, this is a bug: `categorical_features` expects indexes, and these get messed up when the input is unrolled. The quick fix is to map the categorical feature indexes to the appropriate positions in the unrolled layout (each categorical feature has to be repeated n_steps times). I will wait for someone to do a pull request, though; I don't have the time to do this right now : )
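Something along these lines (an untested sketch, assuming the explainer flattens the 3D input feature-major, i.e. all n_steps columns of a feature end up adjacent in the unrolled row):

```python
def unroll_categorical(categorical_features, n_steps):
    """Map categorical feature indexes from the original per-timestep layout
    to the unrolled (flattened) layout.

    Assumes feature j in the original layout occupies columns
    j*n_steps .. (j+1)*n_steps - 1 after unrolling, so each categorical
    feature index is repeated n_steps times.
    """
    unrolled = []
    for j in categorical_features:
        unrolled.extend(range(j * n_steps, (j + 1) * n_steps))
    return unrolled

# e.g. feature 3 is categorical, 12 timesteps -> columns 36..47
print(unroll_categorical([3], n_steps=12))
```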

KhawlaSeddiki commented 3 years ago

Is there any update on this function/error? I am facing the same problem.