interpretml / interpret

Fit interpretable models. Explain blackbox machine learning.
https://interpret.ml/docs
MIT License

IndexError using ebm.predict() method #154

Closed: abhipannala closed this issue 3 years ago

abhipannala commented 4 years ago

I'm using Ubuntu 18.04.2 LTS and JupyterLab v1.04.

When I try to use the .predict() method on an ExplainableBoostingRegressor object, I run into an IndexError:


```
IndexError                                Traceback (most recent call last)
in
----> 1 ebm.predict(X_test)

~/.local/lib/python3.7/site-packages/interpret/glassbox/ebm/ebm.py in predict(self, X)
   1577
   1578     return EBMUtils.regressor_predict(
-> 1579         X, self.feature_groups_, self.additive_terms_, self.intercept_
   1580     )

~/.local/lib/python3.7/site-packages/interpret/glassbox/ebm/utils.py in regressor_predict(X, feature_groups, model, intercept)
    196     @staticmethod
    197     def regressor_predict(X, feature_groups, model, intercept):
--> 198         scores = EBMUtils.decision_function(X, feature_groups, model, intercept)
    199         return scores
    200

~/.local/lib/python3.7/site-packages/interpret/glassbox/ebm/utils.py in decision_function(X, feature_groups, model, intercept)
    162         X, feature_groups, model
    163     )
--> 164     for _, _, scores in scores_gen:
    165         score_vector += scores
    166

~/.local/lib/python3.7/site-packages/interpret/glassbox/ebm/utils.py in scores_by_feature_group(X, feature_groups, model)
    142     feature_idxs = feature_group
    143     sliced_X = X[feature_idxs, :]
--> 144     scores = tensor[tuple(sliced_X)]
    145
    146     yield set_idx, feature_group, scores

IndexError: index -2 is out of bounds for axis 0 with size 1
```

What's weird is that I only get this error when I predict on X_test. The columns of X_train and X_test are identical.
timbr99 commented 4 years ago

Having the same issue. Prediction fails on the test set, while predicting on the training set works without any problem. Environment: RedHat 7.8, JupyterLab 2.1.0, interpret v0.2.1 and v0.2.0.


```
IndexError                                Traceback (most recent call last)
in
----> 1 ebm.predict_proba(X_test)

~/.conda/envs/x/lib/python3.6/site-packages/interpret/glassbox/ebm/ebm.py in predict_proba(self, X)
   1444
   1445     prob = EBMUtils.classifier_predict_proba(
-> 1446         X, self.feature_groups_, self.additive_terms_, self.intercept_
   1447     )
   1448     return prob

~/.conda/envs/x/lib/python3.6/site-packages/interpret/glassbox/ebm/utils.py in classifier_predict_proba(X, feature_groups, model, intercept)
    175     def classifier_predict_proba(X, feature_groups, model, intercept):
    176         log_odds_vector = EBMUtils.decision_function(
--> 177             X, feature_groups, model, intercept
    178         )
    179

~/.conda/envs/x/lib/python3.6/site-packages/interpret/glassbox/ebm/utils.py in decision_function(X, feature_groups, model, intercept)
    162         X, feature_groups, model
    163     )
--> 164     for _, _, scores in scores_gen:
    165         score_vector += scores
    166

~/.conda/envs/x/lib/python3.6/site-packages/interpret/glassbox/ebm/utils.py in scores_by_feature_group(X, feature_groups, model)
    142     feature_idxs = feature_group
    143     sliced_X = X[feature_idxs, :]
--> 144     scores = tensor[tuple(sliced_X)]
    145
    146     yield set_idx, feature_group, scores

IndexError: index -2 is out of bounds for axis 0 with size 1
```
interpret-ml commented 4 years ago

Hi @abhipannala and @timbr99 ,

Thank you for reporting this! Would you mind sharing the shape of your X_test objects (X_test.shape)? We typically see this IndexError when X_test contains a single instance and is 1-dimensional (e.g. X_test.shape == (10,) as opposed to X_test.shape == (1, 10)).

If X_test happens to be a 1-dimensional numpy array, you can transform it to our expected 2-dimensional format by calling X_test.reshape(1, -1).
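A minimal sketch of the reshape suggested above, using a made-up 10-feature array in place of real data. The pandas lines assume X_test was sliced out of a DataFrame, where selecting a single row by scalar position silently drops to 1-D; `np.atleast_2d` is shown as an equivalent alternative to `reshape(1, -1)` for 1-D input.

```python
import numpy as np
import pandas as pd

# One instance with 10 features stored as a 1-D array: shape (10,)
x_single = np.arange(10, dtype=float)

# reshape(1, -1): one row, and NumPy infers the number of columns -> shape (1, 10)
X_test = x_single.reshape(1, -1)

# np.atleast_2d gives the same (1, 10) result for a 1-D input
X_test_alt = np.atleast_2d(x_single)

# With pandas, selecting one row by scalar position returns a 1-D Series,
# while selecting it with a list keeps a 2-D, one-row DataFrame.
df = pd.DataFrame(np.zeros((3, 10)))
row_1d = df.iloc[0]     # Series, shape (10,)
row_2d = df.iloc[[0]]   # DataFrame, shape (1, 10)

print(x_single.shape, X_test.shape, X_test_alt.shape)  # (10,) (1, 10) (1, 10)
print(row_1d.shape, row_2d.shape)                       # (10,) (1, 10)

# With a fitted model, ebm.predict(X_test) would then receive a 2-D input.
```

Note that this only addresses the single-instance case; a 2-D X_test that already has shape (n, 8418), as reported further down, needs a different explanation.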

Thanks! -InterpretML Team

timbr99 commented 4 years ago

Thank you so much for your reply @interpret-ml!

My X_test.shape is (51, 8418) and y_test.shape is (51, 1). My X_train.shape is (261, 8418) and y_train.shape is (261, 1).

As already stated, the problem does not occur when using the predict function on the X_train DataFrame. I also tried using NumPy arrays instead of a DataFrame. There are only categorical values in the train and test sets; the split was done once with train_test_split and once manually. I do not have this problem with other data (I tried the wine quality dataset from Kaggle).

If you have any ideas about what I could be doing wrong, any help is greatly appreciated.

dataninjia commented 4 years ago

I have the same issue as @timbr99.

I'll add that I've tried training on X_train and predicting on X_test, and also the reverse: training on X_test and predicting on X_train. In both cases prediction works on the dataset the model was trained on and fails on the other one.

interpret-ml commented 4 years ago

Hi @dataninjia, @timbr99, and @abhipannala,

We think this might be occurring because X_test contains a categorical value that doesn't exist in X_train. Can you check to see if this would explain the error in your cases? Our next release will contain a fix for this. In the meantime, if you'd like an early version of the fix, you can obtain it with:

pip install -I --no-deps --no-cache-dir https://interpret.blob.core.windows.net/pywheel/57ef36525f7a180af405d647f90835f509a088b7/interpret-0.2.1-py3-none-any.whl

Thanks for reporting this!

-InterpretML team
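To check whether the explanation above (a categorical value present in X_test but absent from X_train) applies to a given dataset, a quick comparison of per-column values is enough. This is a minimal sketch assuming both sets are pandas DataFrames with the same columns; `unseen_categories` is a hypothetical helper, not part of interpret, and the toy data below is made up.

```python
import pandas as pd


def unseen_categories(X_train, X_test, columns=None):
    """Hypothetical helper: return {column: values in X_test never seen in X_train}.

    Assumes X_train and X_test are DataFrames with the same columns;
    `columns` optionally restricts the check (e.g. to categorical columns).
    """
    columns = X_train.columns if columns is None else columns
    report = {}
    for col in columns:
        # tolist() converts NumPy scalars to plain Python values for readable output
        train_values = set(X_train[col].dropna().unique().tolist())
        test_values = set(X_test[col].dropna().unique().tolist())
        missing = test_values - train_values
        if missing:
            report[col] = missing
    return report


# Made-up example: "green" and flag == 1 never occur in the training data
X_train = pd.DataFrame({"color": ["red", "blue", "red"], "flag": [0, 0, 0]})
X_test = pd.DataFrame({"color": ["red", "green"], "flag": [0, 1]})

print(unseen_categories(X_train, X_test))  # {'color': {'green'}, 'flag': {1}}
```

Calling it with the arguments swapped, `unseen_categories(X_test, X_train)`, covers the reverse experiment described in the thread (training on X_test and predicting on X_train).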

timbr99 commented 4 years ago

Hi @interpret-ml, sorry for the late answer. Unfortunately this didn't fix it for me; I get the same error. I checked that all columns in X_train are also in X_test and vice versa.

interpret-ml commented 4 years ago

Hi timbr99 -- Sorry the above didn't resolve the issue that you reported. Due to a change in our packaging, the pip install command above didn't work as expected. We have a fix for the issue in our develop branch, which will be pushed to pypi in our next release.

-InterpretML team

interpret-ml commented 3 years ago

Hi @dataninjia, @timbr99, and @abhipannala --

Our latest release should fix this issue. Thank you for reporting this to us!

-InterpretML team

sitandon commented 3 years ago

@interpret-ml: I am also getting the same issue, and I don't have any string variables in my dataset. By the way, how does EBM handle a variable that takes only the value {0} in training but {0, 1} in testing?
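The last line of the tracebacks above can be reproduced in plain NumPy, which helps show why a feature that was constant in training is a plausible trigger. This sketch mirrors the two lines from `scores_by_feature_group` (`sliced_X = X[feature_idxs, :]` and `tensor[tuple(sliced_X)]`), but everything else is an assumption: the score value 0.37 is made up, and the idea that an unseen test value is mapped to bin index -2 is a hypothesis chosen to match the error message, not confirmed interpret behavior.

```python
import numpy as np

# Binned data laid out as in the traceback: one row per feature, one column per sample.
# Hypothetical bin indices: 0 = the only value seen in training,
# -2 = an ASSUMED marker for the test sample whose raw value (1) was never seen.
X = np.array([[0, -2]])

feature_idxs = [0]          # a single-feature group
tensor = np.array([0.37])   # hypothetical learned scores: only one bin, so size 1

sliced_X = X[feature_idxs, :]  # shape (1, 2)

message = None
try:
    scores = tensor[tuple(sliced_X)]  # tuple of one index array -> fancy indexing on axis 0
except IndexError as err:
    message = str(err)
    print(f"IndexError: {message}")
# IndexError: index -2 is out of bounds for axis 0 with size 1
```

With only one bin learned, index -1 would still be valid, but -2 falls outside the axis, which matches "size 1" in the reported error.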