mljar / mljar-supervised

Python package for AutoML on Tabular Data with Feature Engineering, Hyper-Parameters Tuning, Explanations and Automatic Documentation
https://mljar.com
MIT License

Bug with SHAP when the dataset has more than 1k samples #173

Closed. tmdavid closed this issue 4 years ago

tmdavid commented 4 years ago

In order to reproduce the bug:

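import numpy as np
from supervised.automl import AutoML

# Random binary classification data: 5763 rows, 31 features
X_train = np.random.uniform(size=(5763, 31))
y_train = np.random.randint(0, 2, size=(5763,))

automl = AutoML()
automl.fit(X_train, y_train)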

Running this produces the following exception:

Exception while producing SHAP explanations. positional indexers are out-of-bounds
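
For context, the second half of that message is the text pandas uses when `.iloc` receives a list of positions that includes a row outside the frame. The snippet below is a minimal sketch of that behaviour on a toy frame. It is not the mljar-supervised code, and `try_positions` is a hypothetical helper written only for this illustration.

```python
import pandas as pd

# A small frame with 3 rows; valid positions are 0, 1 and 2.
df = pd.DataFrame({"a": [10, 20, 30]})


def try_positions(frame, positions):
    """Return the selected rows, or the IndexError message if a position is invalid.

    Hypothetical helper, used here only to show where the message comes from.
    """
    try:
        return frame.iloc[positions]
    except IndexError as exc:
        return str(exc)


# Position 5 does not exist in a 3-row frame, so .iloc raises IndexError.
message = try_positions(df, [0, 5])
print(message)
```

This suggests the positions used for sampling pointed past the end of the data being indexed.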

If I'm not mistaken, the bug is in this part of the code:

https://github.com/mljar/mljar-supervised/blob/fb71ef2b8399ca631d0c5fad58ad7af2d11647dd/supervised/utils/shap.py#L76

pplonski commented 4 years ago

@tmdavid you are right, the bug was in the get_sample() method. It is fixed now, and I've added unit tests for this method. The fix will be in the 0.7.1 release, which should be out today or tomorrow.
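
The actual get_sample() code is not shown in this thread, so the following is only a sketch of one way to pick row positions for a capped sample so that they always stay in bounds. The function name `sample_positions`, the `max_rows=1000` default (taken from the "more than 1k samples" in the issue title), and the `seed` parameter are all assumptions made for illustration.

```python
import random


def sample_positions(n_rows, max_rows=1000, seed=None):
    """Return sorted row positions for a sample of at most max_rows rows.

    Hypothetical helper: every returned position lies in [0, n_rows), so the
    result is safe to pass to positional indexing on data with n_rows rows.
    """
    if n_rows <= max_rows:
        # Small dataset: keep every row, no sampling needed.
        return list(range(n_rows))
    rng = random.Random(seed)
    # Sample without replacement from the valid positions only.
    return sorted(rng.sample(range(n_rows), max_rows))


# Same number of rows as in the reproduction above.
positions = sample_positions(5763, max_rows=1000, seed=0)
print(len(positions), min(positions) >= 0, max(positions) < 5763)
```

If the features and the target are sampled together, using the same list of positions for both keeps them aligned. With pandas objects that would be `X.iloc[positions]` and `y.iloc[positions]`.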