sdpython / mlinsights

Extends scikit-learn with new models, transformers, metrics, plotting.
http://www.xavierdupre.fr/app/mlinsights/helpsphinx/index.html
MIT License

_apply_prediction_method boolean indexing incompatible with standard sklearn format #95

Closed samtalki closed 3 years ago

samtalki commented 3 years ago

Hello,

When I try to train a PiecewiseRegressor on data in the standard sklearn (n_samples, n_features) format, the call to predict consistently raises an error. The boolean indexing on line 296 of piecewise_estimator.py, inside _apply_predict_method, appears to be the cause.

For example, for the following data:

print(X_train.shape,y_train.shape)
print(X_test.shape)
(23476, 1) (23476, 1)
(11564, 1)

Attempting to train the model in this fashion:


from sklearn.tree import DecisionTreeRegressor
from mlinsights.mlmodel import PiecewiseRegressor

model = PiecewiseRegressor(verbose=True,
                          binner=DecisionTreeRegressor(min_samples_leaf=300))

model.fit(X_train,y_train)
vvc_predict = model.predict(X_test)

plot_customer(customer1)
plt.plot(X_test,vvc_predict,'g.',label='VVC_predict',alpha=0.2)
plt.legend()

Yields the following errors:

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done  60 out of  60 | elapsed:    0.0s finished

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-13-63d7c4526d72> in <module>
      6 
      7 model.fit(X_train,y_train)
----> 8 vvc_predict = model.predict(X_test)
      9 
     10 plot_customer(customer1)

~\anaconda3\lib\site-packages\mlinsights\mlmodel\piecewise_estimator.py in predict(self, X)
    350         :return: predictions
    351         """
--> 352         return self._apply_predict_method(
    353             X, "predict", _predict_piecewise_estimator, self.dim_)
    354 

~\anaconda3\lib\site-packages\mlinsights\mlmodel\piecewise_estimator.py in _apply_predict_method(self, X, method, parallelized, dimout)
    294             if ind is None:
    295                 continue
--> 296             pred[ind] = p
    297             indall = numpy.logical_or(indall, ind)  # pylint: disable=E1111
    298 

TypeError: NumPy boolean array indexing assignment requires a 0 or 1-dimensional input, input has 2 dimensions

Judging by the TypeError, NumPy requires the value assigned through a boolean index to be 0- or 1-dimensional. However, reshaping my data does not seem to be compatible with the mlinsights library.
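The failing assignment can be reproduced with NumPy alone. The sketch below is only an illustration, not the mlinsights code: the names pred, ind and p are borrowed from line 296 of the traceback, and the array sizes and values are made up. It assumes that pred is a 1-D buffer while p is a 2-D, single-column array, which is what a target of shape (n_samples, 1) would typically produce.

```python
import numpy as np

# Hypothetical stand-ins for the variables on line 296 of the traceback.
pred = np.zeros(5)                                 # 1-D output buffer
ind = np.array([True, False, True, False, True])   # 1-D boolean mask
p = np.ones((3, 1))                                # 2-D, single-column values

# Assigning a 2-D array through a 1-D boolean mask is rejected by NumPy.
# The traceback above shows a TypeError; ValueError is also caught here
# in case another NumPy version reports the problem differently.
try:
    pred[ind] = p
    error = None
except (TypeError, ValueError) as exc:
    error = exc

print(type(error).__name__, error)

# Flattening the values to 1-D makes the same assignment succeed.
pred[ind] = p.ravel()
print(pred)
```

At the NumPy level, flattening the assigned values is enough to make the assignment succeed. Whether passing a flattened target such as y_train.ravel() to fit avoids the error inside PiecewiseRegressor is an assumption that has not been verified in this thread.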

I have attempted to solve this using a mask in https://github.com/sdpython/mlinsights/pull/94, which lets me use PiecewiseRegressor successfully, but judging by the checks, my contribution does not seem to be correct.