guillermo-navas-palencia / optbinning

Optimal binning: monotonic binning with constraints. Support batch & stream optimal binning. Scorecard modelling and counterfactual explanations.
http://gnpalencia.org/optbinning/
Apache License 2.0
435 stars, 98 forks

Bug memory error with ContinuousOptimalBinning #239

Closed: nic9lif3 closed this issue 1 year ago

nic9lif3 commented 1 year ago

Hi @guillermo-navas-palencia,

Could you check this data with ContinuousOptimalBinning? Fitting it raises a MemoryError, even though the data is small and simple.

import optbinning

bin = optbinning.ContinuousOptimalBinning()
bin.fit(x, y)

MemoryError                               Traceback (most recent call last)
Input In [8], in <cell line: 1>()
----> 1 bin.fit(x,y)

File ~\AppData\Roaming\Python\Python38\site-packages\optbinning\binning\continuous_binning.py:440, in ContinuousOptimalBinning.fit(self, x, y, sample_weight, check_input)
    416 def fit(self, x, y, sample_weight=None, check_input=False):
    417     """Fit the optimal binning according to the given training data.
    418 
    419     Parameters
   (...)
    438         Fitted optimal binning.
    439     """
--> 440     return self._fit(x, y, sample_weight, check_input)

File ~\AppData\Roaming\Python\Python38\site-packages\optbinning\binning\continuous_binning.py:570, in ContinuousOptimalBinning._fit(self, x, y, sample_weight, check_input)
    563     logger.info("Pre-processing: number of samples: {}"
    564                 .format(self._n_samples))
    566 time_preprocessing = time.perf_counter()
    568 [x_clean, y_clean, x_missing, y_missing, x_special, y_special,
    569  y_others, categories, cat_others, sw_clean, sw_missing, sw_special,
--> 570  sw_others] = split_data(
    571     self.dtype, x, y, self.special_codes, self.cat_cutoff,
    572     self.user_splits, check_input, self.outlier_detector,
    573     self.outlier_params, None, None, None, sample_weight)
    575 self._time_preprocessing = time.perf_counter() - time_preprocessing
    577 if self.verbose:

File ~\AppData\Roaming\Python\Python38\site-packages\optbinning\binning\preprocessing.py:205, in split_data(dtype, x, y, special_codes, cat_cutoff, user_splits, check_input, outlier_detector, outlier_params, fix_lb, fix_ub, class_weight, sample_weight)
    202 if special_codes is None:
    203     clean_mask = ~missing_mask
--> 205     x_clean = x[clean_mask]
    206     y_clean = y[clean_mask]
    207     x_missing = x[missing_mask]

MemoryError: Unable to allocate 5.34 GiB for an array with shape (716847076,) and data type int64
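The requested size is itself a clue. 716,847,076 is exactly 26,774², and 716,847,076 int64 values × 8 bytes ≈ 5.34 GiB. A square element count is what you would expect if a 1-D x of length n were combined element-wise with a 2-D y of shape (n, 1): NumPy broadcasting turns the result into an (n, n) array. The sketch below reproduces that effect with NumPy alone on a small n. Note that n = 26,774 is inferred from the error message, not stated in the issue, and the mask expression is a hypothetical stand-in: the exact operation inside split_data is not visible in the traceback.

```python
import numpy as np

# Small stand-in for the reporter's data (hypothetical values).
n = 5
x = np.arange(n, dtype=float)                  # 1-D, shape (n,)
y = np.arange(n, dtype=float).reshape(-1, 1)   # 2-D column, shape (n, 1)

# Hypothetical mask (not optbinning's code): combining (n,) with (n, 1)
# broadcasts to (n, n) instead of staying (n,).
missing_mask = np.isnan(x) | np.isnan(y)
print(missing_mask.shape)  # (5, 5)

# With a flattened y, the same expression keeps one entry per sample.
missing_mask_1d = np.isnan(x) | np.isnan(y.ravel())
print(missing_mask_1d.shape)  # (5,)

# Arithmetic behind the error message: n = 26774 gives n**2 int64 entries.
n_reported = 26774
n_elements = n_reported ** 2
gib = n_elements * np.dtype(np.int64).itemsize / 2**30
print(n_elements, round(gib, 2))  # 716847076 5.34
```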

Attached data: X_bin.zip

guillermo-navas-palencia commented 1 year ago


y is a 2-D array. Just use y.ravel() to pass a 1-D target: bin.fit(x, y.ravel()).
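A minimal sketch of that fix, assuming y arrives as a NumPy array of shape (n, 1), for example a single column sliced from a 2-D array. The helper `as_1d_target` is hypothetical and not part of optbinning. It flattens a single-column target with ravel and rejects anything genuinely multi-column, so a shape mistake fails loudly instead of triggering a huge allocation.

```python
import numpy as np


def as_1d_target(y):
    """Return y as a 1-D array (hypothetical helper, not an optbinning API)."""
    y = np.asarray(y)
    if y.ndim == 2 and y.shape[1] == 1:
        # Column vector (n, 1) -> flat (n,), equivalent to y.ravel().
        return y.ravel()
    if y.ndim != 1:
        raise ValueError(f"y must be 1-D or a single column, got shape {y.shape}")
    return y


y_col = np.array([[1.0], [2.5], [4.0]])
y_flat = as_1d_target(y_col)
print(y_col.shape, "->", y_flat.shape)  # (3, 1) -> (3,)

# With optbinning installed, the fit call would then be:
#   optb = optbinning.ContinuousOptimalBinning()
#   optb.fit(x, as_1d_target(y))
```

Flattening is safe here because a single column holds exactly one target value per sample. If y comes from pandas, selecting the column as df["target"] (a Series) rather than df[["target"]] (a one-column DataFrame) also gives a 1-D target.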