guillermo-navas-palencia / optbinning

Optimal binning: monotonic binning with constraints. Support batch & stream optimal binning. Scorecard modelling and counterfactual explanations.
http://gnpalencia.org/optbinning/
Apache License 2.0

RuntimeWarning: invalid value encountered in cast n_zeros = np.empty(n_bins).astype(np.int64) #270

Closed max-franceschi closed 7 months ago

max-franceschi commented 9 months ago

I use BinningProcess in a pipeline to preprocess numeric data that contain NAs:

from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from optbinning import BinningProcess

numericNAs_transformer = Pipeline(
    steps=[
        # replace NAs with a sentinel value before binning
        ('NA_imputer', SimpleImputer(strategy='constant', fill_value=-1)),
        ('discretizer', BinningProcess(
            numeric_features_with_NAs,
            min_bin_size=0.05,
            min_n_bins=2,
            # all variables are prebinned with CART
            binning_fit_params={name: {'prebinning_method': 'cart'} for name in numeric_features_with_NAs},
            # all variables are binned and transformed to indices
            binning_transform_params={name: {'metric': 'indices'} for name in numeric_features_with_NAs},
        )),
        ('OneHotEncoder', OneHotEncoder(handle_unknown='ignore', sparse_output=False)),
    ]
)

preprocessor = ColumnTransformer(
    transformers=[
        ('numNAs', numericNAs_transformer, numeric_features_with_NAs),
    ],
    remainder='passthrough'  # passthrough features not listed
)

preprocessor.fit(X, y)

But when I fit this pipeline I get one or more warnings (the number varies from run to run) saying c:\Users\XXX\Documents\DossierProjet\pvj\.venv\lib\site-packages\optbinning\binning\continuous_binning.py:912: RuntimeWarning: invalid value encountered in cast n_zeros = np.empty(n_bins).astype(np.int64).

I cannot understand what is causing this issue. Can anyone help? Sorry that I cannot provide a reproducible example.

PS: I use a ColumnTransformer because I have other transformers (not using optbinning and working well).

guillermo-navas-palencia commented 9 months ago

Isn't it related to this "issue"? https://github.com/guillermo-navas-palencia/optbinning/issues/194

max-franceschi commented 9 months ago

Thanks for your feedback. I do use ColumnTransformer as you suggest in #194, but I don't think that is the issue here. I have no problem with the other columns, only with those that go through my numericNAs_transformer, at the optimal binning step. I just don't understand the warning or how to fix it.

bmreiniger commented 8 months ago

np.empty has a dtype argument; should the offending line just be changed to use that instead of astype?

I suppose np.empty returns a float array holding whatever was left over in released memory, and some of those leftover values (NaN, inf, or simply too large) then cannot be cast to the int type? Not my expertise though...
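To make the suspected mechanism concrete, here is a minimal sketch, not taken from optbinning itself. Because the contents of np.empty are arbitrary, the sketch stands in for "leftover memory" with a hand-built float array containing NaN and inf; the helper name cast_warnings and the sample values are illustrative assumptions. On recent NumPy versions the cast is expected to emit the warning from this issue, while older versions may stay silent.

```python
import warnings

import numpy as np

n_bins = 5


def cast_warnings(values):
    """Return the messages of any warnings emitted while casting to int64."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        values.astype(np.int64)
    return [str(w.message) for w in caught]


# Stand-in for uninitialized float64 memory (assumption: garbage may include NaN/inf).
garbage = np.array([np.nan, np.inf, 0.0, 1.0, -np.inf])
messages = cast_warnings(garbage)
print(messages)  # may be empty on older NumPy versions

# Finite, in-range floats cast cleanly, which would explain why the warning looks random.
print(cast_warnings(np.array([0.0, 1.0, 2.0])))

# Allocating with the target dtype avoids the float-to-int cast altogether.
n_zeros = np.empty(n_bins, dtype=np.int64)
print(n_zeros.dtype, n_zeros.shape)
```

If this reading is right, the values in n_zeros are still uninitialized either way, so the change only matters when every entry is overwritten before use.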

max-franceschi commented 8 months ago

Hi! Sorry @bmreiniger, I did not understand your answer and then forgot to reply. I do not understand what the warning means. What is the underlying issue?

bmreiniger commented 8 months ago

@max-franceschi can you provide a minimal reproducible example? Or, I've made the change I suggested in #273; you could try that and see if it fixes your problem.

guillermo-navas-palencia commented 7 months ago

I am closing this issue since no reproducible example was provided. Please reopen it if you have an example ready.