Bin collapse - Githubissues

AffDk commented 5 months ago

I have a feature that is heavily skewed toward missing values (about 90%). When I run BinningProcess on this feature with my binary target, it collapses the entire non-missing range into a single bin, alongside the missing bin. I tried adjusting the OptimalBinning parameters (passed via binning_fit_params), such as min_n_bins, max_pvalue, max_n_prebins, max_bin_size and gamma, as well as different divergence metrics, but nothing changes this behavior. I understand this may mean that binning gains no information value for this feature, but I thought I could make the algorithm pick up even slight differences by tuning the parameters. Just to examine its behavior, I removed the missing values, and then the binning worked as I expected. Any suggestions?

guillermo-navas-palencia commented 5 months ago

Hi @AffDk. Did you try the prebinning parameters? min_prebin_size, for instance.
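
A rough sketch of why min_prebin_size could matter here. It assumes, and this thread does not confirm it, that the minimum prebin size is computed as a fraction of all samples, missing rows included. The helper `max_prebins` and the sample counts are hypothetical, chosen only to mirror the 90% missing rate described above; 0.05 is the library's documented default for min_prebin_size.

```python
import math


def max_prebins(n_samples, n_missing, min_prebin_size):
    """Upper bound on how many prebins the non-missing values can form.

    Assumption (not confirmed in this thread): the smallest allowed prebin
    holds ceil(min_prebin_size * n_samples) rows, where n_samples counts
    the missing rows as well.
    """
    n_clean = n_samples - n_missing
    min_rows_per_prebin = math.ceil(min_prebin_size * n_samples)
    return n_clean // min_rows_per_prebin


# 10,000 rows, 90% of them missing, so only 1,000 rows carry a value.
default_bound = max_prebins(10_000, 9_000, 0.05)    # default min_prebin_size
smaller_bound = max_prebins(10_000, 9_000, 0.005)   # ten times smaller

print(default_bound, smaller_bound)
```

Under that assumption the default leaves room for at most two prebins among the non-missing values, so any later merging step can easily end up with a single bin, while a smaller min_prebin_size allows many more candidate splits.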

AffDk commented 5 months ago

Thanks. Yes, that helps, but it increases the computation time, which I'd say is expected. For categorical variables, I thought it would use the original categories, but to my surprise bin collapsing happens there too. Should I use the same trick, or can I force it to use the original categories? Thanks again.

guillermo-navas-palencia commented 5 months ago

You can use the same parameter.
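
A minimal sketch of passing that parameter per variable through binning_fit_params, for numerical and categorical columns alike. The helper names `build_fit_params` and `fit_binning`, the column names "income" and "region", and the value 0.005 are all hypothetical; the right value depends on the data and is not given in this thread.

```python
def build_fit_params(variables, min_prebin_size):
    """Per-variable OptimalBinning overrides, keyed by column name."""
    return {name: {"min_prebin_size": min_prebin_size} for name in variables}


def fit_binning(X, y, variables, categorical=(), min_prebin_size=0.005):
    """Fit a BinningProcess with a smaller min_prebin_size for every variable.

    X is expected to be a pandas DataFrame containing the given columns.
    """
    # Imported lazily so build_fit_params works without optbinning installed.
    from optbinning import BinningProcess

    process = BinningProcess(
        variable_names=list(variables),
        categorical_variables=list(categorical) or None,
        binning_fit_params=build_fit_params(variables, min_prebin_size),
    )
    process.fit(X, y)
    return process


# Hypothetical column names, for illustration only.
params = build_fit_params(["income", "region"], 0.005)
print(params)
```

After fitting, something like `fit_binning(df, y, ["income", "region"], categorical=["region"]).get_binned_variable("region").binning_table.build()` should show whether the bins still collapse. As noted earlier in the thread, smaller prebins tend to increase computation time.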

guillermo-navas-palencia / optbinning

Bin collapse #321