mljar / mljar-supervised

Python package for AutoML on Tabular Data with Feature Engineering, Hyper-Parameters Tuning, Explanations and Automatic Documentation
https://mljar.com
MIT License
3.01k stars, 401 forks

Add Ability to Convert Continuous Data to Categorical #117

Open: eladmw opened this issue 4 years ago

eladmw commented 4 years ago

Add the ability to convert continuous data to categorical data, for example with sklearn.preprocessing.KBinsDiscretizer.
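For reference, here is a minimal sketch of what KBinsDiscretizer does on its own. It is plain scikit-learn, not part of mljar-supervised. The toy data and the choice of n_bins=3 with equal-width ("uniform") bins are illustrative assumptions:

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# One continuous feature with values spread over roughly [0, 10)
X = np.array([[0.5], [1.5], [3.0], [4.5], [6.0], [7.5], [9.0], [9.9]])

# strategy="uniform" splits [min, max] into equal-width bins.
# encode="ordinal" returns each value's bin index (0 .. n_bins-1) as a float
# column instead of a one-hot matrix.
disc = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="uniform")
X_binned = disc.fit_transform(X)

print(X_binned.ravel())    # bin index per row
print(disc.bin_edges_[0])  # the 4 edges that define the 3 bins
```

The ordinal output can then be cast to a categorical type so that a model treats each bin as a category rather than as a number.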

pplonski commented 4 years ago

It can certainly be added. For now, though, it is hard for me to say how it should be exposed to the user. Options that I can see:

Any ideas on how you would like to see it?

eladmw commented 4 years ago

It could accept input from the user. For example, with model = AutoML(), something like model.discretize(train, columns=[], bins=int, replace=False), where setting replace=True would replace the continuous columns with their categorical versions. Alternatively, suitable columns could be detected during the fit process and recommended to the user in the output. I also think that more categorical features would usually help, because CatBoost builds combinations of categorical features to aid in boosting.
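The proposed call could be prototyped outside AutoML as a plain function. The sketch below is hypothetical: discretize, its parameters and the "_bin" column suffix are illustrative names, not an existing mljar-supervised API. It wraps KBinsDiscretizer, defaults to equal-width bins (an arbitrary choice; scikit-learn's own default is "quantile"), and assumes the downstream model treats pandas "category" columns as categorical:

```python
import pandas as pd
from sklearn.preprocessing import KBinsDiscretizer


def discretize(df, columns, bins=5, replace=False, strategy="uniform"):
    """Hypothetical helper mirroring the proposed model.discretize(...) call.

    Bins each listed continuous column into `bins` intervals. If `replace`
    is True the original column is overwritten with its binned version;
    otherwise a new "<col>_bin" column is appended. The input is not modified.
    """
    out = df.copy()
    disc = KBinsDiscretizer(n_bins=bins, encode="ordinal", strategy=strategy)
    # fit_transform returns bin indices as floats; cast them to int first
    binned = disc.fit_transform(out[columns]).astype(int)
    for i, col in enumerate(columns):
        target = col if replace else f"{col}_bin"
        # Store as a pandas categorical so it is not treated as numeric
        out[target] = pd.Categorical(binned[:, i])
    return out


# Illustrative data only
train = pd.DataFrame({"age": [21.0, 30.0, 45.0, 50.0, 65.0, 80.0],
                      "target": [0, 1, 0, 1, 1, 0]})

with_bins = discretize(train, columns=["age"], bins=3)                # adds "age_bin"
replaced = discretize(train, columns=["age"], bins=3, replace=True)   # overwrites "age"
```

Returning a copy keeps the original DataFrame untouched, which matches the intent of replace=False being the safe default.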