pylablanche / gcForest

Python implementation of deep forest method : gcForest
MIT License

How to deal with missing values? #13

Open liuliu629 opened 6 years ago

liuliu629 commented 6 years ago

How to deal with missing values in the input data set?

kingfengji commented 6 years ago

It depends on the base estimators. For instance, xgboost/lightgbm can handle missing values (None/NaN) in attributes without preprocessing, whereas scikit-learn estimators generally require you to replace missing values first, either by one-hot encoding or by filling in a number such as the mean or median. Details can be found at https://github.com/dmlc/xgboost/issues/21. You can also write your own classifier with such features and use it as a base estimator; see, e.g., https://stats.stackexchange.com/questions/98953/why-doesnt-random-forest-handle-missing-values-in-predictors
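For the scikit-learn route, a minimal sketch of the "fill with the median" option might look like the following. The toy data, the choice of `SimpleImputer` with `strategy="median"`, and the use of `RandomForestClassifier` as a stand-in base estimator are all assumptions made for illustration; they are not taken from the gcForest code itself.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer

# Toy data with missing entries encoded as NaN (values are made up for illustration).
X = np.array([
    [1.0, 2.0],
    [np.nan, 3.0],
    [4.0, np.nan],
    [5.0, 6.0],
    [2.0, 1.0],
    [6.0, 5.0],
])
y = np.array([0, 0, 1, 1, 0, 1])

# Learn per-column medians from the training data and fill the gaps with them.
imputer = SimpleImputer(strategy="median")
X_filled = imputer.fit_transform(X)

# Stand-in for a scikit-learn base estimator; it is trained on the filled data.
clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(X_filled, y)

# New samples must be filled with the same fitted imputer before prediction.
X_new = np.array([[np.nan, 4.0]])
pred = clf.predict(imputer.transform(X_new))
```

The same filled array could then be passed to whatever model uses scikit-learn estimators internally. The important part is that the imputer is fitted on the training set only and reused unchanged at prediction time.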