This PR includes the following additions from the maabdin/imputation branch:
Two new imputation methods, IterativeDataImputer and KNNDataImputer, in addition to the existing BasicImputer.
IterativeDataImputer:
Description: This method imputes the missing values of a feature using the other features. In round-robin fashion, each feature with missing values is modeled as a function of the other features, and that model is used to fill in the missing entries. This subclass uses the sklearn.impute.IterativeImputer class in the background.
Can impute categorical data by encoding it before imputation and decoding it afterwards, enabled through the enable_encoder boolean parameter.
Added components: main class, doc-string/readthedocs documentation, module tutorial notebook and pytests.
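As a rough illustration of the idea behind IterativeDataImputer, the sketch below calls sklearn.impute.IterativeImputer directly and wraps it in a manual encode/decode step for one categorical column. It does not use the new class itself: the toy data, the column names, and the integer-code encoding via pandas.factorize are all assumptions made for this example, and the encoding that enable_encoder actually applies may differ.

```python
import numpy as np
import pandas as pd
# IterativeImputer is experimental in scikit-learn and must be enabled explicitly.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical toy dataset: two numeric columns and one categorical column.
df = pd.DataFrame({
    "height": [1.60, 1.75, np.nan, 1.82, 1.68, 1.90],
    "weight": [55.0, 72.0, 80.0, np.nan, 63.0, 95.0],
    "size": ["S", "M", "L", np.nan, "M", "L"],
})

# Encode (assumed scheme): map each category to an integer code.
# pd.factorize assigns -1 to missing entries, which we turn back into NaN.
codes, categories = pd.factorize(df["size"])
encoded = df.copy()
encoded["size"] = np.where(codes == -1, np.nan, codes)

# Impute: each column with missing values is regressed on the other columns.
imputer = IterativeImputer(max_iter=10, random_state=0)
imputed = imputer.fit_transform(encoded)  # NumPy array, same column order as df

# Decode: round the imputed codes to the nearest valid category index.
result = pd.DataFrame(imputed, columns=df.columns)
size_codes = np.clip(np.rint(imputed[:, 2]), 0, len(categories) - 1).astype(int)
result["size"] = np.asarray(categories)[size_codes]

print(result)
```

Rounding an imputed code to the nearest integer implicitly treats the categories as ordered, which is a simplification chosen here for brevity.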
KNNDataImputer:
Description: This method imputes the missing values of a feature using k-nearest neighbors. Each missing value is replaced by the mean value of that feature among the k nearest neighbors in the dataset. Two samples are considered close if the features that are present in both of them are close. This subclass uses the sklearn.impute.KNNImputer class in the background.
Can impute categorical data by encoding it before imputation and decoding it afterwards, enabled through the enable_encoder boolean parameter.
Added components: main class, doc-string/readthedocs documentation, module tutorial notebook and pytests.
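The neighbor-based behavior described for KNNDataImputer can be seen by calling the underlying sklearn.impute.KNNImputer directly. This is only a sketch of the scikit-learn class that the new imputer wraps, not of the wrapper's own interface; the toy matrix and the choice of n_neighbors=2 are assumptions for illustration.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy numeric data; np.nan marks the missing entries.
X = np.array([
    [1.0, 2.0, np.nan],
    [3.0, 4.0, 3.0],
    [np.nan, 6.0, 5.0],
    [8.0, 8.0, 7.0],
])

# Distances are computed only over features present in both samples
# (scikit-learn's default "nan_euclidean" metric).
imputer = KNNImputer(n_neighbors=2)
X_imputed = imputer.fit_transform(X)

print(X_imputed)
# Row 0, column 2 becomes 4.0: the mean of 3.0 and 5.0 from its two nearest rows.
# Row 2, column 0 becomes 5.5: the mean of 3.0 and 8.0 from its two nearest rows.
```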
Updated all 3 imputers to automatically detect the columns to impute (when col_impute=None) not only at fit time but also at transform time, since the datasets used at the two stages may differ.
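A minimal sketch of why detection at transform time matters, assuming pandas DataFrames as input. The AutoDetectMeanImputer class and the columns_with_missing helper are hypothetical and written only for this example; they are not the implementation in this PR, and only the col_impute parameter name is taken from the description above.

```python
import numpy as np
import pandas as pd


def columns_with_missing(df: pd.DataFrame) -> list:
    """Return the names of all columns that contain at least one missing value."""
    return df.columns[df.isna().any()].tolist()


class AutoDetectMeanImputer:
    """Hypothetical toy imputer that fills missing values with column means."""

    def __init__(self, col_impute=None):
        self.col_impute = col_impute

    def fit(self, df: pd.DataFrame):
        # Learn a mean for every numeric column, not only those with gaps now,
        # so columns that only have gaps at transform time can still be filled.
        self.means_ = df.mean(numeric_only=True)
        self.fit_cols_ = (
            self.col_impute if self.col_impute is not None else columns_with_missing(df)
        )
        return self

    def transform(self, df: pd.DataFrame) -> pd.DataFrame:
        # Detect the columns again: this dataset may have gaps in other columns.
        cols = self.col_impute if self.col_impute is not None else columns_with_missing(df)
        out = df.copy()
        for col in cols:
            out[col] = out[col].fillna(self.means_[col])
        return out


train = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [10.0, 20.0, 30.0]})
test = pd.DataFrame({"a": [4.0, 5.0], "b": [np.nan, 40.0]})

imputer = AutoDetectMeanImputer().fit(train)
filled = imputer.transform(test)
print(imputer.fit_cols_)  # columns detected at fit time
print(filled)             # column "b" is filled even though it had no gaps at fit time
```

If the columns were detected only at fit time, column "b" in the second dataset would be left with its missing value.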
Other minor code changes to accommodate these additions.