ibis-project / ibis-ml

IbisML is a library for building scalable ML pipelines using Ibis.
https://ibis-project.github.io/ibis-ml/
Apache License 2.0

feat: preprocessing transformation priorities #32

Open jitingxu1 opened 3 months ago

jitingxu1 commented 3 months ago

Building on the deliverables outlined in issue #19, this issue aims to broaden IbisML's coverage of ML preprocessing transformations and to prioritize the key areas for improvement.

Please share your favorite ML transformation for your daily ML tasks and provide additional context as to why you find it particularly useful.

Assumption

Priority definition:

Priorities

| Preprocessing Module | Ibis-ml Step | sklearn | Priority | Status | Note | Model Needed |
| --- | --- | --- | --- | --- | --- | --- |
| Encoding | CategoricalEncode | OrdinalEncoder | P0 | Done | | |
| Encoding | CountEncode | | P1 | Done | | |
| Feature Engineering | CreatePolynomialFeatures | PolynomialFeatures | P0 | Done | | |
| Non-linear Transformation | Math Transformation (log, sqrt) | | P1 | Done | ibis | |
| Standardization and Scaling | ScaleStandard | StandardScaler | P0 | Done | | KNN, MLPBased, SVM |
| Encoding | TargetEncode | TargetEncoder | P0 | Done | | |
| Feature Reduction | DropZeroVariance | VarianceThreshold | P0 | Done | | |
| Imputing | HandleUnivariateOutliers | SimpleImputer | P0 | Done | | |
| Feature Engineering | ratio variable creation | | P0 | Done | ibis | |
| Discretization | DiscretizeKBins | KBinsDiscretizer | P0 | Done | | |
| Discretization | Feature binarization | Binarizer | P1 | Done | | |
| Standardization and Scaling | ScaleMinMax | MinMaxScaler | P0 | Done | | KNN, MLPBased, SVM |
| Custom Transformer | Custom transform | FunctionTransformer | P0 | Done | | |
| Encoding | OneHotEncode | OneHotEncoder | P0 | Done | | |
| Imputing | Outlier - Impute and capping | | P0 | Done | | Log/Linear Reg |
| Feature Reduction | Continuous Target Mutual Info | | P1 | Not started | | |
| Feature Reduction | Discrete Target Mutual information | | P1 | Not started | | |
| Feature Engineering - Text | Count Transformer | CountVectorizer | P2 | Not started | | |
| Feature Engineering - Text | TFIDF Transformer | TfidfTransformer | P2 | Not started | | |
| Encoding | label binarizer | LabelBinarizer | P2 | Not started | | |
| Encoding | label encode | LabelEncoder | P2 | Not started | | |
| Standardization and Scaling | MaxAbsScaler | MaxAbsScaler | P2 | Not started | | |
| Standardization and Scaling | RobustScaler | RobustScaler | P1 | Not started | | KNN, MLPBased, SVM |
| Imputing | Missing value - Nearest Neighbor | KNNImputer | P1 | Not started | Doable | |
| Non-linear Transformation | QuantileTransformer | QuantileTransformer | P1 | Not started | | |
| Non-linear Transformation | Inverse and Logit transformation | | P2 | Not started | | |
| Imputing | Missing value - Linear reg | | P1 | Not started | Not Support | |
| Imputing | Missing value - bagged trees | | P1 | Not started | Not Support | |
| Feature Reduction | Filter col with missing rate threshold | | P1 | Not started | | |
| Feature Reduction | Filter Feature by high correlation | | P2 | Not started | Doable | |
| Non-linear Transformation | PowerTransformer | PowerTransformer | P1 | Not started | | MLPBased, SVM |
| Feature Reduction | PCA | | P1 | Not started | Not Support | |
| Imputing | Missing Value - rolling window Imputing | | P2 | Not started | | |
| Feature Engineering | Spline transformer | SplineTransformer | P1 | Not started | | |
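
For anyone unfamiliar with the entries above, here is a minimal plain-Python sketch of what two of the "Done" steps are generally understood to compute: count encoding (the CountEncode row, which has no sklearn counterpart listed) and standard scaling (the ScaleStandard row). It does not use the ibis-ml API. The function names are hypothetical, and two choices are assumptions made for illustration rather than confirmed ibis-ml behavior: unseen categories map to 0, and scaling uses the population standard deviation.

```python
from collections import Counter
from statistics import mean, pstdev


def fit_count_encoder(values):
    """Learn how often each category occurs in the training column."""
    return Counter(values)


def count_encode(values, counts):
    """Replace each category with its training count (assumed 0 if unseen)."""
    return [counts.get(v, 0) for v in values]


def fit_standard_scaler(values):
    """Learn the mean and (assumed) population standard deviation."""
    return mean(values), pstdev(values)


def standard_scale(values, mu, sigma):
    """Center on the training mean and divide by the training std."""
    return [(v - mu) / sigma for v in values]


# Fit on training data, then apply the learned statistics to new rows.
train_city = ["NYC", "SF", "NYC", "LA", "NYC"]
counts = fit_count_encoder(train_city)
encoded = count_encode(["NYC", "LA", "Boston"], counts)

train_age = [20.0, 30.0, 40.0]
mu, sigma = fit_standard_scaler(train_age)
scaled = standard_scale(train_age, mu, sigma)
```

Both steps follow the same fit-then-transform split: statistics are learned once from training data and reused on new rows. A constant column would make `sigma` zero, so a real implementation would need to guard against that case.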

Reference:

zhenzhongxu commented 2 weeks ago

@jitingxu1 @deepyaman can we ensure this is up to date? Thank you.

deepyaman commented 2 weeks ago

> @jitingxu1 @deepyaman can we ensure this is up to date? Thank you.

@zhenzhongxu This is up-to-date already!