Closed mb706 closed 1 month ago
themis implements different methods for handling unbalanced data. This is a summary based on the manual:
- adasyn (Adaptive Synthetic Algorithm), which generates synthetic positive instances; works for tasks with only numeric features that have no NAs.
- bsmote (Borderline SMOTE), which generates new examples of the minority class using nearest neighbors of these cases in the border region between classes; works for tasks with only numeric features that have no NAs.
- nearmiss, which eliminates entries from the majority class that have the smallest distance to the minority class; works for tasks with only numeric features that have no NAs.
- smote, see PipeOpSmote
- smotenc, currently under development as PipeOpSmoteNC
- tomek, which removes observations that are part of Tomek links; works for tasks with only numeric features that have no NAs.

Furthermore, the package implements a recipe for one external function:
- ROSE::ROSE(), which creates a sample of synthetic data by enlarging the feature space of minority and majority class examples; for imbalanced data in binary classification tasks.

Lastly, the package implements a recipe (step_upsample) without a separate implementation, which replicates rows of a data set so that the levels of a specific factor occur equally often. Only intended for training. Implemented as PipeOpClassBalancing.
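To make the row-replication idea concrete, here is a minimal base-R sketch of upsampling. It is not the code used by themis or by PipeOpClassBalancing; the helper name `upsample_rows` and the toy data are made up for illustration, and it assumes that balancing every level to the size of the largest class is the desired target.

```r
# Hypothetical helper (not part of themis or mlr3pipelines):
# replicate randomly chosen rows of each under-represented level of `target`
# until every non-empty level has as many rows as the most frequent level.
upsample_rows <- function(data, target) {
  counts <- table(data[[target]])
  n_max <- max(counts)
  extra <- unlist(lapply(names(counts), function(lvl) {
    idx <- which(data[[target]] == lvl)
    n_missing <- n_max - length(idx)
    # Nothing to add for empty levels or for levels that are already largest
    if (length(idx) == 0L || n_missing == 0L) {
      return(integer(0))
    }
    # Sample positions within idx so that a one-row class is handled correctly
    idx[sample.int(length(idx), n_missing, replace = TRUE)]
  }))
  # Keep all original rows and append the replicated ones
  data[c(seq_len(nrow(data)), extra), , drop = FALSE]
}

# Toy data, invented for this example: 9 majority rows, 3 minority rows
set.seed(1)
toy <- data.frame(
  x = rnorm(12),
  class = factor(rep(c("majority", "minority"), times = c(9, 3)))
)

balanced <- upsample_rows(toy, "class")
table(balanced$class)
```

As with the recipe step, this kind of resampling should only be applied to training data; replicating rows before a train/test split would leak copies of the same observation into both sets.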
Of course, implementing these in mlr3pipelines wouldn't be necessary if pipelines were generally interoperable with tidymodels (https://github.com/mlr-org/mlr3pipelines/issues/490).
https://github.com/mlr-org/mlr3pipelines/issues/490 refers to interoperability in the sense that a pipeline could be used as a step in a tidymodels recipe.
Of these, the following are also implemented in smotefamily:
- adasyn
- blsmote (Borderline-SMOTE)

and additionally:
- ANS (Adaptive Neighbor SMOTE)
- DBSMOTE (Density-based SMOTE)
- RSLS (Relocating Safe-Level SMOTE)
- SLS (Safe-level SMOTE)

We have the themis content itself already, so I will close this for now.
what other methods exist in the themis package that we could use?