This implements the Synthetic Minority Over-sampling Technique for Nominal and Continuous Data (SMOTENC) using `themis::smotenc()`.
`themis::smotenc()` accepts two-class or multiclass targets and factor, ordered, numeric, and integer features (contrary to the name "nominal and continuous"), which is why this PipeOp accepts them too. `NA`s are not permitted in any feature column.
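To illustrate the kind of input described above, here is a minimal sketch that calls `themis::smotenc()` directly on a small data frame with one numeric and one factor feature. The data, column names, and seed are made up for illustration, and the call is skipped if themis is not installed.

```r
# Toy imbalanced two-class data; all names and values are made up.
set.seed(1)
df = data.frame(
  y = factor(c(rep("a", 40), rep("b", 10))),
  x_num = rnorm(50),
  x_fct = factor(sample(c("u", "v", "w"), 50, replace = TRUE))
)

# NAs in feature columns are not permitted, so check before oversampling.
feature_cols = setdiff(names(df), "y")
stopifnot(!anyNA(df[, feature_cols]))

if (requireNamespace("themis", quietly = TRUE)) {
  # over_ratio = 1 asks for the minority class to be oversampled
  # up to the size of the majority class.
  res = themis::smotenc(df, var = "y", k = 5, over_ratio = 1)
  print(table(res$y))
}
```

This only shows the underlying function call; the PipeOp itself additionally has to deal with feature types and unsupported columns.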
`themis::smotenc()` handles integer features as if they were numeric. However, since we don't want to change the feature type, we round the generated data points to the nearest integer. As a consequence, our PipeOp does not produce the same results as calling `themis::smotenc()` directly.
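The rounding step can be sketched in base R as follows. This is an illustration only, not the PipeOp's actual code: the data are made up, and the synthetic row is interpolated by hand as a stand-in for a row generated by `themis::smotenc()`.

```r
# Original data with one integer and one numeric feature (values made up).
orig = data.frame(age = c(21L, 34L, 48L), income = c(1.5, 2.7, 3.9))

# Hand-made stand-in for a synthetic row: interpolating between two
# observations generally yields non-integer values stored as double.
gap = 0.3
synth = data.frame(
  age = orig$age[1] + gap * (orig$age[2] - orig$age[1]),
  income = orig$income[1] + gap * (orig$income[2] - orig$income[1])
)

# Round only the columns that were integer in the original data and
# restore their storage type; numeric columns stay untouched.
int_cols = names(orig)[vapply(orig, is.integer, logical(1))]
synth[int_cols] = lapply(synth[int_cols], function(x) as.integer(round(x)))

combined = rbind(orig, synth)
str(combined)
```

Because the rounded values differ from the raw interpolated ones, the combined data no longer match what the unrounded output would contain, which is the discrepancy noted above.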
Unsupported columns are handled the same way as in `PipeOpSmote` in https://github.com/mlr-org/mlr3pipelines/pull/815. That implementation should be checked first, so that any adjustments can be applied here as well.
Closes https://github.com/mlr-org/mlr3pipelines/issues/784