Sehjbir opened 6 months ago
Description:
I have a dataset that contains both numeric and categorical variables, and I want to combine oversampling and undersampling. However, SMOTETomek only works on purely numeric datasets.
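Tomek links are pairs of opposite-class samples that are each other's nearest neighbour, so the cleaning half of SMOTETomek depends entirely on a distance between rows. With categorical columns, a plain Euclidean distance is not meaningful. Below is a minimal sketch of Tomek-link detection on mixed rows. It assumes the distance the original SMOTE paper (Chawla et al., 2002) describes for SMOTE-NC: Euclidean distance over the numeric columns, plus the median of the minority-class numeric standard deviations, squared, for each mismatched categorical column. Reusing that distance for Tomek links is an assumption, not something imbalanced-learn does. Using the population standard deviation is also an assumption. The helpers `median_std`, `mixed_distance` and `tomek_links` are hypothetical and not part of any library.

```python
import statistics
from typing import List, Sequence, Set, Tuple

# A row is split into its numeric part and its categorical part.
Row = Tuple[Sequence[float], Sequence[str]]


def median_std(rows: List[Row], labels: List[int], minority: int) -> float:
    """Median of the per-column standard deviations of the numeric features,
    computed on the minority class (population std is an assumption)."""
    numeric = [num for (num, _), y in zip(rows, labels) if y == minority]
    n_cols = len(numeric[0])
    stds = [statistics.pstdev([r[j] for r in numeric]) for j in range(n_cols)]
    return statistics.median(stds)


def mixed_distance(a: Row, b: Row, med: float) -> float:
    """Euclidean distance on the numeric columns, with med**2 added to the
    squared sum for every categorical column whose values differ."""
    (num_a, cat_a), (num_b, cat_b) = a, b
    sq = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    sq += sum(med ** 2 for u, v in zip(cat_a, cat_b) if u != v)
    return sq ** 0.5


def tomek_links(rows: List[Row], labels: List[int], med: float) -> Set[Tuple[int, int]]:
    """Index pairs (i, j) of opposite classes that are each other's nearest
    neighbour under mixed_distance. Brute force O(n^2); ties go to the lower index."""
    nearest = []
    for i in range(len(rows)):
        candidates = [(mixed_distance(rows[i], rows[j], med), j)
                      for j in range(len(rows)) if j != i]
        nearest.append(min(candidates)[1])
    links = set()
    for i, j in enumerate(nearest):
        # A Tomek link needs different labels and a mutual nearest-neighbour relation.
        if labels[i] != labels[j] and nearest[j] == i:
            links.add((min(i, j), max(i, j)))
    return links


rows = [
    ([0.0, 0.0], ["a"]),
    ([0.1, 0.0], ["a"]),
    ([5.0, 5.0], ["b"]),
    ([5.0, 5.2], ["b"]),
    ([9.0, 9.0], ["c"]),
]
labels = [0, 1, 0, 0, 1]
med = median_std(rows, labels, minority=1)
print(tomek_links(rows, labels, med))  # {(0, 1)}
```

Rows 0 and 1 share a category and are almost identical numerically but have different labels, so they form the only Tomek link. Row 4's nearest neighbour is row 3, but row 3's own nearest neighbour is row 2, so no link forms there. An undersampler would then drop the majority-class member of each link, or both members.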
Code Snippet:
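```python
from imblearn.over_sampling import SMOTENC
from imblearn.pipeline import make_pipeline
from imblearn.under_sampling import TomekLinks
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate

# category_cols, data_train and target_train are defined elsewhere in my code.
# The imblearn pipeline applies both samplers only during fit, i.e. on the training folds.
model_oversampler_smotenc = make_pipeline(
    SMOTENC(random_state=44, categorical_features=category_cols),
    TomekLinks(sampling_strategy='auto'),
    GradientBoostingClassifier(),
)

scoring = ['balanced_accuracy', 'f1', 'precision', 'recall']
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=3)

cv_results_oversampler_smotenc = cross_validate(
    model_oversampler_smotenc, data_train, target_train,
    scoring=scoring, return_train_score=True, return_estimator=True,
    cv=cv, n_jobs=-1,
)
print(
    f"Balanced accuracy mean +/- std. dev.: "
    f"{cv_results_oversampler_smotenc['test_balanced_accuracy'].mean():.3f} +/- "
    f"{cv_results_oversampler_smotenc['test_balanced_accuracy'].std():.3f}"
)
```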
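For reference, the original SMOTE paper (Chawla et al., 2002) describes how SMOTE-NC builds each synthetic sample. Numeric columns are interpolated between a minority sample and one of its k nearest minority neighbours. Each categorical column takes the majority value among that sample and its neighbours. The sketch below follows that description. `smotenc_sample` is a hypothetical stdlib helper, not imbalanced-learn's `SMOTENC` implementation, which may differ in details such as tie-breaking or which rows take part in the vote. The neighbour search is assumed to have been done already.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

# A row is split into its numeric part and its categorical part.
Row = Tuple[List[float], List[str]]


def smotenc_sample(base: Row, neighbors: Sequence[Row], rng: random.Random) -> Row:
    """Build one synthetic minority sample from `base` and its k nearest
    minority-class neighbours (neighbour search is assumed done elsewhere)."""
    num_base, cat_base = base
    # Numeric columns: interpolate toward one randomly chosen neighbour.
    num_nn, _ = rng.choice(neighbors)
    gap = rng.random()  # uniform in [0, 1)
    numeric = [x + gap * (y - x) for x, y in zip(num_base, num_nn)]
    # Categorical columns: majority vote over the base sample and its neighbours.
    # Ties go to the value seen first; the paper breaks ties at random instead.
    categorical = []
    for j, own_value in enumerate(cat_base):
        votes = Counter([own_value] + [cats[j] for _, cats in neighbors])
        categorical.append(votes.most_common(1)[0][0])
    return numeric, categorical


rng = random.Random(0)
base = ([0.0, 0.0], ["a"])
neighbors = [([1.0, 2.0], ["b"]), ([1.0, 2.0], ["b"])]
print(smotenc_sample(base, neighbors, rng))
```

Because the categorical value is copied from existing rows rather than interpolated, synthetic samples never contain category values that do not occur in the data. This is what lets the oversampling step handle mixed data.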
Questions:
Is there an alternative to SMOTETomek that works with mixed numeric and categorical data?