Does your trained regression model suffer from heteroskedasticity? Does it predict extreme values poorly? Consider using these pre-processing techniques to address these issues, which are probably caused by an imbalanced dataset.
## PyPI version
pip install PyImbalReg
## GitHub version
pip install git+https://github.com/vd1371/PyImbalReg.git
# importing PyImbalReg
import PyImbalReg as pir
# importing the data
from seaborn import load_dataset
data = load_dataset('dots')
ro = pir.RandomOversampling(
    df=data,              # Passing the data
    rel_func='default',   # Default relevance function will be used
    threshold=0.7,        # Set the threshold
    o_percentage=5,       # (o_percentage - 1) x n_rare_samples will be added
)
new_data = ro.get()
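To make the effect of `threshold` and `o_percentage` concrete, here is a minimal sketch of random oversampling for regression written with plain pandas. It is not PyImbalReg's implementation: `toy_relevance` and `random_oversample` are hypothetical helpers, and the relevance function (scaled distance from the median) is an assumption chosen only for illustration. The library's default relevance function may differ.

```python
import numpy as np
import pandas as pd


def toy_relevance(y: pd.Series) -> pd.Series:
    """Hypothetical relevance: 0 at the median, 1 at the farthest value."""
    dist = (y - y.median()).abs()
    max_dist = dist.max()
    if max_dist == 0:
        # Constant target: no sample is more "extreme" than another.
        return pd.Series(0.0, index=y.index)
    return dist / max_dist


def random_oversample(df, target, threshold=0.7, o_percentage=5, seed=0):
    """Append (o_percentage - 1) x n_rare duplicated rare rows to df."""
    relevance = toy_relevance(df[target])
    rare = df[relevance >= threshold]  # rows treated as rare
    n_new = int((o_percentage - 1) * len(rare))
    if n_new == 0:
        return df.copy()
    # Draw rare rows with replacement so any number of copies can be added.
    extra = rare.sample(n=n_new, replace=True, random_state=seed)
    return pd.concat([df, extra], ignore_index=True)


# Synthetic, skewed target so that only a few rows count as rare.
rng = np.random.default_rng(0)
toy = pd.DataFrame({"x": rng.normal(size=200), "y": rng.exponential(size=200)})
augmented = random_oversample(toy, target="y", threshold=0.7, o_percentage=5)
print(len(toy), len(augmented))
```

With `o_percentage = 5`, each rare row ends up represented about five times on average: once in the original data plus four sampled copies per rare row.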
Please share your issues, new techniques, and contributions with us. Your help is much appreciated.
Branco, P., Torgo, L. and Ribeiro, R.P., 2019. Pre-processing approaches for imbalanced distributions in regression. Neurocomputing, 343, pp.76-99.
© Vahid Asghari, 2020. Licensed under the General Public License v3.0 (GPLv3).
P.S. Some parts of the README and code were inspired by https://github.com/nickkunz/smogn