
PyImbalReg

Pre-processing techniques for imbalanced datasets in regression modelling



Dealing with imbalanced datasets for regression modelling

Does your trained regression model suffer from heteroskedasticity? Does it fail to predict extreme target values well? If these issues are caused by an imbalanced dataset, consider using these pre-processing techniques to address them.


Installation

## PyPI version
pip install PyImbalReg

## GitHub version
pip install git+https://github.com/vd1371/PyImbalReg.git

Example

# importing PyImbalReg
import PyImbalReg as pir

# importing an example dataset (seaborn is only needed for this example)
from seaborn import load_dataset
data = load_dataset('dots')

ro = pir.RandomOversampling(df=data,              # The data to be resampled
                            rel_func='default',   # Use the default relevance function
                            threshold=0.7,        # Relevance threshold separating rare from normal samples
                            o_percentage=5        # (o_percentage - 1) x n_rare_samples will be added
                            )
new_data = ro.get()
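To illustrate what random oversampling of rare target values does, here is a minimal sketch in plain NumPy and pandas. It is not PyImbalReg's implementation: the names `boxplot_relevance` and `random_oversample` are hypothetical, and the relevance function is a deliberately simplified stand-in for the box-plot-based relevance functions used in the literature (0 at the median, rising linearly to 1 at the fences Q1 - 1.5*IQR and Q3 + 1.5*IQR). It follows the `(o_percentage - 1) x n_rare_samples` rule stated in the example's comment.

```python
import numpy as np
import pandas as pd


def boxplot_relevance(y):
    """Hypothetical, simplified relevance function (an assumption, not PyImbalReg's):
    0 at the median, rising linearly to 1 at the box-plot fences
    Q1 - 1.5*IQR and Q3 + 1.5*IQR, and held at 1 beyond them."""
    y = np.asarray(y, dtype=float)
    q1, med, q3 = np.percentile(y, [25, 50, 75])
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    if not lo < med < hi:  # degenerate target (IQR == 0): treat nothing as rare
        return np.zeros_like(y)
    # np.interp keeps the end values (1.0) outside [lo, hi]; clip guards rounding
    return np.clip(np.interp(y, [lo, med, hi], [1.0, 0.0, 1.0]), 0.0, 1.0)


def random_oversample(df, y_col, threshold=0.7, o_percentage=5, seed=0):
    """Append (o_percentage - 1) x n_rare randomly duplicated rare rows."""
    rel = boxplot_relevance(df[y_col])
    rare = df[rel >= threshold]  # rows whose relevance reaches the threshold
    n_new = int((o_percentage - 1) * len(rare))
    if rare.empty or n_new <= 0:
        return df.copy()
    rng = np.random.default_rng(seed)
    picks = rng.choice(len(rare), size=n_new, replace=True)  # sample with replacement
    return pd.concat([df, rare.iloc[picks]], ignore_index=True)
```

For example, with a target `y = 0, 1, ..., 19, 100`, the upper fence is 30, so only the row with `y == 100` reaches relevance 1; with `threshold=0.7` and `o_percentage=5`, four duplicates of that row are appended.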

Requirements

  1. SciPy
  2. pandas
  3. NumPy

Other examples

  1. RandomOversampling
  2. GaussianNoise
  3. WERCS
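As a rough illustration of the ideas behind the GaussianNoise and WERCS techniques listed above, the sketch below implements Gaussian-noise oversampling and a WERCS-style resampler with NumPy and pandas. It is not PyImbalReg's code: all function and parameter names (`boxplot_relevance`, `gaussian_noise_oversample`, `wercs`, `pert`, `u_percentage`) are hypothetical, the relevance function is a simplified box-plot stand-in, and only the oversampling side of the Gaussian-noise approach is shown (the undersampling of normal cases that Branco et al. combine with it is omitted).

```python
import numpy as np
import pandas as pd


def boxplot_relevance(y):
    """Hypothetical, simplified relevance (an assumption): 0 at the median,
    linear up to 1 at Q1 - 1.5*IQR and Q3 + 1.5*IQR, and 1 beyond them."""
    y = np.asarray(y, dtype=float)
    q1, med, q3 = np.percentile(y, [25, 50, 75])
    lo, hi = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    if not lo < med < hi:  # degenerate target: treat nothing as rare
        return np.zeros_like(y)
    return np.clip(np.interp(y, [lo, med, hi], [1.0, 0.0, 1.0]), 0.0, 1.0)


def gaussian_noise_oversample(df, y_col, threshold=0.7, o_percentage=2,
                              pert=0.02, seed=0):
    """Add (o_percentage - 1) x n_rare synthetic rows, each a random rare row whose
    numeric columns (target included) receive N(0, (pert * column std)^2) noise."""
    rel = boxplot_relevance(df[y_col])
    rare = df[rel >= threshold]
    n_new = int((o_percentage - 1) * len(rare))
    if rare.empty or n_new <= 0:
        return df.copy()
    rng = np.random.default_rng(seed)
    synth = rare.iloc[rng.choice(len(rare), size=n_new)].reset_index(drop=True)
    for col in df.select_dtypes(include="number").columns:
        sd = pert * df[col].std()
        synth[col] = synth[col].to_numpy(dtype=float) + rng.normal(0.0, sd, size=n_new)
    return pd.concat([df, synth], ignore_index=True)


def wercs(df, y_col, o_percentage=0.5, u_percentage=0.5, seed=0):
    """WERCS-style resampling: duplicate rows drawn with probability proportional
    to relevance, and drop rows drawn with probability proportional to 1 - relevance."""
    rel = boxplot_relevance(df[y_col])
    n = len(df)
    rng = np.random.default_rng(seed)

    # Undersampling: rows with relevance 1 have zero chance of being dropped
    keep = np.ones(n, dtype=bool)
    w_under = 1.0 - rel
    n_under = min(int(u_percentage * n), int(np.count_nonzero(w_under)))
    if n_under > 0:
        drop = rng.choice(n, size=n_under, replace=False, p=w_under / w_under.sum())
        keep[drop] = False
    parts = [df[keep]]

    # Oversampling: rows with relevance 0 are never duplicated
    n_over = int(o_percentage * n)
    if n_over > 0 and rel.sum() > 0:
        add = rng.choice(n, size=n_over, replace=True, p=rel / rel.sum())
        parts.append(df.iloc[add])
    return pd.concat(parts, ignore_index=True)
```

The two weightings are complementary: in `wercs`, rows with relevance 1 are never removed and rows with relevance 0 are never duplicated, so the resampled data shifts toward the rare target values without a hard rare/normal split.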

Contributions

Please share your issues, new techniques, and contributions with us. Your help is much appreciated.


If you are using this repository

Please cite the following reference:

Branco, P., Torgo, L. and Ribeiro, R.P., 2019. Pre-processing approaches for imbalanced distributions in regression. Neurocomputing, 343, pp.76-99.


License

© Vahid Asghari, 2020. Licensed under the General Public License v3.0 (GPLv3).

P.S. Parts of this README and of the code were inspired by https://github.com/nickkunz/smogn