nickkunz / smogn

Synthetic Minority Over-Sampling Technique for Regression
https://pypi.org/project/smogn
GNU General Public License v3.0

How to specify resampling range? #26

Open naeemmrz opened 2 years ago

naeemmrz commented 2 years ago

Hello,

I'm trying to use SMOGN on my dataset. The default parameters work to some extent, but I was wondering whether I can specify the range I want to over-sample or under-sample. For example, my Y values are between 3 and 8, and there are really only a few data points between 7 and 8. How can I over-sample only the data points between 7 and 8?

The advanced example has something like this:

## specify phi relevance values
rg_mtrx = [

    [35000,  1, 0],  ## over-sample ("minority")
    [125000, 0, 0],  ## under-sample ("majority")
    [200000, 0, 0],  ## under-sample
    [250000, 0, 0],  ## under-sample
]

But I couldn't make sense of these values. In [35000, 1, 0], what are the 1 and 0 for? What do they represent? It says somewhere that it's a 2D array (format: [x, y]). Which x and y are those, and why are there three values if it's only x and y?

Thanks in advance for any help :)

nickkunz commented 2 years ago

@naeemmrz please take a closer look at the functions contained here: https://github.com/nickkunz/smogn/blob/master/smogn/phi_ctrl_pts.py.

I hope this addresses your question. If not, please let me know and I will do my best to answer it for you and others.

If there are others with this question, please comment. Thank you.

naeemmrz commented 2 years ago

@nickkunz sorry for the late reply. I did check those functions before posting here, but I couldn't draw a conclusion from them (I'm still an intermediate Python user :D). If you could spare the time to explain it, that would be very helpful.
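
For anyone landing here with the same question, below is a minimal sketch of how the matrix might be set up for a target between 3 and 8. It assumes, from a reading of phi_ctrl_pts.py rather than from anything confirmed in this thread, that each row is [x, y, m]: x is a value of the target variable, y is the relevance (phi) at that value between 0 and 1, and m is the slope of the relevance curve at that point. The 6.5 cut-off and the helpers `approx_relevance`, `is_minority`, and `resample` are hypothetical names made up for illustration, and the linear interpolation is only a rough stand-in for the spline the library fits.

```python
# Control points for a target in [3, 8]; each row is assumed to be [x, y, m]:
#   x = a value of the target variable
#   y = relevance (phi) at x: 0 = common ("majority"), 1 = rare ("minority")
#   m = slope of the relevance curve at x (0 = flat at that point)
rg_mtrx = [
    [3.0, 0, 0],  # low end of the range: not relevant
    [6.5, 0, 0],  # still not relevant (hypothetical cut-off)
    [7.0, 1, 0],  # fully relevant from here ...
    [8.0, 1, 0],  # ... up to the top of the range
]

REL_THRES = 0.5  # illustrative threshold between "majority" and "minority"


def approx_relevance(value, ctrl_pts):
    """Piecewise-linear approximation of relevance, for intuition only.

    This is NOT the library's method; it just interpolates the y column
    between neighbouring control points and ignores the slope column.
    """
    pts = sorted((x, y) for x, y, _m in ctrl_pts)
    if value <= pts[0][0]:
        return float(pts[0][1])
    if value >= pts[-1][0]:
        return float(pts[-1][1])
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= value <= x1:
            return y0 + (y1 - y0) * (value - x0) / (x1 - x0)


def is_minority(value, ctrl_pts=rg_mtrx, thres=REL_THRES):
    """True if the approximate relevance reaches the threshold."""
    return approx_relevance(value, ctrl_pts) >= thres


def resample(df, target_col):
    """Hand the matrix to smogn (needs smogn installed and a pandas DataFrame)."""
    import smogn  # imported lazily so the helpers above work without it

    return smogn.smoter(
        data=df,
        y=target_col,
        rel_method="manual",      # use the control points instead of "auto"
        rel_ctrl_pts_rg=rg_mtrx,  # the matrix defined above
        rel_thres=REL_THRES,
    )
```

Under these assumptions, `is_minority(7.5)` is true and `is_minority(5.0)` is false, so only the rows with a target near 7 to 8 would be treated as the rare region to over-sample. Calling `resample(df, "y_column")` with your own DataFrame and column name would then apply it; the exact boundary behaviour between 6.5 and 7 depends on the library's spline, so it is worth checking the resulting distribution.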