nickkunz / smogn

Synthetic Minority Over-Sampling Technique for Regression
https://pypi.org/project/smogn
GNU General Public License v3.0
312 stars, 78 forks

Defining sampling strategy #5

Open shaddyab opened 4 years ago

shaddyab commented 4 years ago

Is it possible to use the algorithm to apply upsampling without any downsampling? For example, suppose I have a dataset with the following distribution of the target feature:

500 negative samples
200 positive samples
1000 samples equal to zero

Can I set the algorithm to upsample only the positive values, without affecting the number of negative and zero-valued samples? For example, the output would be:

500 negative samples
500 positive samples
1000 samples equal to zero

I know that in imblearn.over_sampling.SMOTENC it is possible to set the 'sampling_strategy' argument to a dictionary, where the keys correspond to the targeted classes and the values correspond to the desired number of samples for each targeted class.
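For reference, here is a minimal sketch of the dictionary form of 'sampling_strategy' described above. The class labels ("negative", "positive", "zero"), the helper names, and the `categorical_features` argument value are assumptions made for illustration; they presume the continuous target has first been binned into three classes, which is not something smogn does.

```python
from collections import Counter


def oversampling_targets(current, desired):
    """Build a sampling_strategy-style dict that only ever grows classes.

    `current` and `desired` map class label -> sample count. Classes whose
    desired count equals the current count are left out, so the sampler
    does not touch them. (Hypothetical helper, not part of imblearn.)
    """
    for label, n in desired.items():
        if n < current.get(label, 0):
            raise ValueError(f"{label!r} would shrink; over-sampling cannot do that")
    return {label: n for label, n in desired.items() if n > current.get(label, 0)}


def resample_positive_only(X, y, categorical_features):
    """Over-sample only the 'positive' class to 500 rows (assumed labels)."""
    # Imported lazily so the helper above can be used without imblearn installed.
    from imblearn.over_sampling import SMOTENC

    targets = oversampling_targets(Counter(y), {"positive": 500})
    sampler = SMOTENC(
        categorical_features=categorical_features,
        sampling_strategy=targets,
        random_state=0,
    )
    return sampler.fit_resample(X, y)
```

With the distribution above, `oversampling_targets` yields a dict containing only the positive class, so the other two classes keep their original counts.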

nickkunz commented 4 years ago

Hello @shaddyab. It is possible to over-sample without any under-sampling by setting under_samp = False. However, over / under-sampling based on the sign of the target (negative, zero, or positive) is currently not supported. The over / under-sampling strategy is predicated on the relevance function φ (phi), which scores how rare a target value is and does not discriminate between discrete groups such as these.
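A minimal sketch of the suggestion above, assuming the data is a pandas DataFrame and the call goes through `smogn.smoter`. The wrapper and the `sign_counts` helper are hypothetical names added for illustration. Because φ is not defined in terms of sign, the helper is there to inspect the negative / zero / positive counts before and after, rather than to assume a particular outcome.

```python
from collections import Counter


def sign_counts(values):
    """Count target values by sign: negative, zero, positive."""
    counts = Counter()
    for v in values:
        if v < 0:
            counts["negative"] += 1
        elif v > 0:
            counts["positive"] += 1
        else:
            counts["zero"] += 1
    return dict(counts)


def oversample_without_undersampling(data, target):
    """Run SMOGN with under-sampling switched off.

    `data` is assumed to be a pandas DataFrame and `target` the name of
    its continuous response column.
    """
    # Imported lazily so sign_counts works without smogn installed.
    import smogn

    return smogn.smoter(data=data, y=target, under_samp=False)
```

Typical use would be to compare `sign_counts(df[target])` with `sign_counts(result[target])` after the call, since which observations get over-sampled depends on φ and not on the three groups in the question.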