Open · EssamWisam opened this issue 1 year ago
I was trying to implement ROSE (Random OverSampling Examples) in Julia, and after reading the paper I decided to look at imbalanced-learn's implementation. There is a `shrinkage` parameter that is not mentioned in the paper. I see that it is multiplied by the smoothing matrix used for the kernel (near line 214 of the implementation) and I understand its effect; however, why the specific name "shrinkage"? Especially when larger values cause the generated points to spread farther apart...
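For concreteness, here is a minimal sketch of the step being described, assuming the noise scale follows Silverman's rule of thumb; the function `smoothed_oversample` and its signature are illustrative, not imbalanced-learn's actual code:

```python
import numpy as np

def smoothed_oversample(X, n_new, shrinkage=1.0, rng=None):
    # Illustrative ROSE-style smoothed bootstrap (hypothetical helper,
    # not imbalanced-learn's exact implementation).
    rng = np.random.default_rng(rng)
    n_samples, n_features = X.shape
    # Silverman's rule-of-thumb constant for a multivariate Gaussian kernel
    smoothing_constant = (4 / ((n_features + 2) * n_samples)) ** (1 / (n_features + 4))
    # Plain bootstrap: pick base points with replacement
    indices = rng.integers(0, n_samples, size=n_new)
    # Kernel noise scaled by `shrinkage`; shrinkage=0 reproduces exact duplicates
    noise = rng.standard_normal((n_new, n_features))
    return X[indices] + shrinkage * smoothing_constant * X.std(axis=0) * noise
```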
I think this parameter behaves the same way as the `learning_rate` in GBDT, which is also known as shrinkage. There is also the concept of shrunk covariance, where, depending on the value of the shrinkage, you "shrink" the empirical covariance towards an identity matrix. Here, the shrinkage is used to "shrink" the smoothed bootstrap towards simple random oversampling: as it approaches zero, the kernel noise vanishes and you get exact duplicates. So it is true that it might be counter-intuitive that a larger shrinkage value has the opposite effect of what you would expect :). And the reason why this parameter is not in the paper is simply that this single parameter can switch from normal random over-sampling to ROSE without the need to create a dedicated class.
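From the user's side, the switch the previous comment describes looks roughly like this (a sketch assuming a recent imbalanced-learn, where `RandomOverSampler` accepts a `shrinkage` argument):

```python
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import RandomOverSampler

X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

# shrinkage=0: the noise term vanishes, so new points are exact duplicates
X_dup, y_dup = RandomOverSampler(shrinkage=0, random_state=0).fit_resample(X, y)

# shrinkage=1: smoothed bootstrap (ROSE), new points are jittered copies
X_rose, y_rose = RandomOverSampler(shrinkage=1, random_state=0).fit_resample(X, y)

print(Counter(y_dup))   # classes balanced via duplication
print(Counter(y_rose))  # classes balanced via perturbed samples
```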