tidyverse / modelr

Helper functions for modelling
https://modelr.tidyverse.org
GNU General Public License v3.0

Sampling methods to address class imbalance #27

Closed by dtelad11 7 years ago

dtelad11 commented 7 years ago

How about adding sampling methods that address class imbalance? The simplest idea would be a function that downsamples every class to the size of the smallest class, or one that oversamples every class to the size of the largest class. Another option is some sort of weighted bootstrap, where the probability of picking an element from a class is inversely proportional to the class size. That would probably need some sort of weighting parameter, and the details depend on the sampling method.
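To make the three ideas concrete, here is a minimal base-R sketch. None of these functions exist in modelr: `class_indices()`, `sample_idx()`, `downsample()`, `upsample()`, `weighted_bootstrap()` and the `power` argument are hypothetical names chosen for illustration, and the sketch assumes the class column has no missing values. It returns plain data frames rather than modelr's `resample` objects.

```r
# Hypothetical helpers for illustration only; not part of modelr.

# Row indices of `data`, grouped by the values of the class column.
class_indices <- function(data, class_col) {
  split(seq_len(nrow(data)), data[[class_col]], drop = TRUE)
}

# Draw `n` entries from `idx`. Using sample.int() on positions avoids
# sample()'s special behaviour when `idx` is a single number.
sample_idx <- function(idx, n, replace = FALSE) {
  idx[sample.int(length(idx), n, replace = replace)]
}

# Downsample: keep n_min rows from every class, without replacement,
# where n_min is the size of the smallest class.
downsample <- function(data, class_col) {
  groups <- class_indices(data, class_col)
  n_min <- min(lengths(groups))
  keep <- unlist(lapply(groups, sample_idx, n = n_min), use.names = FALSE)
  data[sort(keep), , drop = FALSE]
}

# Oversample: keep every original row, then add rows drawn with
# replacement until each class has as many rows as the largest class.
upsample <- function(data, class_col) {
  groups <- class_indices(data, class_col)
  n_max <- max(lengths(groups))
  keep <- unlist(lapply(groups, function(idx) {
    c(idx, sample_idx(idx, n_max - length(idx), replace = TRUE))
  }), use.names = FALSE)
  data[keep, , drop = FALSE]
}

# Weighted bootstrap: draw nrow(data) rows with replacement, weighting
# each row by (size of its class)^(-power). power = 0 is the ordinary
# bootstrap; power = 1 gives every class the same total probability.
weighted_bootstrap <- function(data, class_col, power = 1) {
  cls <- as.character(data[[class_col]])
  counts <- table(cls)
  w <- as.numeric(counts[cls])^(-power)
  idx <- sample.int(nrow(data), nrow(data), replace = TRUE, prob = w)
  data[idx, , drop = FALSE]
}

# Usage on a toy data set with classes of size 70, 25 and 5.
df <- data.frame(
  x = rnorm(100),
  y = rep(c("a", "b", "c"), times = c(70, 25, 5))
)
table(downsample(df, "y")$y)          # 5 rows in each class
table(upsample(df, "y")$y)            # 70 rows in each class
table(weighted_bootstrap(df, "y")$y)  # roughly balanced, varies by draw
```

The `power` argument is one possible form for the weighting parameter mentioned above: values between 0 and 1 interpolate between the ordinary bootstrap and fully balanced class probabilities.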

If you think this is relevant, I could open a pull request with some ideas.

jrnold commented 7 years ago

@dtelad11 If you're still interested in this, I've implemented most of these in https://github.com/jrnold/resamplr.

dtelad11 commented 7 years ago

@jrnold Thank you! I'll definitely use it next time I need smarter resampling.