scikit-learn-contrib / imbalanced-learn

A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning
https://imbalanced-learn.org
MIT License
6.85k stars · 1.29k forks

Imbalanced-learn 1.X #645

Open glemaitre opened 5 years ago

glemaitre commented 5 years ago

While imbalanced-learn 0.X really focuses on samplers, over time we started to add additional methods such as ensemble classifiers. We could think about releasing imbalanced-learn 1.X, which could reorganize the methods. We could also think about adding cost-sensitive learning methods, for instance. One way could be:

In this case, we would probably import things with an additional layer:

from imblearn.predictors.ensemble import BalancedRandomForest
from imblearn.samplers.under_sampling import RandomUnderSampler

@chkoar Could you add any thoughts to this thread?

chkoar commented 5 years ago

I agree with that hierarchy. Since the literature mostly distinguishes the methods into data-level approaches and algorithm-level approaches, samplers and predictors make total sense. There are also methods that tackle the problem by modifying the feature space. We could add those to a preprocessing module once we have such an implementation.

I believe that we should always import from the second level, like this:

from imblearn.predictors import BalancedRandomForest
from imblearn.samplers import RandomUnderSampler
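
Making second-level imports work while keeping the implementation one level deeper usually comes down to re-exports in the subpackage's `__init__.py`. A minimal sketch of that pattern, using a hypothetical `demo_imblearn` package built in a temporary directory (the names are stand-ins for the proposal, not the actual layout):

```python
# Sketch of the __init__.py re-export pattern behind second-level imports.
# `demo_imblearn` is a hypothetical stand-in package, not imbalanced-learn itself.
import os
import sys
import tempfile

pkg_root = tempfile.mkdtemp()
samplers_dir = os.path.join(pkg_root, "demo_imblearn", "samplers")
os.makedirs(samplers_dir)

# demo_imblearn/__init__.py: empty, just marks the top-level package
open(os.path.join(pkg_root, "demo_imblearn", "__init__.py"), "w").close()

# The concrete class lives at the third level ...
with open(os.path.join(samplers_dir, "under_sampling.py"), "w") as f:
    f.write("class RandomUnderSampler:\n    pass\n")

# ... but the subpackage's __init__.py re-exports it at the second level.
with open(os.path.join(samplers_dir, "__init__.py"), "w") as f:
    f.write("from .under_sampling import RandomUnderSampler\n")

sys.path.insert(0, pkg_root)

# Users import from the second level, unaware of the deeper module:
from demo_imblearn.samplers import RandomUnderSampler

# The class still reports where it is really defined:
print(RandomUnderSampler.__module__)
```

This is how scikit-learn itself exposes, e.g., `sklearn.ensemble.RandomForestClassifier` while the implementation sits in a private submodule.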

An option could be to get rid of the different base classes and rely on estimator tags. That might give us the freedom to make changes more efficiently.
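
To illustrate the tags idea: instead of checking `isinstance(obj, SamplerMixin)`, code can ask an estimator for a tag dictionary and branch on that. The sketch below is a self-contained toy, loosely modeled on scikit-learn's `_more_tags` convention; the `sampler` tag and the class names are purely illustrative, not an actual imbalanced-learn API:

```python
# Hypothetical sketch: discriminating samplers from predictors via tags
# rather than via distinct base classes. The "sampler" tag is invented
# for illustration; it is not a real scikit-learn or imbalanced-learn tag.

class BaseEstimator:
    """Minimal stand-in for a common base class with tag collection."""

    def _get_tags(self):
        tags = {"sampler": False}  # default: behaves like a predictor
        # Walk the MRO so subclasses can override or extend the defaults,
        # mimicking how scikit-learn aggregates _more_tags.
        for klass in reversed(type(self).__mro__):
            if hasattr(klass, "_more_tags"):
                tags.update(klass._more_tags(self))
        return tags


class RandomUnderSampler(BaseEstimator):
    def _more_tags(self):
        return {"sampler": True}  # marks resampling behaviour


class BalancedRandomForest(BaseEstimator):
    def _more_tags(self):
        return {"sampler": False}


def is_sampler(estimator):
    # Tag lookup replaces an isinstance() check against a sampler base class.
    return estimator._get_tags().get("sampler", False)


print(is_sampler(RandomUnderSampler()))    # True
print(is_sampler(BalancedRandomForest()))  # False
```

The upside is that third-party estimators only need to declare the right tags, without inheriting from a specific imbalanced-learn base class.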