Closed U-n-Own closed 2 years ago
This question is a little too broad at the moment. We mainly focus on extensions and issues for imblearn here.
More-general Q&A forums for machine learning topics (e.g. https://stats.stackexchange.com/) might be a better fit for this.
Feel free to follow up in the future if there's a good way to approach this. #105 is also where notes on new methods are currently tracked.
Is your feature request related to a problem? Please describe
Describe the solution you'd like
Non-IID data often arise when training models on distributed devices: the amount of data is unbalanced across devices, and the label distribution differs from device to device as well. Federated learning, for example, has plenty of such cases.
I want to propose an algorithm that takes a dataset and distributes it in a non-IID fashion. I had to do this for an experiment but didn't find any general algorithm that does it, so I'm proposing to create one. I don't know if this is the right place.
Describe alternatives you've considered
Ideally, we take the data and the labels and then distribute the data in one of two ways, or a mix of both: unbalancing the amount of data in each sub-distribution, or unbalancing the labels within each sub-distribution.
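To make the idea above more concrete, here is a minimal sketch of how such a split could look. It is not an existing imblearn API: the function name `partition_non_iid` and its parameters `label_alpha` and `size_alpha` are hypothetical, and drawing the shares from a Dirichlet distribution is my own assumption (it is one common choice in federated learning experiments), not something this issue prescribes.

```python
import numpy as np


def partition_non_iid(y, n_parts, label_alpha=0.5, size_alpha=None, seed=0):
    """Split sample indices into n_parts non-IID subsets (hypothetical helper).

    label_alpha: Dirichlet concentration for label skew (smaller = more skew).
    size_alpha:  Dirichlet concentration for quantity skew; None keeps the
                 expected subset sizes equal.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)

    # Quantity skew: relative weight of each subset.
    if size_alpha is None:
        size_w = np.full(n_parts, 1.0 / n_parts)
    else:
        size_w = rng.dirichlet(np.full(n_parts, size_alpha))

    parts = [[] for _ in range(n_parts)]
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))
        # Label skew: per-class share of each subset, scaled by subset weight.
        p = rng.dirichlet(np.full(n_parts, label_alpha)) * size_w
        p /= p.sum()
        # Turn the shares into cut points over this class's shuffled indices.
        cuts = (np.cumsum(p)[:-1] * len(idx)).astype(int)
        for part, chunk in zip(parts, np.split(idx, cuts)):
            part.extend(chunk.tolist())
    return [np.array(p, dtype=int) for p in parts]


# Example: 3 classes with 100 samples each, spread over 4 subsets.
y = np.repeat([0, 1, 2], 100)
parts = partition_non_iid(y, n_parts=4, label_alpha=0.3, size_alpha=1.0)
```

Each element of `parts` is an array of indices into the original data, so `X[parts[k]]` and `y[parts[k]]` would give the k-th sub-distribution. Every sample is assigned to exactly one subset, and with a small `label_alpha` some subsets may end up with no samples of a given class.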
Additional context