Closed rgtzths closed 1 year ago
I would not consider this a bug, since the index is preserved if you pass a pandas DataFrame, and the same information is available via the attribute sample_indices_. Sorting would be an extra, costly step that is not useful for everyone.
So I would let the user sort once the sampling is done.
It would not be that costly, as it can be solved by sorting the selected indices before sampling, instead of sorting the sampled dataset (which is more expensive).
The idea was to enable sorting the indices before sampling, which could be done with the following code:
self.sample_indices_ = sorted(idx_under)
instead of what we have now:
self.sample_indices_ = idx_under
However, if you still consider it too expensive to perform, I will sort it after the sampling is performed.
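To illustrate the difference between the two assignments, here is a minimal pure-Python sketch. It is not imbalanced-learn's implementation: undersample_indices is a hypothetical helper that assumes the selected indices are collected class by class, and the data is made up for the example.

```python
import random
from collections import defaultdict


def undersample_indices(y, n_per_class, seed=0):
    """Hypothetical helper: pick n_per_class random indices from each class.

    Indices are collected class by class, so the result is grouped by
    label rather than following the original row order.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    idx_under = []
    for label in sorted(by_class):
        idx_under.extend(rng.sample(by_class[label], n_per_class))
    return idx_under


# Made-up data: rows 0 and 3 belong to class 1, the rest to class 0.
X = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4], [5, 5, 5], [6, 6, 6]]
y = [1, 0, 0, 1, 0, 0]

idx_under = undersample_indices(y, n_per_class=2)
idx_sorted = sorted(idx_under)  # the proposed change: sort the indices only

# Indexing with the unsorted indices groups the rows by class.
y_grouped = [y[i] for i in idx_under]

# Indexing with the sorted indices keeps the original row order.
X_ordered = [X[i] for i in idx_sorted]
y_ordered = [y[i] for i in idx_sorted]

print(idx_under, y_grouped)
print(idx_sorted, y_ordered)
```

Only the list of selected indices is sorted here, which is a much smaller operation than reordering the resampled rows afterwards.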
Describe the bug
The random undersampler does not keep the original order of the data. This is troublesome when the resampled data should preserve the original order as much as possible.
Steps/Code to Reproduce
Expected Results
(array([[1, 1, 1], [2, 2, 2], [3, 3, 3]]), array([1, 0, 0]))
Actual Results
(array([[2, 2, 2], [3, 3, 3], [1, 1, 1]]), array([0, 0, 1]))
Versions
numpy=1.23.5 imblearn=0.0