scikit-learn-contrib / imbalanced-learn

A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning
https://imbalanced-learn.org
MIT License

Sample ordering after RandomUnderSampler #753

Closed alexHeu closed 4 years ago

alexHeu commented 4 years ago

Hi,

I have a question regarding the resulting sample ordering after using the RandomUnderSampler.

Currently, I run the sampling as follows:

    X_resampled, y_resampled = sampler.fit_resample(X_train, y_train)

However, when looking at the result, I saw that all positive samples are placed after the negative ones:

    plt.plot(y_resampled)

[image: plot of y_resampled]

Is this expected behavior? Intuitively, it would make sense to me to keep the original ordering intact.

Would the following way be appropriate to achieve this?

    sample_indices = np.sort(sampler.sample_indices_)
    X_resampled = X_train[sample_indices]
    y_resampled = y_train[sample_indices]

Best regards,
Alex

hayesall commented 4 years ago

> Would the following way be appropriate to achieve this?

I think your sample_indices trick could keep the ordering for undersampling.
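For reference, the trick from the question can be wrapped in a small helper. This is only a sketch: the function name `resample_keep_order` is made up for illustration, it assumes `X` and `y` are NumPy arrays (or convertible to them), and it assumes the sampler exposes `sample_indices_` after fitting, as the `RandomUnderSampler` in the question does.

```python
import numpy as np


def resample_keep_order(sampler, X, y):
    """Undersample with `sampler`, then return the kept rows in original order.

    Hypothetical helper, not part of imbalanced-learn. Assumes the sampler
    sets `sample_indices_` (indices into the original data) when fitted.
    """
    X = np.asarray(X)
    y = np.asarray(y)

    # Fit the sampler; the class-grouped output it returns is discarded here.
    sampler.fit_resample(X, y)

    # Sorting the selected indices restores the original relative ordering.
    sample_indices = np.sort(sampler.sample_indices_)
    return X[sample_indices], y[sample_indices], sample_indices
```

With the setup from the question, this would be called as `X_resampled, y_resampled, idx = resample_keep_order(sampler, X_train, y_train)`. Returning the sorted indices as well makes it possible to reorder any other per-sample arrays (timestamps, sample weights) in the same way.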

> Is this expected behavior? Intuitively, it would make sense to me to keep the original ordering intact.

Therefore it's unlikely to provide a fixed ordering in general.

Do you have a use-case that strongly requires a fixed ordering?

glemaitre commented 4 years ago

I am closing this since it is a usage question.