scikit-learn-contrib / imbalanced-learn

A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning
https://imbalanced-learn.org
MIT License

[BUG] SMOTE returns numpy array when resampling Pandas DataFrame #734

Closed. InterferencePattern closed this issue 4 years ago.

InterferencePattern commented 4 years ago

Describe the bug

When a SMOTE object resamples a pandas DataFrame, the object returned by the .fit_resample() method should also be a pandas DataFrame. Instead, it is returned as a NumPy array.

Steps/Code to Reproduce

from imblearn.over_sampling import SMOTE
# X_train is a pandas DataFrame and y_train is a pandas Series
sm = SMOTE(random_state=42)
X_resampled, y_resampled = sm.fit_resample(X_train, y_train)
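The snippet above relies on an X_train and y_train defined elsewhere. A self-contained reproduction might look like the sketch below. The helpers make_imbalanced_data and resampled_types, the column names, and the 90/10 class split are hypothetical choices for illustration; only SMOTE and fit_resample come from the report. The SMOTE import sits inside the function so the data-building part runs even without imbalanced-learn installed.

```python
import numpy as np
import pandas as pd


def make_imbalanced_data(n_majority=90, n_minority=10, seed=0):
    """Hypothetical helper: toy imbalanced data as a DataFrame/Series pair."""
    # RandomState (rather than default_rng) also works on NumPy 1.15
    rng = np.random.RandomState(seed)
    n = n_majority + n_minority
    X = pd.DataFrame(rng.normal(size=(n, 2)), columns=["feat_a", "feat_b"])
    y = pd.Series([0] * n_majority + [1] * n_minority, name="label")
    return X, y


def resampled_types(X, y):
    """Hypothetical helper: run SMOTE and report the container types returned."""
    from imblearn.over_sampling import SMOTE  # requires imbalanced-learn
    X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
    return type(X_res).__name__, type(y_res).__name__
```

Calling resampled_types(*make_imbalanced_data()) should show the behaviour described in this issue: the report says 0.4.3 returns NumPy arrays, while the maintainer's reply says pandas types are returned from 0.6 onward.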

Expected Results

X_resampled should be a pandas DataFrame and y_resampled a pandas Series.

Actual Results

X_resampled and y_resampled are both returned as NumPy arrays.
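On imbalanced-learn releases that return arrays (the report uses 0.4.3), one possible workaround is to re-wrap the outputs yourself. The sketch below assumes X_train and y_train are the original pandas inputs; restore_pandas is a hypothetical helper, not part of imbalanced-learn. It restores column names and the Series name only. Per-column dtypes are not restored, and a fresh RangeIndex is used because synthetic rows have no entry in the original index.

```python
import numpy as np
import pandas as pd


def restore_pandas(X_res, y_res, X_ref, y_ref):
    """Hypothetical helper: re-wrap resampled arrays as DataFrame/Series.

    Column names come from X_ref and the Series name from y_ref.
    A new default RangeIndex is used for the resampled rows.
    """
    X_df = pd.DataFrame(np.asarray(X_res), columns=X_ref.columns)
    y_sr = pd.Series(np.asarray(y_res), name=y_ref.name)
    return X_df, y_sr
```

Usage, after resampling as in the report: X_resampled, y_resampled = restore_pandas(X_resampled, y_resampled, X_train, y_train).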

Versions

Unfortunately, I cannot give details about the platform, but these are the relevant package versions: Python 3.5.3, NumPy 1.15.2, SciPy 1.1.0, scikit-learn 0.20.0, imbalanced-learn 0.4.3.

glemaitre commented 4 years ago

If I recall correctly, support for returning a DataFrame was added in imbalanced-learn 0.6. We have unit tests for it, so I am confident it works in the latest release. You will need to install scikit-learn 0.23 and imbalanced-learn 0.7. I think neither version supports Python 3.5.
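To decide in code whether to expect pandas output, a version check along these lines could work. It is a sketch that takes the 0.6 threshold from the maintainer's recollection above, so treat that number as an assumption; parse_version and preserves_pandas are hypothetical helpers with deliberately simplified parsing (pre-release suffixes such as "dev0" are dropped), and f-strings are avoided so it also runs on Python 3.5.

```python
# Threshold from the maintainer's "if I recall correctly" (assumption).
PANDAS_OUTPUT_SINCE = (0, 6)


def parse_version(text):
    """Simplified parser: '0.7.0' -> (0, 7, 0); non-digit suffixes are ignored."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def preserves_pandas(version_text):
    """True if this imbalanced-learn version is expected to return pandas objects."""
    return parse_version(version_text)[:2] >= PANDAS_OUTPUT_SINCE
```

For example, preserves_pandas(imblearn.__version__) would be False on the 0.4.3 install from this report, which is when a re-wrapping workaround would be needed.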

InterferencePattern commented 4 years ago

@glemaitre Thanks for the response, and sorry for not spotting this in the release notes sooner.